<rant>

JSON is a pain in the ass for organizing and parsing structured data, particularly arrays of things.

It's so much more verbose and disassociates an object's definition from its type. Let's say you have named parents with named children.

   <parents>
     <parent name="Bob">
       <child name="Alice" age="12" />
     </parent>
   </parents>

The root element is <parents>, so you know a bunch of <parent> elements are going to follow. When you read a <parent> element, you know it's a parent because the tag says so. You don't even need a <children> wrapper, because it's fine to put a list of <child> elements directly inside <parent>.
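
To make that concrete, here's a minimal sketch using Python's standard-library xml.etree.ElementTree (Python is just the language picked for these illustrations, and describe() is a hypothetical helper). Every element is identified by its own tag, so nothing has to be inferred from where it sits in the tree:

```python
import xml.etree.ElementTree as ET

DOC = """<parents>
  <parent name="Bob">
    <child name="Alice" age="12" />
  </parent>
</parents>"""

def describe(xml_text):
    """Return one "tag(attributes)" line per element, in document order.

    Each element's type is read from its own tag; no ancestor is consulted.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for el in root.iter():  # walks the whole tree, root included
        attrs = ", ".join(f"{k}={v}" for k, v in el.attrib.items())
        lines.append(f"{el.tag}({attrs})")
    return lines

if __name__ == "__main__":
    for line in describe(DOC):
        print(line)
```

For the sample above that gives parents(), parent(name=Bob), and a child(...) line carrying Alice's name and age.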

Now here's what I think a typical JSON equivalent would be:

  {
     "parents": [
        {
          "name": "Bob",
          "children": [
            {
              "name": "Alice",
              "age": 12
            }
          ]
        }
     ]
  }

Ok, so whitespace aside, it's less verbose, but look at all the info that's missing.

What "type" is the root object? You'd say "parents", but how did you find that out? You have to know _a priori_ that a field called 'parents' will be there. Not a big deal for the root object, because it's usually special, but what about a single parent?

Look at the 'parents' array. The only hint that { "name": "Bob" } is a parent is that it sits in an array attached to the 'parents' field of the enclosing (root) object. You have to do 'upwards' lookups to determine what this object is; the object itself doesn't carry that information. Same thing with { "name": "Alice" }. How do you know that's a child object? You don't, not without another upwards lookup.
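
In code, that 'upwards lookup' usually ends up as context you carry downwards while walking the tree. A minimal Python sketch; FIELD_TO_TYPE and classify() are hypothetical names, and the mapping stands in for the a priori knowledge the reader has to bring along, since none of it is in the JSON:

```python
import json

DOC = '{"parents": [{"name": "Bob", "children": [{"name": "Alice", "age": 12}]}]}'

# Hypothetical out-of-band knowledge: which field name implies which type.
# None of this is in the JSON itself; the reader has to know it a priori.
FIELD_TO_TYPE = {"parents": "parent", "children": "child"}

def classify(node, field=None, out=None):
    """Collect (type, name) pairs for every non-root object, typed by the field it hangs off."""
    if out is None:
        out = []
    if isinstance(node, dict):
        if field is not None:
            # The type comes from the key of the array the object sits in,
            # not from anything inside the object.
            out.append((FIELD_TO_TYPE.get(field, "unknown"), node.get("name")))
        for key, value in node.items():
            classify(value, key, out)
    elif isinstance(node, list):
        for item in node:
            # Array items inherit the field name of the array itself.
            classify(item, field, out)
    return out

if __name__ == "__main__":
    print(classify(json.loads(DOC)))
```

Hand classify() a bare { "name": "Alice" } on its own and it returns an empty list: without the enclosing field there's no way to tell what the object is.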

Now you might say "just tag the elements with their type so you can keep track of what these objects are". Let's try that:

  {
    "type": "parents",
    "parents": [
      {
        "type": "parent",
        "name": "Bob",
        "children": [
          {
             "type": "child",
             "name": "Alice",
             "age": 12
          }
        ]
      }
    ] 
  }

Sweet, now we're reaching data representation parity. But if you get an object like this from a third-party service, how do you validate it without running the logic that processes each element? You'd need a 'dry run' version of that logic.
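
Here's roughly what that 'dry run' tends to look like: a minimal, hand-written Python validator for the tagged structure (validate() and its helpers are hypothetical). It walks the same parents -> parent -> child nesting the processing code would, just without doing any of the work, so the structural knowledge ends up living in two places:

```python
import json

# Hypothetical helpers: each records a problem instead of raising, so one
# pass reports everything it finds.
def _list_field(node, key, path, errors):
    value = node.get(key, [])  # a missing list is treated as empty
    if not isinstance(value, list):
        errors.append(f"{path}.{key}: expected an array")
        return []
    return value

def _string_field(node, key, path, errors):
    if not isinstance(node.get(key), str):
        errors.append(f"{path}.{key}: expected a string")

def validate(node, expected="parents", path="$"):
    """Dry-run check of the tagged structure; returns a list of problems.

    The parents -> parent -> child nesting is hard-coded here, mirroring
    whatever code would actually process the data.
    """
    if not isinstance(node, dict):
        return [f"{path}: expected an object"]
    errors = []
    if node.get("type") != expected:
        errors.append(f"{path}.type: expected {expected!r}, got {node.get('type')!r}")
    if expected == "parents":
        for i, p in enumerate(_list_field(node, "parents", path, errors)):
            errors += validate(p, "parent", f"{path}.parents[{i}]")
    elif expected == "parent":
        _string_field(node, "name", path, errors)
        for i, c in enumerate(_list_field(node, "children", path, errors)):
            errors += validate(c, "child", f"{path}.children[{i}]")
    elif expected == "child":
        _string_field(node, "name", path, errors)
        age = node.get("age")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(age, int) or isinstance(age, bool):
            errors.append(f"{path}.age: expected an integer")
    return errors

if __name__ == "__main__":
    doc = json.loads("""
    {"type": "parents", "parents": [
      {"type": "parent", "name": "Bob", "children": [
        {"type": "child", "name": "Alice", "age": "12"}]}]}
    """)
    print(validate(doc))
```

With "age" sent as the string "12", it reports $.parents[0].children[0].age: expected an integer.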

In fact, how do you formally describe the structure of these objects to another service, so that the service can guarantee it only generates valid objects in the first place? XML Schema was a solid solution for that. JSON Schema had no support anywhere the last time I looked at it. Where is it now? It looks like you could use it if you wanted to, but afaict most people generate fly-by-the-seat-of-your-pants JSON objects in code, and no 'formal' validation happens other than the response code from the service when the object is actually sent (which, if you think about it, is just "testing in prod").
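
For contrast, here's the same structure described as data instead of code, which is the part you could actually hand to another service. The keywords used (type, properties, required, items, const) are real JSON Schema keywords, but check() is only a toy interpreter for that small subset, written for illustration; a real project would use a proper implementation:

```python
import json

# The structure as data. Only standard JSON Schema keywords are used.
CHILD = {
    "type": "object",
    "required": ["type", "name", "age"],
    "properties": {
        "type": {"const": "child"},
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
}
PARENT = {
    "type": "object",
    "required": ["type", "name"],
    "properties": {
        "type": {"const": "parent"},
        "name": {"type": "string"},
        "children": {"type": "array", "items": CHILD},
    },
}
SCHEMA = {
    "type": "object",
    "required": ["type", "parents"],
    "properties": {
        "type": {"const": "parents"},
        "parents": {"type": "array", "items": PARENT},
    },
}

TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    # JSON Schema treats booleans and integers as distinct types.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}

def check(value, schema, path="$"):
    """Toy checker for the keyword subset above; returns a list of violations."""
    if "const" in schema and value != schema["const"]:
        return [f"{path}: expected {schema['const']!r}"]
    if "type" in schema and not TYPE_CHECKS[schema["type"]](value):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}: missing {name!r}")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors += check(value[name], sub, f"{path}.{name}")
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors += check(item, schema["items"], f"{path}[{i}]")
    return errors

if __name__ == "__main__":
    print(json.dumps(SCHEMA, indent=2))  # the shareable description
```

Because SCHEMA sticks to standard keywords, the json.dumps() output should also work as a JSON Schema document with a real validator, such as the third-party Python jsonschema package's jsonschema.validate(instance, schema). One known gap in the toy: it rejects 12.0 for "integer", which newer JSON Schema drafts accept.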

I think version 1 of XSLT, XPath and so on were pretty simple solutions for working with structured data, but people went overboard trying to shoehorn XML into problems better suited to imperative code. So you got XSLT 2.0 (want for loops? no.), the XQuery and XPath 2.0 abominations, various weird XLink solutions, imperative code inside tags (<script>function foo() { }</script>), which introduced a second syntax, and so on.
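
As a taste of how little XPath 1.0-style selection needs, here's a sketch using the small XPath subset built into Python's xml.etree.ElementTree (not full XPath 1.0; lxml's xpath() method covers the complete language). The second parent, Carol, and her children are made up for the example, and run_queries() is a hypothetical helper. Note that the 'upwards lookup' from the JSON discussion is just .. here:

```python
import xml.etree.ElementTree as ET

# Bob/Alice from the post plus a made-up second parent.
DOC = """<parents>
  <parent name="Bob">
    <child name="Alice" age="12" />
  </parent>
  <parent name="Carol">
    <child name="Dave" age="7" />
    <child name="Erin" age="9" />
  </parent>
</parents>"""

def names(elements):
    """Pull the name attribute out of a list of elements."""
    return [e.get("name") for e in elements]

def run_queries(xml_text):
    root = ET.fromstring(xml_text)
    return {
        # Every <child> directly under a <parent> under the root.
        "all_children": names(root.findall("./parent/child")),
        # Attribute predicate on an intermediate step.
        "carols_children": names(root.findall("./parent[@name='Carol']/child")),
        # Descendant search with a predicate, then read an attribute.
        "erins_age": root.find(".//child[@name='Erin']").get("age"),
        # '..' steps up to the enclosing element.
        "alices_parent": root.find(".//child[@name='Alice']/..").get("name"),
    }

if __name__ == "__main__":
    for key, value in run_queries(DOC).items():
        print(key, value)
```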

I understand why the XML world of nonsense had to be stopped, but we threw the baby out with the bathwater.

Don't get me wrong, I like JSON, but I also feel like we collectively took a step back and opted for 'the JavaScript of structured data representation'.

Maybe the rise of thick, JavaScript-heavy clients had a lot to do with it? XML was never easy to work with in JavaScript, which is a shame, seeing as XML shares its SGML roots with HTML. I blame the DOM API; it has always been tedious to use in any language that implemented it.
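
The post is about JavaScript, but the same W3C DOM interface exists elsewhere; Python ships one as xml.dom.minidom. A minimal sketch of one task (list each child with its parent's name and its age) done DOM-style and then with ElementTree, to show where the tedium comes from; both function names are hypothetical:

```python
import xml.etree.ElementTree as ET
from xml.dom import Node, minidom

DOC = """<parents>
  <parent name="Bob">
    <child name="Alice" age="12" />
  </parent>
</parents>"""

def children_via_dom(xml_text):
    """W3C DOM style: node lists, node-type checks, getAttribute() strings."""
    doc = minidom.parseString(xml_text)
    result = []
    for parent in doc.getElementsByTagName("parent"):
        # childNodes also holds the whitespace text nodes between tags.
        for node in parent.childNodes:
            if node.nodeType == Node.ELEMENT_NODE and node.tagName == "child":
                result.append((parent.getAttribute("name"),
                               node.getAttribute("name"),
                               int(node.getAttribute("age"))))
    doc.unlink()  # break minidom's internal references so it can be freed
    return result

def children_via_etree(xml_text):
    """ElementTree style: elements only, attributes via .get()."""
    root = ET.fromstring(xml_text)
    return [(p.get("name"), c.get("name"), int(c.get("age")))
            for p in root.iter("parent")
            for c in p.findall("child")]

if __name__ == "__main__":
    print(children_via_dom(DOC))
    print(children_via_etree(DOC))
```

Both return the same tuple list; the DOM version just has to skip whitespace text nodes by hand, while ElementTree keeps that whitespace in .text/.tail instead of separate nodes.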

ActionScript had that nice built-in @ syntax for selecting attributes (and filtering nodes by them) and first-class XML support in the language itself (E4X, i.e. ECMAScript for XML). How we killed that first-class language support only to turn around and rediscover it in half-baked form as transpiled JSX is beyond me.

</rant>