The vast majority of APIs today use the JavaScript Object Notation (JSON) to represent the structured data that they exchange. While JSON has been popular for a number of years now, there are still APIs out there that use the Extensible Markup Language (XML) instead, and in some communities, XML is still a popular data format.
Why JSON won over XML
It’s interesting to think about the fact that the first wave of APIs, called Web Services back then, was based on XML (SOAP being the most notable representative), and how quickly this changed: SOAP and XML were discarded as too heavy and too clunky, and REST and JSON became the mainstays of API technology instead. What was it that caused this shift in preferences?
This is an opinionated perspective explaining why JSON won over XML, and it takes three steps to understand the bigger picture:
History of XML
XML was first standardized in 1998. It was designed to be the language for representing structured documents on the Web. The idea was that HTML as a language was too limited and that allowing everybody to define their own document formats would transform Web publishing into something where Web documents could represent much more of a domain’s concepts.
Regardless of how well this idea worked out (it didn’t), XML was designed to represent structured documents. The idea was that everybody would publish their structured documents on the Web, and XML (and associated technologies like XSLT) would then render these into formatted presentations in the browser.
None of this ever happened, but that’s the reason XML exists: As a way to represent structured documents on the Web.
What is XML for?
XML was invented to represent structured documents on the Web. Like other structured document formats (HTML for regular Web documents, and SGML for a more flexible way of defining your own document vocabulary), XML takes the approach of representing documents as mixed content.
That is, a document is a mix of regular text (the human-readable characters we expect in a document) and document structure (the structure necessary to represent paragraphs, lists, emphasized text, links, and whatever else your document model may allow).
These structured documents form a tree (an ordered tree, to be precise for you computer science folks out there), and that’s what XML is: It is a format that represents structured documents as ordered trees.
It is worth noticing that this idea of ordered trees is unusual for most programming languages. Their idea of structured data mostly revolves around concepts such as objects or similar ways of representing structured data types.
As such, XML was a little complicated to use in most programming languages out there because they were not well-equipped to make working with ordered trees very convenient.
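To make the "ordered tree with mixed content" idea a bit more concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The document fragment and the `walk` helper are made up for illustration; the point is only to show how text and markup interleave, and why that does not map neatly onto plain objects.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with mixed content: text and markup are interleaved.
doc = "<p>JSON is <em>popular</em>, but XML is <em>older</em>.</p>"

root = ET.fromstring(doc)

def walk(node):
    """Yield the pieces of the ordered tree in document order."""
    # Text that appears before the first child element.
    if node.text:
        yield ("text", node.text)
    for child in node:
        yield ("start", child.tag)
        yield from walk(child)
        yield ("end", child.tag)
        # Text that follows the child but still belongs to the parent.
        if child.tail:
            yield ("text", child.tail)

pieces = list(walk(root))
for kind, value in pieces:
    print(kind, repr(value))
```

Note that the order of the pieces matters and that the text between the two `em` elements is stored as the "tail" of the first one. A dictionary with a `p` key and an `em` key could not express that without extra conventions.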
Why is JSON a better fit for APIs?
This is where the JavaScript Object Notation (JSON) enters the picture. JSON is nothing more than the data structure part of the JavaScript programming language. That means by its very definition, it is a perfect fit to represent JavaScript objects.
But because most programming languages have similar models of how to represent structured data, JSON also is a good fit for the internal model of many other programming languages.
This meant that JSON was a much more natural fit for developers to exchange structured data. It did not require the rather inconvenient “data binding” and “data serialization” steps that were notoriously difficult when using XML-based APIs.
Instead, JSON allowed APIs to represent structured data in a way that simply was a better fit for the conceptual universe that most developers live in.
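As a rough illustration of that difference, the following Python sketch reads the same (made-up) user record from JSON and from XML. The payloads and field names are assumptions chosen for the example, and the XML mapping shown is just one hand-written possibility, not a standard binding.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads describing the same record.
json_payload = '{"id": 42, "name": "Ada", "tags": ["admin", "dev"]}'
xml_payload = (
    "<user><id>42</id><name>Ada</name>"
    "<tags><tag>admin</tag><tag>dev</tag></tags></user>"
)

# JSON: one call, and the result is already native dicts, lists, and numbers.
user_from_json = json.loads(json_payload)

# XML: the parser returns a tree, so the mapping to native data is written
# by hand, including the decision that "id" is an integer and that "tags"
# is a list.
root = ET.fromstring(xml_payload)
user_from_xml = {
    "id": int(root.findtext("id")),
    "name": root.findtext("name"),
    "tags": [tag.text for tag in root.findall("tags/tag")],
}

print(user_from_json == user_from_xml)
```

Both paths end up with the same dictionary, but only the XML path needed decisions about types and list structure that the format itself does not make for you.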
Putting it all together
In the end, the argument is that JSON mostly won over XML because its model is a much better fit for most API scenarios (most APIs revolve around structured data and not structured documents).
There were other factors as well, of course, but from what I personally experienced during the transition phase, it seems that the most relief came from the fact of “not having to deal with trees and DOM” anymore.
You may have had different experiences and interpretations, and I am really curious to hear about them. But first, please let me walk you through the whole argument and why I came to the conclusion that I am presenting here.
If you liked this video, why don’t you check out Erik’s YouTube channel for more “Getting APIs to Work” content?