Thu, 28 Jul 2005 17:18:03 GMT

A very interesting thread is circulating the Blogosphere about the disadvantages of XML on the Web. I'll let you read the thoughts posted so far. I won't quote these articles otherwise, because my point isn't about any one argument they make, but rather about a general misunderstanding people have about XML.

Let me begin with a history lesson, which I'm certain Dare, Tantek and Anne don't need, but which may help the rest of us.

A long time ago, Charles Goldfarb invented a markup language called SGML. SGML described how you could encode a hierarchical data structure inside of angle brackets. A little bit later, Tim Berners-Lee implemented HTML using SGML. That is, he created a hypertext-based hierarchical data structure based on the principles of SGML. The next part of the story we all know very well. The Web and HTML got very popular.

At this point, people wondered if other SGML applications could also exist on the Web. The problem was that SGML parsing required knowledge of the application format (the DTD). This required a very smart parser. IE and Firefox are very smart parsers. You can't imagine how many mistakes the average Web developer makes, yet IE and Firefox still manage to render something intelligent for the end user. If we wanted other applications to exist on the Web, then SGML could not be the answer. What we needed was a subset of SGML that could be easily parsed. Enter XML.
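To make the "easily parsed" point concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The recipe vocabulary and the helper names (outline, is_well_formed) are made up for illustration. The parser is handed no DTD and knows nothing about the vocabulary, yet it can recover the full hierarchy, and it flatly rejects markup that is not well-formed instead of guessing the way a browser does with HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in a made-up vocabulary; no DTD is supplied anywhere.
DOC = "<recipe><title>Toast</title><step>Slice bread</step><step>Toast it</step></recipe>"


def outline(xml_text):
    """Return (depth, tag) pairs for every element, in document order."""
    root = ET.fromstring(xml_text)
    result = []

    def walk(node, depth):
        result.append((depth, node.tag))
        for child in node:
            walk(child, depth + 1)

    walk(root, 0)
    return result


def is_well_formed(xml_text):
    """True if the text parses as XML; no error recovery is attempted."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for depth, tag in outline(DOC):
        print("  " * depth + tag)
    # Mismatched tags are an error, not something to repair silently.
    print(is_well_formed("<a><b></a>"))
```

The strictness is the trade-off: the parser stays small because it never has to recover from sloppy markup.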

Meanwhile, some really intelligent people realized that HTML had another flaw. It mixed content and presentation. We could separate the content from the presentation (stylesheets). Now here's where I get confused. Along came XSLT and CSS. Both were god-awful attempts to add stylesheets to XML and HTML, but simple enough that they were widely adopted. In another thread, people started wondering how they could port HTML from SGML to XML and there ya go, we have XHTML. Now we're getting pretty close to a Web built on lightweight parsing. But how does this all fit together? It doesn't.

You can format generic XML by tossing it thru a stylesheet and you can format XHTML by applying a bit of CSS, but there's no real convergence. So, you have two camps: one arguing for styled generic XML and another for styled XHTML. What's new? Both have upsides and both have downsides. It's like Atom vs. RSS. XQuery vs. XPath.

On the other hand, XML is also great as a wire format for transferring data (RSS) between applications, but that's another story entirely and has little (but something) to do with whether we should style generic XML or style XHTML. Should you apply a stylesheet to RSS to make your feed presentable? I guess you could and it works, so people are doing it. This is kinda how my blog software works. You see, each page has an equivalent RSS (XML) view. On the server side, I run the RSS thru an XSLT stylesheet and apply CSS.
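As a rough illustration of that server-side step, here is a sketch in Python. It is not the blog software's actual code, and it does not use XSLT: Python's standard library has no XSLT processor, so the transform is written by hand with xml.etree.ElementTree as a stand-in. The sample feed, the rss_to_html name and the "feed" class are all assumptions made for the example. The idea is the same, though: the RSS is the source of truth, the markup is derived from it, and CSS would then be applied to the result.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, trimmed to the elements the transform reads.
RSS = """<rss version="2.0"><channel><title>Example Blog</title>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""


def rss_to_html(rss_text):
    """Turn an RSS 2.0 channel into an HTML fragment (stand-in for an XSLT pass)."""
    channel = ET.fromstring(rss_text).find("channel")

    # The class attribute is the hook a CSS stylesheet would target.
    body = ET.Element("div", {"class": "feed"})
    ET.SubElement(body, "h1").text = channel.findtext("title")

    ul = ET.SubElement(body, "ul")
    for item in channel.findall("item"):
        li = ET.SubElement(ul, "li")
        a = ET.SubElement(li, "a", {"href": item.findtext("link")})
        a.text = item.findtext("title")

    return ET.tostring(body, encoding="unicode")


if __name__ == "__main__":
    print(rss_to_html(RSS))
```

A real XSLT stylesheet would express the same mapping declaratively, with one template per RSS element.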

Last, can we at least agree that HTML and SGML should be buried ASAP?

