The expected DTD markup was not found

Fri, 19 Nov 2004 00:57:02 GMT

Today, I encountered a really weird problem trying to read a particular RSS feed. In fact, I've stumbled across a few RSS feeds w/ this problem. They seem to be related, but I've yet to pin down the exact cause. Stranger still, if I copy the RSS files to my server, the problem doesn't reproduce. The affected feeds are all RSS version 0.91 w/ a blank line directly above or below the DOCTYPE declaration. But since I can't reproduce the problem w/ the same file on my server, there must be a secondary cause, perhaps related to the HTTP headers (or so I speculate). Here's the .NET code that fails, using Ken MacLeod's RSS feed:

// Fails w/ "The expected DTD markup was not found" when loading this feed over HTTP
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
doc.Load("http://bitsko.slc.ut.us/blog/index.rss");

Many .NET-based RSS readers aren't affected at all, yet this code fails every time. So I assume that most .NET-based RSS readers don't use this construct. By the way, the same test does not fail when loading a local copy of the RSS file. The problem isn't new, either: I've found an old M$FT newsgroup posting w/ the same error message.
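One way to separate the HTTP download from the XML parse is to hand `XmlDocument` bytes that are already in memory, such as a saved local copy of the feed. This is a minimal sketch; `FeedLoader` and `LoadFromBytes` are hypothetical names, not part of the original test.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper: parses a feed from bytes already in memory,
// so the XML parse can be tested apart from the HTTP fetch.
public static class FeedLoader
{
    public static XmlDocument LoadFromBytes(byte[] data)
    {
        var doc = new XmlDocument();
        using (var stream = new MemoryStream(data))
        {
            // Throws XmlException if the bytes are not well-formed XML
            doc.Load(stream);
        }
        return doc;
    }
}
```

The bytes could come from `System.IO.File.ReadAllBytes(path)` or `new System.Net.WebClient().DownloadData(url)`. If the downloaded bytes parse cleanly while `doc.Load(url)` fails, that would point at how the response is fetched rather than at the XML itself.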

Update: In all cases, the problem was unrelated to anything I've mentioned above. Those were all red herrings that seemed relevant until I found the root cause. Both files were, in fact, invalid XML: they had extra invalid characters in the wrong places. In Ken's case, there are invalid characters before the XML declaration. In the second case, there were a couple of extra invalid non-whitespace characters between the XML declaration and the DOCTYPE declaration. Case closed.
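Under the XML 1.0 rules, an XML declaration, if present, must be the very first thing in the document; only a byte order mark may come before it, not even whitespace. A quick way to check a diagnosis like the one above is to count the bytes in front of `<?xml`. This is a hypothetical sketch (`XmlPrologCheck` is an illustrative name), assuming a UTF-8 or other ASCII-compatible encoding; it does not recognize UTF-16 input.

```csharp
using System.Text;

// Hypothetical diagnostic helper, not from the original post.
// Assumes an ASCII-compatible encoding such as UTF-8; UTF-16 is not handled.
public static class XmlPrologCheck
{
    // Returns the number of unexpected bytes before "<?xml", 0 if the declaration
    // starts the document (optionally after a UTF-8 byte order mark), or -1 if no
    // XML declaration is found.
    public static int BytesBeforeDeclaration(byte[] data)
    {
        int start = 0;
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            start = 3; // skip the UTF-8 byte order mark, the only thing allowed first

        byte[] marker = Encoding.ASCII.GetBytes("<?xml");
        for (int i = start; i + marker.Length < data.Length; i++)
        {
            if (Matches(data, i, marker) && IsXmlWhitespace(data[i + marker.Length]))
                return i - start;
        }
        return -1;
    }

    private static bool Matches(byte[] data, int offset, byte[] marker)
    {
        for (int j = 0; j < marker.Length; j++)
            if (data[offset + j] != marker[j]) return false;
        return true;
    }

    // "<?xml" must be followed by whitespace, which skips targets like "<?xml-stylesheet".
    private static bool IsXmlWhitespace(byte b) => b == 0x20 || b == 0x09 || b == 0x0D || b == 0x0A;
}
```

A nonzero result means the bytes as received are not well-formed XML, though, as the later updates show, it doesn't say where those extra bytes came from.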

Update: Case unclosed. The author of the second feed has informed me that the extra invalid characters were accidentally added when he tried to delete his DOCTYPE, as I had advised him to do. He deleted the extra characters and the DOCTYPE, and his feed is fine now.

Update: Case re-opened. A reply from Ken indicated that he thought his feed was correct. So it's back to reading the raw bytes of an HTTP response. I then caught on to the fact that Ken's server was using chunked transfer coding, which explains the extra bytes I was seeing: the chunk length. The question remains: "Why doesn't Ken's feed work w/ the .NET XmlDocument.Load method?"
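In HTTP/1.1 chunked transfer coding (RFC 2616, section 3.6.1), the response body arrives as a series of chunks, each preceded by its size in hexadecimal on a line of its own; a chunk of size zero ends the body. Reading the raw response bytes without decoding them shows those size lines mixed into the XML, which is why they looked like stray characters in front of the XML declaration. Below is a minimal decoding sketch; `ChunkedDecoder` is a hypothetical name, and it assumes the whole body is already in memory and ignores any trailer headers.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

// Minimal sketch of HTTP/1.1 chunked transfer decoding.
// Each chunk is: hex size [;extensions] CRLF, <size> bytes of data, CRLF.
// A chunk of size 0 ends the body; trailer headers are not handled.
public static class ChunkedDecoder
{
    public static byte[] Decode(byte[] raw)
    {
        var body = new MemoryStream();
        int pos = 0;
        while (true)
        {
            int lineEnd = IndexOfCrlf(raw, pos);
            if (lineEnd < 0) throw new FormatException("Missing CRLF after chunk size");

            // Drop any chunk extensions after ';' and parse the hex size
            string sizeLine = Encoding.ASCII.GetString(raw, pos, lineEnd - pos);
            int semi = sizeLine.IndexOf(';');
            if (semi >= 0) sizeLine = sizeLine.Substring(0, semi);
            int size = int.Parse(sizeLine.Trim(), NumberStyles.HexNumber, CultureInfo.InvariantCulture);

            pos = lineEnd + 2; // skip the CRLF after the size line
            if (size == 0) break; // last chunk

            if (pos + size + 2 > raw.Length) throw new FormatException("Truncated chunk");
            body.Write(raw, pos, size);
            pos += size + 2; // skip the chunk data and its trailing CRLF
        }
        return body.ToArray();
    }

    private static int IndexOfCrlf(byte[] data, int start)
    {
        for (int i = start; i + 1 < data.Length; i++)
            if (data[i] == '\r' && data[i + 1] == '\n') return i;
        return -1;
    }
}
```

For example, the raw body `5\r\nHello\r\n6\r\n World\r\n0\r\n\r\n` decodes to `Hello World`; an HTTP client is expected to do this step before handing the body to an XML parser.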

Update: In case anyone is wondering, I've tested other feeds w/ chunked encoding and they don't all fail. I've encountered only three other feeds that fail in the same manner, but the list is growing.

Reader Comments

Sat, 20 Nov 2004 14:31:36 GMT

User comment

Ken, it's clear now that your XML is perfectly fine. The problem seems to be an obscure bug in .NET. I've transferred your file, bytes intact, to my own server and the exception doesn't happen. I've also reproduced the problem w/ several other feeds, but can't find anything they have in common.

Randy

Sat, 20 Nov 2004 15:40:10 GMT

User comment

If the server makes a difference, I do happen to be using Apache 2.

Sat, 20 Nov 2004 18:20:42 GMT

User comment

I've checked the originating servers and it's a mix of Apache and IIS. I haven't found anything in common other than the DOCTYPE. I'll compile my results and send them to M$FT.

Randy
