The expected DTD markup was not found - RSS
Really Simple Syndication
 
Copyright 2003-4 Randy Charles Morin
Fri, 19 Nov 2004 00:57:02 GMT
The expected DTD markup was not found

Today, I encountered a really weird problem trying to read a particular RSS feed. In fact, I've stumbled across a few RSS feeds w/ this problem. They seem to be related, but I've yet to figure out the exact cause. If I copy the RSS files to my server, the problem doesn't reproduce. The issue appears to affect RSS version 0.91 feeds w/ a blank line located either directly above or below the DOCTYPE declaration. But, since I can't reproduce the problem w/ the same file on my server, there must be a secondary symptom relating to the HTTP headers (or so I speculate). Here's the .NET code that fails, using Ken MacLeod's RSS feed.

// Loading the feed straight from its URL fails with
// "The expected DTD markup was not found".
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
doc.Load("http://bitsko.slc.ut.us/blog/index.rss");

The issue doesn't seem to affect many .NET-based RSS readers, yet the code above fails predictably. So, I assume that most .NET-based RSS readers don't use this code construct. By the way, the test does not fail using a local copy of the same RSS file. The problem is not new, as I've found an old M$FT newsgroup posting w/ the same error message.

Update: In all cases, the problem was unrelated to anything I've mentioned above. Those were all red herrings that seemed relevant until I found the root cause. Both files were, in fact, invalid XML. They had extra invalid characters in the wrong places. In Ken's case, there were invalid characters before the XML declaration. In the second case, the author had a couple of extra invalid non-space characters between the XML declaration and the DOCTYPE declaration. Case closed.
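
A quick way to check a claim like "invalid characters before the XML declaration" is to look at the raw bytes rather than trust a parser. What follows is only a minimal sketch in Python, not the .NET code from this post: the helper name bytes_before_xml_declaration and the sample documents are made up for illustration, and it assumes a UTF-8 (or ASCII-compatible) feed.

```python
# Hypothetical helper for illustration; not part of the original post.
BOM_UTF8 = b"\xef\xbb\xbf"


def bytes_before_xml_declaration(raw: bytes) -> bytes:
    """Return whatever precedes '<?xml' in raw, ignoring a UTF-8 byte order mark.

    An empty result means the declaration is the first thing in the document.
    Raises ValueError if there is no XML declaration at all.
    """
    if raw.startswith(BOM_UTF8):
        raw = raw[len(BOM_UTF8):]
    index = raw.find(b"<?xml")
    if index == -1:
        raise ValueError("no XML declaration found")
    return raw[:index]


# Made-up sample documents (assumptions, not the real feeds).
clean = b'<?xml version="1.0"?>\n<rss version="0.91"></rss>'
dirty = b"ab\r\n" + clean

print(repr(bytes_before_xml_declaration(clean)))
print(repr(bytes_before_xml_declaration(dirty)))
```

Anything other than an empty result for the bytes you actually received deserves a closer look, whether the cause is the file itself or the way the bytes were captured.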

Update: Case unclosed. The author of the second feed has informed me that the couple of extra invalid characters were accidentally added when he tried to delete his DOCTYPE, as I had advised him to do. He has since deleted the extra characters and the DOCTYPE, and his feed is fine now.

Update: Case re-opened. A reply from Ken indicated that he thought his feed was correct. So, back to reading the bytes of an HTTP response. I then caught on to the fact that Ken's server was using chunked transfer coding, which explains the extra bytes I was seeing: they were the chunk length. The question remains, "Why doesn't Ken's feed work w/ the .NET XmlDocument.Load method?"
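
To see why the chunk length looks like junk in front of the XML declaration, here is a rough sketch of HTTP/1.1 chunked transfer coding in Python. The function names decode_chunked and encode_chunked and the sample feed are my own assumptions for illustration. The sketch ignores trailers and does no error handling, and it says nothing about why XmlDocument.Load fails; it only shows what the raw body looks like before the chunk framing is removed.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode a chunked message body into the original payload (trailers ignored)."""
    payload = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The chunk-size line is a hex number, optionally followed by ";extensions".
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # a zero-length chunk marks the end of the body
        payload += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and the CRLF that follows it
    return bytes(payload)


def encode_chunked(payload: bytes, chunk_size: int) -> bytes:
    """Split payload into fixed-size chunks, as a server might (demo helper only)."""
    out = bytearray()
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        out += format(len(chunk), "x").encode("ascii") + b"\r\n" + chunk + b"\r\n"
    out += b"0\r\n\r\n"
    return bytes(out)


# Made-up sample feed (an assumption, not Ken's real file).
xml = b'<?xml version="1.0"?>\n<rss version="0.91"></rss>'
raw = encode_chunked(xml, 16)

print(raw[:8])                     # the hex chunk length comes before "<?xml"
print(decode_chunked(raw) == xml)  # the framing is gone after decoding
```

If you read the response body off the wire yourself, the hex length and CRLF come before the XML declaration, which is exactly what fooled me. An HTTP client is supposed to strip that framing before the XML parser ever sees the bytes.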

Update: In case anyone is wondering, I've tested other feeds w/ chunked encoding and they don't all fail. I've encountered only three other feeds that fail in the same manner, but the list grows.

Reader Comments
Ken, it's clear now that your XML is perfectly fine. The problem seems to be an obscure bug in .NET. I've transferred your file, bytes intact, to my own server and the exception doesn't happen. I've also reproduced the problem w/ several other feeds, but can't find anything they have in common.

Randy

If the server makes a difference, I do happen to be using Apache 2.
I've checked the originating servers and it's a mix of Apache and IIS. I haven't found any commonality other than DOCTYPE. I'll compile my results and send them to M$FT.

Randy

That's 100% entirely bogus.

Randy

I'm trying to use an RSS feed and getting this result. Please help me out.

Thanks Sir

I'm trying to use an RSS feed and getting this "Expected DTD markup was not found" result. Please help me out.

Thanks Sir
