Really Simple Syndication
Copyright 2003-4 Randy Charles Morin
Today, I encountered this really weird problem trying to read a particular RSS feed. In fact, I've stubbled across a few RSS feeds w/ this problem. They seem to be related, but I've yet to figure out the exact problem. In fact, if I copy the RSS files to my server, the problem doesn't replicate. The issue relates to RSS version 0.91 feeds w/ a blank line located either directly above or below the DOCTYPE declaration. But, since I can't replicate the problem w/ the same file on my server, there must be a secondary sympton relating to the HTTP headers (or so I speculate). Here's the .NET code that fails, using Ken MacLeod's RSS feed.
System.Xml.XmlDocument doc =new System.Xml.XmlDocument();
The issue does not affect many .NET-based RSS readers at all, but it fails predictably. So, I assume that most .NET-based RSS readers don't use this code construct. By the way, the test does not fail using a local copy of the same RSS file. The problem is not new, as I've discovered an old M$FT newsgroup posting w/ the same error message.
Update: In all cases, the problem was unrelated to anything I've mentionned above. It was all red herrings that seemed relevant until I found the root cause. Both files were, in fact, invalid XML. They had extra invalid characters in the wrong places. In Ken's case, he has invalid characters before the XML declaration. In the second case, he had a couple extra invalid non-space characters between the XML declaration and the DOCTYPE declaration. Case closed.
Update: Case unclosed. The author of the second feed has informed me that the couple extra invalid characters were accidently added when he tried to delete his DOCTYPE. I had advised him such. He deleted the extra characters and the DOCTYPE and his feed is fine now.
Update: Case re-opened. A reply from Ken indicated that he thought his feed was correct. So, back to reading bytes of an HTTP response. I then, caught on to the fact that Ken's server was using chunked transfer coding, which explains the extra bytes I was seeing, the chunk length. The question remains "Why doesn't Ken's feed work w/ .NET XmlDocument.Load method?"
Update: In case anyone is wondering, I've tested other feeds w/ chunked encoding and they don't all fail. I've encountered only three other feeds that fail in the same manner, but the list grows.