Over the past year, I've been switching back and forth between three RSS readers: RSS Bandit, SharpReader and Sauce Reader. During that year, I've noticed that WordPress feeds rarely, if ever, work in these .NET-based RSS readers. For the longest time, I simply ignored the problem and even dropped the affected feeds from my blogroll, as there was no value in a broken feed anyway. But, as you may have read in my RSS blog, I've been working on a general XML format validator. The first format I addressed was RSS, which got me thinking that I should test those WordPress feeds on my validator, which, by the way, is also partially based on .NET. My validator reported the feeds to be broken at the HTTP level. Of course, the FeedValidator was reporting the feeds to be valid! Well, I couldn't take that conflict and decided to find out what was up.
The first thing I did was write a small test console application that tried to load the WordPress feeds into System.Xml.XmlDocument.
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
doc.Load(uri.ToString());
This throws a System.Net.WebException with the message "The underlying connection was closed: The server committed an HTTP protocol violation." Google this and you start to see where the problem is: it turns out that WordPress is broken. I wanted to see this for myself, so I wrote another console application to fetch the feed over a raw TCP connection.
// Speak HTTP directly over a socket so the raw response headers are visible.
System.Net.Sockets.TcpClient client = new System.Net.Sockets.TcpClient();
client.Connect(uri.Host, 80);
System.Net.Sockets.NetworkStream stream = client.GetStream();
// HTTP lines end in CRLF, and a single blank line ends the request.
// "Connection: close" asks the server to close the socket after the response.
string command = "GET " + uri.PathAndQuery + " HTTP/1.1\r\nHost: " + uri.Host + "\r\nConnection: close\r\n\r\n";
byte[] by = System.Text.Encoding.ASCII.GetBytes(command);
stream.Write(by, 0, by.Length);
by = new byte[1024];
System.Text.StringBuilder message = new System.Text.StringBuilder();
int n;
// Read until the server closes the connection (Read returns 0).
while ((n = stream.Read(by, 0, by.Length)) > 0)
{
    message.Append(System.Text.Encoding.UTF8.GetString(by, 0, n));
}
client.Close();
And the returned HTTP header is?
HTTP/1.1 200 OK
Date: Tue, 02 Nov 2004 17:06:57 GMT
Server: Apache/2.0.40 (Red Hat Linux)
Accept-Ranges: bytes
X-Powered-By: PHP/4.2.2
Last Modified: Tue, 02 Nov 2004 11:45:13 GMT
ETag: "3120b3f942d975a454c923b36a05e837"
X-Pingback: xxx
Connection: close
Transfer-Encoding: chunked
Content-Type: application/rss+xml
Note the Last Modified header. This header name has a space in it, which is illegal. You can verify this in the HTTP spec (RFC 2616): section 4.2 defines the header name as a token, and section 2.2 says that a token cannot contain spaces. The header should be written Last-Modified, with the hyphen. To fix this problem in WordPress, search your PHP for the line that sends this header and add the hyphen, so that it reads:
@header('Last-Modified: '.$wp_last_modified);
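If you want to check a response for this kind of mistake yourself, the token rule is easy to put into code. What follows is only a minimal sketch, assuming the separator list from section 2.2 of the spec. The names HeaderNameCheck, IsToken and FieldName are made up for illustration and are not part of .NET or of any of the readers mentioned here.

```csharp
public static class HeaderNameCheck
{
    // Separator characters from section 2.2 of the HTTP spec, including
    // space and horizontal tab. A token may contain none of them.
    private const string Separators = "()<>@,;:\\\"/[]?={} \t";

    // True when name is a non-empty run of US-ASCII characters in the
    // range 33..126 (no control characters) that contains no separator.
    public static bool IsToken(string name)
    {
        if (string.IsNullOrEmpty(name)) return false;
        foreach (char c in name)
        {
            if (c <= 32 || c >= 127 || Separators.IndexOf(c) >= 0) return false;
        }
        return true;
    }

    // Returns everything before the first colon of a raw header line,
    // or the whole line when it contains no colon.
    public static string FieldName(string headerLine)
    {
        int colon = headerLine.IndexOf(':');
        return colon < 0 ? headerLine : headerLine.Substring(0, colon);
    }
}
```

Running each header line from the dump above through HeaderNameCheck.IsToken(HeaderNameCheck.FieldName(line)) should flag only the Last Modified line, because the space in its name is a separator.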
Now, I need only email this page to my friends and I can enjoy their blogs again.
Update: As usual, Dare has a fix for RSS Bandit already.
Randy
Correct me if I'm wrong, but since these feeds are generated with each edit of the blog, would you not have to repair the malformed feed after every edit? And with dynamically generated feeds, I don't think FeedForAll would work at all. Correct?
Randy
-RS