Really Simple Syndication
Copyright 2003-4 Randy Charles Morin
1) How should I understand the word 'Hint' in this context? Is the information an instruction that the Aggregator should obey, or is it a suggestion the Aggregator should convey to its user and allow them to decide how they act on it?
Randy: Hints are not rules, they are suggestions. That said, a well-behaved aggregator would respect these hints.
2) The skipHours/skipDays tags give me several concerns:
a) A straw poll of 29 sites that I monitor reveals that none of them implement skipHours or skipDays. Syndic8 says that 1.98% of feeds use skipHours and 0.18% use skipDays. That's a pretty small minority to code for.
Randy: Nobody said building software was easy. Very few web pages use the HTML <IFRAME> tag, but I'd hate to use a browser that didn't support it.
b) How many RSS publishers properly understand that the hours are in GMT? Most Americans I have met (I'm a Brit) think London is GMT - but that's only for half the year. Alternatively, how many will simply use their local time zone by mistake?
Randy: Almost all blogs that publish RSS with <skipHours> use software (Radio) that understands that the hours are in GMT.
c) There is an obvious issue when people are in different time zones. For example, a reader in Singapore has no working hours overlapping with working hours in, say, New York. Simple implementation of the skipHours tag could mean that an Aggregator would never poll some feeds for some people.
Randy: I don't think there's an issue here. Assume it's GMT and if somebody else failed to make that assumption, then it's not your fault. Point them to the RSS spec.
d) Similarly, not polling on, say, Sundays is subject to interpretation. Is it Sunday for the reader or the publisher that the Aggregator should exclude from polling? I guess it has to be subject to the reader's time zone, but does that implement the publisher's expectation?
Randy: No, the timezone does not matter. Use GMT, not the publisher timezone, not the reader timezone. But, I think everybody understands that this won't be perfect and is somewhat confusing.
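A minimal sketch of the GMT-only check described in the answers above, written in Python purely for illustration. The function name and arguments are hypothetical; it assumes the aggregator has already parsed <skipHours> into integers 0-23 and <skipDays> into English day names, and that the caller passes a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone

# English day names as they appear inside <skipDays>, indexed by weekday().
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def should_skip_poll(now, skip_hours, skip_days):
    """Return True if the feed's hints say not to poll at this moment.

    `now` must be timezone-aware. It is converted to GMT (UTC) first, so
    neither the reader's nor the publisher's local zone affects the result.
    """
    now_gmt = now.astimezone(timezone.utc)
    day_name = DAY_NAMES[now_gmt.weekday()]
    return now_gmt.hour in skip_hours or day_name in skip_days


if __name__ == "__main__":
    # A reader in Singapore (GMT+8) early on Monday 5 January 2004.
    singapore = timezone(timedelta(hours=8))
    local_time = datetime(2004, 1, 5, 7, 30, tzinfo=singapore)
    print(should_skip_poll(local_time, skip_hours=set(), skip_days={"Sunday"}))
```

In the example it is Monday morning for the reader but still 23:30 on Sunday in GMT, so a feed that skips Sundays is not polled. That is the imperfection the answer above concedes.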
3) ttl is more widely implemented - Syndic8 says 7.74%. Use of the Syndication module will push this up a bit. However, I find that most sites specify hourly caching or less. My Aggregator's minimum poll interval is one hour, so implementing ttl would increase polling in many cases! That having been said, I plan to implement the higher of ttl and hourly polling as a minimum polling interval - subject to question 1 above.
Randy: Yes, generally feeds have <ttl> values of one hour or less. For example, Yahoo has feeds with <ttl> values of 5 minutes. They are trying to tell the RSS reader that it's OK to poll more than once per hour. I suggest you use the <ttl> where provided, the syndication module value where provided, and everywhere else default to one hour.
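The precedence suggested above (<ttl> first, then the syndication module, then a one-hour default) could be sketched as follows. This is an illustration in Python, not a reference implementation: the function name, the optional `floor` argument (the questioner's own minimum interval), and the 30-day month and 365-day year approximations are all assumptions. The syndication module values are assumed to come from sy:updatePeriod and sy:updateFrequency.

```python
# Minutes per sy:updatePeriod value; "monthly" and "yearly" are approximations.
PERIOD_MINUTES = {
    "hourly": 60,
    "daily": 24 * 60,
    "weekly": 7 * 24 * 60,
    "monthly": 30 * 24 * 60,
    "yearly": 365 * 24 * 60,
}
DEFAULT_MINUTES = 60


def poll_interval_minutes(ttl=None, update_period=None, update_frequency=1, floor=0):
    """Pick a polling interval in minutes from the feed's hints.

    ttl              -- value of <ttl> in minutes, or None if absent
    update_period    -- sy:updatePeriod string, or None if absent
    update_frequency -- sy:updateFrequency (updates per period)
    floor            -- the aggregator's own minimum interval, if any
    """
    if ttl is not None and ttl > 0:
        interval = ttl
    elif update_period in PERIOD_MINUTES:
        interval = PERIOD_MINUTES[update_period] / max(update_frequency, 1)
    else:
        interval = DEFAULT_MINUTES
    # Never poll more often than the aggregator itself allows.
    return max(interval, floor)
```

With `floor=60` this reproduces the questioner's plan of taking the higher of <ttl> and hourly polling; with the default `floor=0` a 5-minute <ttl> is honoured as given.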
4) I'm obviously interested in the Accept-Encoding header because it looks like everyone wins from correct implementation of that one. However:
a) I use the .NET HttpWebRequest class to read RSS feeds but I can't find any authoritative statement about whether this header is sent automatically by the .NET framework, or not. I can't see why it wouldn't be (and the referenced w3.org document suggests that, by default, the server can send a compressed response), but we are talking Microsoft here.
Randy: By default, HttpWebRequest will not use compression. I have heard of people using gzip with HttpWebRequest, but I don't know if it's reliable. Note that the server may not send a compressed response unless the client specifically allows it.
b) As I understand it, the reader tries Accept-Encoding with, say, "gzip,deflate". If it gets a 406 (Not Acceptable) status code back, it has to try again without the Accept-Encoding header. But doesn't this mean a double hit on servers that don't support compression? OK, it could remember the initial response, but suppose the server is upgraded to support compression later on?
Randy: Not at all. If the Accept-Encoding header is "gzip, deflate" and the server doesn't support either, then it can respond with identity encoding. That's because identity encoding is specifically allowed, and is the default, unless it is excluded by adding "identity;q=0" to the Accept-Encoding header. Identity encoding means no transformation is applied to the body, that is, the way we usually do it.
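To make the identity rule concrete, here is a simplified sketch of how a server might read an Accept-Encoding value. It is written in Python for illustration only; the function names are hypothetical and the parsing is a reduced reading of the HTTP/1.1 rules, not a complete implementation.

```python
def parse_accept_encoding(header):
    """Parse an Accept-Encoding value into a {coding: qvalue} dict."""
    codings = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # a coding listed without a q-value is fully acceptable
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                pass  # malformed q-value: keep the default
        codings[name.strip().lower()] = q
    return codings


def identity_acceptable(header):
    """Identity is allowed unless excluded by "identity;q=0" or "*;q=0"."""
    codings = parse_accept_encoding(header)
    if "identity" in codings:
        return codings["identity"] > 0
    if "*" in codings:
        return codings["*"] > 0
    return True


def choose_coding(header, server_supports):
    """Return the coding the server would use, or None (a 406 response)."""
    codings = parse_accept_encoding(header)
    candidates = [(q, name) for name, q in codings.items()
                  if q > 0 and name in server_supports]
    if candidates:
        return max(candidates)[1]
    return "identity" if identity_acceptable(header) else None
```

Under these assumptions, `choose_coding("gzip, deflate", set())` yields "identity", so a server without compression answers the first request normally and no second hit is needed. A 406 only arises when the client has explicitly ruled identity out.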
c) I can't find any tutorial on how to handle compressed responses from a server. If .NET doesn't handle them automatically, can anyone point me in the right direction?
Randy: Sorry, I don't know either and I've never seen a working sample.
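For what it is worth, the general technique is independent of .NET: send Accept-Encoding, then look at the Content-Encoding response header and decompress accordingly. The following is a rough sketch in Python, not a .NET sample; `decode_body` and `fetch_feed` are hypothetical names, and the fallback to raw deflate is an assumption about servers that omit the zlib wrapper.

```python
import gzip
import urllib.request
import zlib


def decode_body(body, content_encoding):
    """Undo the content coding named by the Content-Encoding header."""
    coding = (content_encoding or "identity").strip().lower()
    if coding == "identity":
        return body  # no transformation was applied
    if coding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if coding == "deflate":
        try:
            return zlib.decompress(body)  # zlib-wrapped deflate
        except zlib.error:
            # Assumption: some servers send raw deflate without the wrapper.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    raise ValueError("unsupported Content-Encoding: " + coding)


def fetch_feed(url):
    """Fetch a feed, allowing but not requiring a compressed response."""
    request = urllib.request.Request(
        url, headers={"Accept-Encoding": "gzip, deflate"})
    with urllib.request.urlopen(request) as response:
        return decode_body(response.read(),
                           response.headers.get("Content-Encoding"))
```

Because the decision is driven by what the server actually sent, the same code works unchanged if a server that used to reply with identity encoding is later upgraded to support compression.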