The RSS Blog

News and commentary from the RSS and OPML community.

1) How should I understand the word 'Hint' in this context? Is the information an instruction that the Aggregator should obey, or is it a suggestion the Aggregator should convey to its user and allow them to decide how they act on it?

Randy: Hints are not rules; they are suggestions. That said, a well-behaved aggregator would respect these hints.

2) The skipHours/skipDays tags give me several concerns:

a) A straw poll of 29 sites that I monitor reveals that none of them implement skipHours or skipDays. Syndic8 says that 1.98% of feeds use skipHours and 0.18% use skipDays. That's a pretty small minority to code for.

Randy: Nobody said building software was easy. Very few Webpages use the HTML <IFRAME> tag, but I'd hate to use a browser that didn't support it.

b) How many RSS publishers properly understand that the hours are in GMT? Most Americans I have met (I'm a Brit) think London is GMT - but that's only for half the year. Alternatively, how many will simply use their local time zone by mistake?

Randy: Almost all blogs that publish RSS w/ <skipHours> use software (Radio) that understands that the hours are in GMT.

c) There is an obvious issue when people are in different time zones. For example, a reader in Singapore has no working hours overlapping with working hours in, say, New York. Simple implementation of the skipHours tag could mean that an Aggregator would never poll some feeds for some people.

Randy: I don't think there's an issue here. Assume it's GMT and if somebody else failed to make that assumption, then it's not your fault. Point them to the RSS spec.

d) Similarly, not polling on, say, Sundays is subject to interpretation. Is it Sunday for the reader or the publisher that the Aggregator should exclude from polling? I guess it has to be subject to the reader's time zone, but does that implement the publisher's expectation?

Randy: No, the timezone does not matter. Use GMT, not the publisher timezone, not the reader timezone. But, I think everybody understands that this won't be perfect and is somewhat confusing.
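The GMT rule above can be sketched in a few lines of Python. This is only an illustration: the function name `polling_allowed` and its parameters are made up for this example, and it assumes the feed's <skipHours> values have already been parsed into integers 0-23 and its <skipDays> values into English day names, as the RSS 2.0 spec describes them.

```python
from datetime import datetime, timezone

# Day names as they appear in <skipDays>, indexed by datetime.weekday().
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")

def polling_allowed(now, skip_hours=(), skip_days=()):
    """Return False if `now` falls in a skipped GMT hour or a skipped GMT day.

    `now` must be a timezone-aware datetime (hypothetical helper, not from
    any real aggregator).
    """
    # Convert to GMT/UTC first, so neither the reader's nor the
    # publisher's local time zone affects the decision.
    now_gmt = now.astimezone(timezone.utc)
    if now_gmt.hour in skip_hours:
        return False
    if DAY_NAMES[now_gmt.weekday()] in skip_days:
        return False
    return True
```

With this approach, a reader in Singapore at 07:30 on a Monday is still treated as Sunday 23:30, because that is the time in GMT.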

3) ttl is more widely implemented - Syndic8 says 7.74%. Use of the Syndication module will push this up a bit. However, I find that most sites specify hourly caching or less. My Aggregator's minimum poll interval is one hour, so implementing ttl would increase polling in many cases! That having been said, I plan to implement the higher of ttl and hourly polling as a minimum polling interval - subject to question 1 above.

Randy: Yes, generally feeds have <ttl> values of one hour or less. For example, Yahoo has feeds w/ <ttl> values of 5 minutes. They are trying to tell the RSS reader that it's OK to pull more than once per hour. I suggest you use the <ttl> where provided, the syndication module value where provided, and everywhere else default to one hour.
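As a rough Python sketch of that precedence (<ttl> first, then the syndication module, then a one-hour default): the function and parameter names here are invented for illustration, the `floor` argument is an assumption added to model the "higher of ttl and hourly polling" idea from the question, and the monthly and yearly lengths are approximations.

```python
DEFAULT_INTERVAL_MINUTES = 60

# Minutes per sy:updatePeriod value in the syndication module.
# "monthly" and "yearly" are approximated as 30 and 365 days.
PERIOD_MINUTES = {
    "hourly": 60,
    "daily": 24 * 60,
    "weekly": 7 * 24 * 60,
    "monthly": 30 * 24 * 60,
    "yearly": 365 * 24 * 60,
}

def poll_interval_minutes(ttl=None, update_period=None,
                          update_frequency=1, floor=0):
    """Pick a polling interval in minutes (hypothetical helper).

    ttl              -- value of <ttl>, in minutes, if the feed has one
    update_period    -- sy:updatePeriod, if the feed has one
    update_frequency -- sy:updateFrequency (updates per period)
    floor            -- minimum interval the aggregator will accept
    """
    if ttl is not None:
        interval = ttl
    elif update_period is not None:
        interval = PERIOD_MINUTES[update_period] / update_frequency
    else:
        interval = DEFAULT_INTERVAL_MINUTES
    # An aggregator-imposed minimum always wins over a shorter hint.
    return max(interval, floor)
```

Calling `poll_interval_minutes(ttl=5)` follows the feed's hint, while `poll_interval_minutes(ttl=5, floor=60)` keeps the aggregator's one-hour minimum.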

4) I'm obviously interested in the Accept-Encoding header because it looks like everyone wins from correct implementation of that one. However:

Randy: Win-win.

a) I use the .net HttpWebRequest class to read RSS feeds but I can't find any authoritative statement about whether this tag is implemented automatically in the .net framework, or not. I can't see why it wouldn't be (and the referenced w3.org document suggests that, by default, the server can send a compressed response), but we are talking Microsoft here.

Randy: By default, HttpWebRequest will not request a compressed response. I have heard of people using gzip w/ HttpWebRequest, but I don't know if it's reliable. Note, the server may not send a compressed response unless the client specifically allows it.

b) As I understand it, the reader tries Accept-Encoding with, say, "gzip,deflate". If it gets a 406 (Not Acceptable) status code back, it has to try again without the Accept-Encoding. But doesn't this mean a double hit on servers that don't support compression? OK, it could remember the initial response, but suppose the server is upgraded to support compression later on?

Randy: Not at all. If the Accept-Encoding is "gzip, deflate" and the server doesn't support either, then it can respond w/ identity encoding. That's because identity encoding is specifically allowed, and is the default, unless it is excluded by adding "identity;q=0" to the Accept-Encoding. Identity encoding means no transformation at all, that is, the way we usually do it.
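To make the identity fallback concrete, here is a simplified Python sketch of how a server might choose a response encoding from an Accept-Encoding header. It is not taken from any real server; `parse_accept_encoding` and `choose_encoding` are hypothetical names, and the q-value parsing handles only the plain "coding;q=value" form.

```python
def parse_accept_encoding(header):
    """Map each coding named in an Accept-Encoding header to its q-value."""
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        coding, _, params = item.partition(";")
        params = params.strip()
        # A coding without an explicit q-value counts as q=1.
        q = float(params[2:]) if params.startswith("q=") else 1.0
        prefs[coding.strip().lower()] = q
    return prefs

def choose_encoding(header, server_supports):
    """Return the coding to use, or None if nothing is acceptable (406)."""
    prefs = parse_accept_encoding(header)

    def q(coding):
        if coding in prefs:
            return prefs[coding]
        if "*" in prefs:
            return prefs["*"]
        # Identity is acceptable unless explicitly excluded;
        # other unlisted codings are not.
        return 1.0 if coding == "identity" else 0.0

    candidates = [c for c in server_supports
                  if c != "identity" and q(c) > 0]
    if candidates:
        return max(candidates, key=q)
    return "identity" if q("identity") > 0 else None
```

Here `choose_encoding("gzip, deflate", [])` falls back to "identity", so a server without compression answers in a single round trip. Only a header such as "gzip, deflate, identity;q=0" leaves it with nothing acceptable to send.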

c) I can't find any tutorial on how to handle compressed responses from a server. If .net doesn't handle them automatically, can anyone point me in the right direction?

Randy: Sorry, I don't know either and I've never seen a working sample.
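The general pattern is not specific to .NET, so here is a minimal sketch in Python rather than an HttpWebRequest sample. It assumes the client sends the Accept-Encoding header itself and then inspects Content-Encoding on the response; the helper names (`build_request`, `decode_body`, `fetch_feed`) are invented for this example.

```python
import gzip
import zlib
import urllib.request

def build_request(url):
    """Build a request that tells the server compressed responses are OK."""
    return urllib.request.Request(
        url, headers={"Accept-Encoding": "gzip, deflate"})

def decode_body(body, content_encoding):
    """Undo the content coding named in the Content-Encoding header."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # "deflate" should be zlib-wrapped data...
            return zlib.decompress(body)
        except zlib.error:
            # ...but some servers send a raw deflate stream instead.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    if encoding == "identity":
        # No Content-Encoding header: the body was sent as-is.
        return body
    raise ValueError("unsupported Content-Encoding: " + encoding)

def fetch_feed(url):
    """Fetch a feed and return its bytes, decompressed if necessary."""
    with urllib.request.urlopen(build_request(url)) as response:
        return decode_body(response.read(),
                           response.headers.get("Content-Encoding"))
```

The key design point is that the client decides what to do from the response's Content-Encoding header, not from what it asked for, so a server that ignores the request header and sends plain XML is handled by the same code path.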

Reader Comments

Thanks for taking the time and trouble to respond to my questions.  I appreciate that.

Probably being over-pedantic but 'Hint' seems a strange word to use when 'Request' would seem an obvious and better word given your response.  I'm thinking that what I'll do is allow end-users to override any hints I implement with an explanation of what that implies in the help text.

I think you might have missed my point on the non-overlapping time zones.  Let's say a channel is published in New York and is maintained during the working day.  Currently - with no daylight saving - say, 1pm to 10pm GMT.  Singapore (with no daylight saving) is always 8 hours ahead of GMT.  So the working day might be midnight to 9am GMT.  If the feed has skipHours set for: 0 and 10-23, the Singapore user's Aggregator might never poll it for updates.  I guess the answer is to add some code that polls at least once during skipHours if the Aggregator has not received a response from the feed during the previous polling period.  But I guess you can see why I am concerned that there might be insufficient justification to add complexity that might disrupt other parts of the program.  Right now I'm still thinking that implementing skipDays and skipHours might cause more problems than it solves, given it is rarely used.
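The fallback idea in the paragraph above (poll at least once during skipHours if the feed has not been reached recently) might look something like this in Python. It is only a sketch: `should_poll` is a hypothetical name, and the 24-hour `max_gap` is an arbitrary assumption, not something the RSS spec defines.

```python
from datetime import datetime, timedelta, timezone

def should_poll(now, skip_hours, last_success,
                max_gap=timedelta(hours=24)):
    """Honour skipHours, but never go longer than `max_gap` without a poll.

    now          -- timezone-aware current time
    skip_hours   -- set of GMT hours (0-23) from the feed's <skipHours>
    last_success -- timezone-aware time of the last successful poll, or None
    """
    now_gmt = now.astimezone(timezone.utc)
    if now_gmt.hour not in skip_hours:
        return True
    # Inside a skipped hour: poll anyway only if the feed is overdue.
    return last_success is None or now_gmt - last_success >= max_gap
```

This keeps the extra logic in one small function, which may limit the risk of disrupting the rest of the program.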

You seem to be OK with me implementing ttl even though it will increase polling frequency in many cases.  As it happens I just got a request to allow people to specify intervals in minutes (currently I allow hours or days, minimum 1 hour) so I might add that as a feature.  I'm still thinking I will implement a one hour interval as a minimum by default because smaller intervals have the potential to generate frequent, annoying interruptions for the reader.

On the compression issue, I guess I'll just have to find time to experiment.  I'm reassured by your point on identity encoding, though.

Thanks again for your response,

Andy Henderson, Constructive IT Advice.
