Randy Charles Morin blogs about RSS, OPML and the XML platform.
Copyright 2003-7 Randy Charles Morin
Three years ago, I predicted FeedBurner would sell out to Google. Ever since, I've neglected to predict anything for the new year. Here goes. My predictions for the Web in 2010.
Sounds like a bad year for the Web.
I ran Rmail for two years before selling it to NBC. They rebranded the service as SendMeRSS, but abandoned the service after a year.
Please email me (randy@kbcafe.com), if you have evidence of spam to x-Rmail users. Please include the entire message, including SMTP headers. Thanks in advance.
Yahoo! has announced it is transferring the Media RSS specification to the RSS Advisory Board. Some more awesome work from Rogers Cadenhead.
http://ycorpblog.com/2009/11/17/owf/
http://tech.groups.yahoo.com/group/rss-board/message/295
In mid-August, Amazon changed their product API. They announced the change well in advance, but I didn't allocate a lot of time to fixing my numerous Web services that depended on it. Rather, I simply shut down all the services and even let one website (HelloSanta.org) break. Yesterday and today, I took the time to rewrite the HelloSanta website to include the API changes. It took a few hours and it worked. One line of code was replaced by 10 lines, plus an entirely new class. Quite a change.
In the meantime, Amazon suffered thru 3 months of my website being down. I lost a few bucks and Amazon lost a few bucks as well. Not a lot for me. Not a lot for Amazon, unless you consider that they had tens of thousands (maybe millions) of broken Web services, many of which generated substantially more sales than mine. What they did was chop off their long tail. The Long Tail is a theory put forth by Chris Anderson that the Internet generates an enormous amount of wealth/traffic/profit from massive amounts of low-value services. Something to think about before you change your own established Web API.
Technorati just imploded. Many of their search feeds were redirected to the feed "All articles on Technorati", causing Reblinks to send out hundreds, possibly thousands, of emails. Unless Technorati is fixed, I might have to unsubscribe everybody from any Technorati feeds.
I thought I'd prepare a blog entry describing the differences between PubSubHubbub and rssCloud. I'm doing this mostly for myself, as I'm currently implementing a desktop client based entirely on PubSubHubbub and rssCloud. My goal is to solve the NAT traversal problem using long polling thru a notification gateway. It's not the optimal solution, but maybe we can add a notification gateway to PubSubHubbub and/or rssCloud and make them work behind NATs and firewalls.
Both protocols are classic publish and subscribe. Publishers have a relationship with a hub. Clients subscribe to a hub. Publishers send updates to the hub. The hub pushes notifications out to the clients. Nobody invented anything here; we've been doing this in computer science for a long time. Let's examine each of the four major interactions within the system: subscribing, unsubscribing, pinging and notification.
Subscribing is when a client tells the hub that it wishes to receive notifications from one of its publishers. The client might be a Web-based RSS aggregator (Google Reader) or a desktop RSS client (Feed Demon). In both cases, the client has polled an RSS or Atom feed and discovered that the feed has PubSubHubbub or rssCloud notification support. The client uses the information within the feed to send a simple HTTP request to the hub with varying parameters to set up the subscription. The two protocols are very similar here.
There are a couple of small differences. PubSubHubbub only supports its REST API, while rssCloud supports all of XML-RPC, SOAP and REST. This would make rssCloud slightly more difficult to implement, as you have to account for three possible transports. Another difference is that rssCloud does not specify the target IP address of the client. Rather, it is assumed that the host of the request is also the notification target. You'll see later on that this makes implementing a notification gateway more difficult. The rssCloud protocol may include a parameter to allow passing of the notification target's IP address in the near future.
There is one big difference. Because PubSubHubbub allows the subscriber to specify the notification endpoint, there is a greater possibility of malicious hackers or code subscribing a notification endpoint against its will. PubSubHubbub follows up all subscribing and unsubscribing requests by verifying with the client that its intent was true. This adds additional, but required, complexity to PubSubHubbub. The rssCloud protocol may include subscriber verification in the near future.
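Here's a rough sketch of that subscribe-then-verify handshake from the subscriber's side. The `hub.*` parameter names come from my reading of the PubSubHubbub draft; the function names, the example URLs and the `pending` set are all made up for illustration, so treat this as a sketch and not as anybody's actual implementation.

```python
from urllib.parse import urlencode, parse_qs


def build_subscribe_body(callback_url, topic_url, mode="subscribe"):
    """Form-encoded body a subscriber would POST to the hub."""
    return urlencode({
        "hub.mode": mode,              # "subscribe" or "unsubscribe"
        "hub.callback": callback_url,  # where the hub should deliver notifications
        "hub.topic": topic_url,        # the feed being subscribed to
        "hub.verify": "sync",          # ask the hub to verify before it replies
    })


def handle_verification(query_string, pending):
    """Answer the hub's verification GET on the callback URL.

    `pending` is a hypothetical set of (mode, topic) pairs that this
    client really asked for. Echoing the challenge confirms intent.
    Refusing everything else is what keeps a third party from
    subscribing this endpoint against its will.
    """
    params = parse_qs(query_string)
    mode = params.get("hub.mode", [""])[0]
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if challenge and (mode, topic) in pending:
        return 200, challenge  # status code, response body
    return 404, ""
```

The client records what it asked for before it sends the request, so the verification handler never confirms a subscription it doesn't recognize.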
Unsubscribing is when a client stops receiving notifications. With PubSubHubbub, unsubscribing involves the client sending an unsubscribe request to the server. With rssCloud, there is no request. Rather, all subscriptions are automatically dropped after 24 hours. I don't think either technique is better than the other; there are advantages and disadvantages to both.
First off, I don't know anybody's software that is smart enough to unsubscribe when you close your laptop. Second, what happens when my laptop is closed and the hub is trying to send notifications? Are the notifications queued? How many failures before you unsubscribe the misbehaved (not really) client? Neither protocol is airtight and neither addresses numerous scenarios that arise frequently in homes and offices all across this Internet-enabled planet.
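To make those questions concrete, here's a minimal sketch of what a hub-side subscription table might look like. The 24-hour lease is the rssCloud rule described above. The failure threshold of 3 is purely my assumption, since neither protocol tells you how many failed deliveries should drop a client, and the class and method names are hypothetical.

```python
import time

LEASE_SECONDS = 24 * 60 * 60  # rssCloud drops subscriptions after 24 hours
MAX_FAILURES = 3              # assumed threshold; neither spec defines one


class SubscriptionTable:
    def __init__(self):
        # (feed_url, callback) -> [expires_at, consecutive_failures]
        self._subs = {}

    def subscribe(self, feed_url, callback, now=None):
        """Register (or renew) a subscription with a fresh 24-hour lease."""
        now = time.time() if now is None else now
        self._subs[(feed_url, callback)] = [now + LEASE_SECONDS, 0]

    def record_failure(self, feed_url, callback):
        """Count a failed notification; drop the client at the threshold."""
        entry = self._subs.get((feed_url, callback))
        if entry is None:
            return
        entry[1] += 1
        if entry[1] >= MAX_FAILURES:
            del self._subs[(feed_url, callback)]

    def active_callbacks(self, feed_url, now=None):
        """Purge expired leases, then list who should be notified."""
        now = time.time() if now is None else now
        expired = [key for key, (expires_at, _) in self._subs.items()
                   if expires_at <= now]
        for key in expired:
            del self._subs[key]
        return sorted(cb for (feed, cb) in self._subs if feed == feed_url)
```

Notice what this does to the laptop that closes for lunch: three timeouts and it's silently gone, and nothing tells the client it has to re-register.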
The ping component of both services is very similar. Both support a REST ping.
In addition to a basic REST ping, rssCloud allows the publisher to ping the cloud with all of REST, XML-RPC and SOAP. There doesn't appear to be a discovery mechanism that tells publishers which of the protocols are accepted by the cloud, but this shouldn't be much of a problem, since discovery can occur via trial and error and the REST ping is likely to be supported by all rssClouds.
rssCloud does provide an additional lightly specified interface that pushes the RSS feed to the cloud, allowing the cloud to host the RSS feed on behalf of the publisher. I highly doubt this would be widely used, unless the cloud implements more feed hosting services.
Notification is likely the biggest difference between rssCloud and PubSubHubbub.
rssCloud again allows for REST, XML-RPC and SOAP packages. This greatly increases the complexity of the cloud. The rssCloud notification is effectively a reverse ping, where the cloud pings the subscriber to tell it to fetch the feed and find out what changed.
PubSubHubbub implements a much more complex notification. It's not a simple ping, but rather a POSTed XML package containing the feed that has been updated, but with only the entries of the feed that are new or that have been updated. This creates an enormous number of state problems within the hub. What happens when the previous ping failed? Do you send multiple updates in the next ping? This could mean sending different notifications to different subscribers, or subscribers missing new and updated entries. On the other hand, this will avoid flooding the publisher with simultaneous feed fetches from all the subscribers who've been notified of the change. Neither approach is optimal, neither is horrible.
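Here's a small sketch of the state problem. It assumes one possible answer to "what happens when the previous ping failed?", namely that the hub keeps a per-subscriber backlog and resends it with the next update. That policy is my assumption, not something I found in the spec, and `FatPingHub` and the `deliver` callable are hypothetical names.

```python
class FatPingHub:
    """Hub that sends only new entries, with a backlog per subscriber."""

    def __init__(self, deliver):
        # deliver(callback_url, entries) -> True if the POST succeeded
        self._deliver = deliver
        self._backlog = {}  # callback_url -> entries not yet delivered

    def subscribe(self, callback_url):
        self._backlog.setdefault(callback_url, [])

    def publish(self, new_entries):
        """Queue the new entries for everyone, then try to deliver."""
        for callback_url, backlog in self._backlog.items():
            for entry in new_entries:
                if entry not in backlog:
                    backlog.append(entry)
            # Send a copy; clear the backlog only on success, so a
            # failed subscriber gets the missed entries next time.
            if self._deliver(callback_url, list(backlog)):
                backlog.clear()
```

With this policy, a subscriber that missed one update gets two entries in its next notification while everyone else gets one. That is exactly the "different notifications to different subscribers" situation, and the hub has to remember it for every subscriber of every feed.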
Both protocols have a major failing, in that they rely on servers connecting to subscribing clients. If a client exists behind an unfriendly NAT or firewall, then the protocols simply fail. You can implement UPnP and other protocols and break your way thru some NATs and firewalls, but the problem will still exist on a large piece of the Internet pie.
Long polling is the solution to the NAT and firewall problem. Long polling is not the optimal solution to the notification problem because long polling involves holding open connections from the client to the server. This means 10,000 clients will hold 10,000 connections open. Yikes! The real solution is to detect the capabilities of the client and use direct notification where possible, UPnP where possible and long polling where everything else fails. It might be difficult to convince a developer of rssCloud subscribing software to implement UPnP when he can simply resort to long polling from the beginning.
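A minimal sketch of the notification gateway idea, assuming the hub notifies the gateway (which has a public address) and the client behind the NAT long-polls the gateway. Everything here is hypothetical naming on my part. In a real deployment `long_poll` would sit behind an HTTP handler that holds the request open, but an in-process queue shows the mechanics.

```python
import queue
import threading


class NotificationGateway:
    """One mailbox per client: the hub pushes in, the client polls out."""

    def __init__(self):
        self._mailboxes = {}
        self._lock = threading.Lock()

    def _mailbox(self, client_id):
        with self._lock:
            return self._mailboxes.setdefault(client_id, queue.Queue())

    def notify(self, client_id, feed_url):
        """Called when the hub reports that feed_url changed."""
        self._mailbox(client_id).put(feed_url)

    def long_poll(self, client_id, timeout):
        """Block until a notification arrives or the timeout passes.

        Returns the changed feed URL, or None on timeout, in which
        case the client simply polls again.
        """
        try:
            return self._mailbox(client_id).get(timeout=timeout)
        except queue.Empty:
            return None
```

Every client parked in `long_poll` is one held connection, which is where the 10,000 connections come from.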
One great advantage of PubSubHubbub is that it is tightly specified with lots of examples and code. rssCloud on the other hand is very loosely specified with pieces of code and text found in various interlinked documents across many websites.
Please submit corrections in my comments or via email (randy@kbcafe.com) where this document is incorrect. Thanks!
If you are looking for new web hosting, then Webhostinggeeks.com is the best place to start. Webhostinggeeks is a website dedicated to informing webmasters about the various web hosting services, allowing them to make the best decisions in choosing a Web hosting service. On their homepage, you get the big picture with their Top 10 Web Hosting, Best Web Hosts of 2009. They pick the top ten hosting services and provide detailed reviews of each, including pricing, bandwidth, storage, bonuses, technical support, average user rankings and much, much more.
Beyond simply the top Web hosts, they also provide reviews of Web hosting services in specific categories; multiple domain hosting, green Web hosting, vps hosting, dedicated server hosting, free domain names, free Yahoo! marketing and free Google Adwords. So, if you have these specific needs, then you can drill down directly to the hosting services that meet your needs.
The web host reviews on Webhostinggeeks.com also provide a section where users can submit their own rankings and reviews, and read those left by others. Some of the web hosts have more than 100 reviews with a lot of great information. Many of the reviews are from review websites, but others are from geeks like you and me.
Webhostinggeeks also provides awards in various other categories; best budget hosting, best blog hosting, best forum hosting, best UNIX hosting, best Windows hosting, best PHP hosting, best email hosting, best ecommerce hosting, best multiple domain hosting, best vps hosting, best reseller hosting and best dedicated hosting.
Don't forget to check out their blog and subscribe to their blog feed. You can also subscribe to their blog using Reblinks to get emails every time the blog is updated. Their blog is updated several days per week with new articles dedicated to helping webmasters make the best decisions when choosing web hosts. The blog entries include industry news, trends, products and discussions.
This is a paid review.
RSS is great. That said, it isn't real-time. With the advent of Twitter, people are now beginning to wonder why RSS needs to poll every hour. Why can't we have the immediacy of Twitter in RSS? Time for a history lesson.
Why is RSS here? It's a pretty simple and stupid protocol based on polling to simulate push. Why didn't we just create a real push protocol? Why? You can't. Push doesn't work very well on the Internet. Some people will point you to email and instant messaging as push technologies that actually work on the Internet. Unfortunately, they are wrong.
Are emails pushed? Yes. They are pushed around the Internet between SMTP gateways and eventually land themselves in an inbox. An inbox. Not your desktop. An inbox sitting on a server somewhere. Then we opened our POP3 client and it pulled the emails back down to our email client. Or maybe we open a browser and pull the data down to our Web browser. Either way, email isn't only push, it's push with a bit of pull on the end.
Are instant messages pushed? Sometimes. But even your Instant Messaging clients will fall back to clients connecting to servers using long polling, when push oriented connects from the server to the client fail.
You can't push. You can only poll. You can only pull. That's why push-based publish and subscribe technology failed in the 90s. That's why polling oriented technologies like RSS ruled the world for the last decade.
For nearly a decade, we were happy with the RSS solution giving us updates as late as an hour after publication. Life was great. Engineers weren't happy. Engineers hate polling. They want push. They want real-time RSS. They invented rssCloud. It died. Yes, rssCloud was invented a long time ago. They invented ping servers. Ping servers died. Then someone created Twitter, a centralized publish and subscribe service for micro-content. Geeks were in awe of the real-time immediacy of Twitter. Geeks were not happy about the centralized nature of Twitter. Geeks want the immediacy of Twitter and the decentralization of RSS.
Some of the engineers slash geeks that wanted real-time RSS worked at Google and they got together with the team at Google Reader and wrote PubSubHubbub. I think it was released in July of this year, but it may have been earlier. When I first saw PubSubHubbub in early July, I wondered how it was any different than previously failed pushing technologies (rssCloud specifically). If you read the spec, then PubSubHubbub is basically a copy of the rssCloud spec with some additional features meant to optimize, but that made it more complex and more difficult to implement than rssCloud.
I don't think Dave Winer was too impressed with PubSubHubbub, a knock off of his pre-existing and failed rssCloud technology. As such, Dave restarted the rssCloud movement and here we are today. Two technologies that are no different than everything that failed before it.
Who will win? Both? Neither? Someone? Somebody else?
If the notifications don't come streaming in, then your rssCloud service failed to automatically detect that the client was offline and queue the notifications. Or maybe the rssCloud didn't automatically detect that the client changed IP addresses. Or possibly there's a NAT at Starbucks that has to be configured to allow connections to the client from the server. Or maybe there's a firewall at Starbucks and your rssCloud service didn't automatically call Starbucks and ask them to disable the firewall. Or possibly your software did call Starbucks and they didn't think the request was reasonable.
If you can get this scenario to work, then tell your son Jesus Hi for me.
Now call up your corporate IT and ask them to open an incoming port for you, because you need rssCloud to work. If they deny this, then don't worry, it's not your software that's broken, it's the rssCloud protocol that doesn't work behind corporate firewalls.
If she does know how to configure the wireless router, then you fail anyways because the intent of the test was to see if a mundane user behind a NAT could use an rssCloud client and your wife is a geek, not a mundane.
This will make rssCloud easy to scale, since most users will not be able to use it and only the few geeks will be capable of putting load on the servers. The designers of rssCloud were thinking scalability when it was written.
Typed on cellphone, please excuse typos.
Do you remember 3 years ago? We were all pinging several dozen ping servers every time we updated our blogs. We pinged Technorati, PubSub, PinGoat, Ping-O-Matic, etc. I wrote extensively about the Blogosphere Ping infrastructure at the time. Read more at the next link.
http://www.therssweblog.com/default.aspx?search=blogosphere+ping
I wrote about how it didn't work. Companies that relied on the ping, like Technorati and PubSub have stagnated and disappeared. That's because ping infrastructure requires big walls of servers and that costs a lot of money. Unless you have a business model to support this, then you have a company that's doomed from the start.
rssCloud reminds me of PinGoat and Ping-O-Matic. You send them a ping and they broadcast it to everyone else. I haven't heard much from either for the last while. Does anybody actually use these ping distribution services anymore? I stopped because I realized they didn't work. PinGoat would get 10,000 pings a minute and send 100,000 pings. Imagine that server load. In the end, these services did nothing, because they were too overwhelmed to do anything.
Imagine if one million Wordpress blogs started pinging, millions of users started subscribing and unsubscribing and notifications started getting sent all over the place. It would be scalability hell all over again. In fact, it would be worse. It wouldn't be just some ping servers bashing each other, it would be all the rss clients all over the blogosphere bashing the crap out of these rssCloud services. Open my laptop, my laptop registers hundreds of feeds. Close my laptop. The notifications fail and these servers start timing out like mad. Open my laptop. Does my laptop re-register? I likely got unsubscribed because of all the timeouts. I open and close my laptop a dozen times per day. No matter what, you have to re-register once per day. rssCloud needs a business model. Badly.
Here's why Scoble is wrong. RSS is the only standardized and distributed way of doing push over HTTP without ridiculous scaling issues. Consider Atom a flavor of RSS for this discussion.
What people hate about RSS is that it's neither real-time, nor is it really push. RSS is really just a polling pull that simulates push. That is, publishers don't send their items to readers, rather the RSS clients query the publishers for updates at some interval.
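The "polling pull that simulates push" boils down to something like the sketch below: fetch the feed on a timer, compare against what you've already seen, and surface only the difference. The function name is made up, and I'm assuming plain RSS 2.0 items with a `guid` (falling back to `link`). The fetching and the timer are left out.

```python
import xml.etree.ElementTree as ET


def new_items(feed_xml, seen_guids):
    """Return titles of items not seen on previous polls.

    `seen_guids` is a set the caller keeps between polls; it is
    updated in place so the next poll only reports newer items.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        guid = item.findtext("guid") or item.findtext("link")
        if guid and guid not in seen_guids:
            seen_guids.add(guid)
            fresh.append(item.findtext("title", default=""))
    return fresh
```

Call it once an hour and the reader sees "new" items appear as if they were pushed, up to an hour late.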
Everybody wants push. Real push. Not this simulated polling crap. The problem is that HTTP is connection oriented and not very friendly for servers trying to connect to clients. Ten years ago, a server could connect to a client, send a virus and then the East Coast power grid would fail. Hello firewall.
Further, NATs have become a popular way of using one IP address to service an office building full of clients. How do you address a client whose true IP address is local to his office? It's not possible unless you assign inbound NAT holes and then someone closes down the East Coast power grid again. Some companies do allow inbound NAT holes. Most bigcos don't.
Until someone invents another solution, we are stuck with solutions where clients connect to servers. At least at the retail level.
Twitter. Twitter is not the Web. It's based primarily on SMS, although you don't need SMS to use it. But the immediacy of Twitter comes from SMS. The real-time Twitter Web clients are scaling disasters. That's why everyone complains about Twitter failing. In fact, even Twitter over SMS fails, but you don't see it because it's push. They don't push errors. When you are on the Web, they respond to your poll with errors. I get them several times per day. Maybe Dick Costolo can change that. Doubt it. He's good, but he's no god.
So, we have RSS. I still don't see another solution. Thanks Dave!
Typed on phone. Don't grammar or spellcheck. It's also an incomplete thought.
There's great debate in the RSS intellectual community about the merits of Dave's all-new rssCloud. Rogers Cadenhead wrote a piece called "There's a Reason RSSCloud Failed to Catch On". A necessary read. Then Mark Woodman wrote "Is rssCloud All Wet?" Another necessary read. Both Rogers and Mark made valid points about why rssCloud cannot succeed. Dave responded with his own rebuttal, "2002 != 2009". Specifically, he says "We had problems, but I've factored in what we learned in 2002 in the 2009 implementation." He doesn't mention anywhere how he conquered the problems of 2002. I also looked thru the Implementor's Guide to rssCloud and couldn't find anything that addresses the issues raised by Rogers and Mark. Maybe Dave can tell us mundanes what we are missing. I don't see it.
http://workbench.cadenhead.org/news/3555/theres-reason-rsscloud-failed-catch
http://techbrew.net/articles/200909/rsscloud-all-wet/
http://www.scripting.com/stories/2009/09/08/20022009.html
http://rsscloud.org/walkthrough.html
Dave Winer tells us that Wordpress' millions of blogs now support his rssCloud mechanism. The only client that supposedly supports this is Dave's River2. We'll see if other readers jump on board. I'm considering support in Reblinks.
http://www.scripting.com/stories/2009/09/07/teaseTeaseTease.html
Dave Winer has stated that FeedBurner failed, with the caveat "Luckily they got Google to give them $100 mill before the house of cards collapsed." Quite a caveat. Even without the $100M, I still don't see failure. FeedBurner is still running. It's still pushing millions of feeds and many of the most popular blog feeds on the Internet. They've been re-branded Google FeedBurner and the FeedBurner Ad Network was merged with AdSense for Feeds. Where's the failure? If FeedBurner is a failure, then what would you call Userland? Does anybody use that anymore?
When you sell your baby to bigcos like Google, it gets gobbled up into their infrastructure. Plans change. Branding changes. You don't hear much about FeedBurner anymore because it's now just a piece of a bigger machine called Google. I don't think Google has any intent of closing down FeedBurner feeds. Where's the failure?
Interesting timing considering that Twitter has hired x-FeedBurner CEO Dick Costolo. Does Dave have something against Mr Costolo? That's a clever move for Twitter. FeedBurner wasn't Dick's first success. He and his team started up SpyOnIt back in the 90s. They were purchased by 724 Solutions for $53M. Dick worked for 724 for a while until his team left and started FeedBurner. I was one of the original employees at 724 Solutions. When I found out we bought SpyOnIt, I was extremely happy. It was an awesome technology. I can't wait to see what Dick does with Twitter. Put it thru the roof.
http://www.scripting.com/stories/2009/09/05/rssHasNoFailWhale.html
Reblinks sent 10,000 emails in each of the first three days of this month. WOOT! I also scooped RssEmail.com last week. Hopefully, I'll get my ass in gear and pump up the features. Daily digests. RSS to Twitter. RSS to MetaWeblogAPI. This is gonna be awesome. I just hope Dave Winer will be talking about how Reblinks failed 5 years from now when I sell it to Google for $100M.
According to Robert Scoble, RSS is dead yet again. This is getting boring. I guess if he predicts RSS is dead every 3 months for the rest of time, he will be eventually correct. I think this started when he told us that RSS doesn't scale several years ago.
In the age of 140-character micro content, URL shortening has become a necessity. Unfortunately, there's no business model behind URL shorteners. WTF? Not again. As such, Tr.im has announced it will be terminating its services later this year. Yikes? Not really. At least not for most of us. How many people actually search older Twitter posts for gold? Not anybody I know. So links in older Twitter posts will be broken, most of which will never see daylight anyway.
Scoble and Dave Winer see this as a short-coming of Twitter. I'm not as concerned.
http://scobleizer.com/2009/08/10/twitters-platform-shortcomings/
http://www.scripting.com/stories/2009/08/10/enoughWithShortenedUrls.html
Announcement
http://blog.tr.im/post/159369789/tr-im-r-i-p
I was quite surprised to find Facebook had bought FriendFeed (for a reported $50 million). In fact, it was my opinion that these two companies have competing user features that don't really create synergies, but rather overlap. Personally, I see this more as a defensive move. Twitter plus FriendFeed could challenge Facebook's social Web dominance.
BTW, FriendFeed should be sending a cut of that $50 million to Robert Scoble. There's no way a geeky social Website is worth $50 million. Just another example of the Scoble Factor. Anybody here heard of Technorati? Six Apart?
Read Dave Winer for more on the Scoble Factor. Dave argues that the acquisition isn't user friendly. Interesting point. Personally, I always saw FriendFeed as a very geeky site. Only my fellow geeks use it. Non-geeks have never heard of it.
http://www.scripting.com/stories/2009/08/10/scobleYourBlogStillLovesYo.html
Yahoo! has announced a new version of the Media RSS specification. It looks like they added a lot of new elements to support Flickr. With the recent problems, I think this spec needs a twice-over. Please review and provide them feedback.
Announcement
http://tech.groups.yahoo.com/group/rss-media/message/1199
Ryan Parman has proposed a solution to the recurring problems with the media RSS specification. He proposes to transfer ownership of the specification to the RSS Advisory Board. I think this is a good idea. The rssboard.org website is very static and not subject to corporate decisions. What do you think?
http://tech.groups.yahoo.com/group/rss-public/message/1915
Last week, Yahoo! once again screwed with the Media RSS specification. They've since apologized and fixed the problem, although I haven't investigated if any of my work was and still is broken. I wonder if Yahoo! would consider handing over the spec to the RSS Advisory Board.
Reblinks has been sending 5-7 thousand emails per day this last week, nearly double last month's 3-4 thousand during the same week of the month. I can't remember exactly how many RMail sent at its peak, but I believe it was around 70,000 daily. That's not too far off. And I just launched Reblinks a few months back.
http://gigaom.com/2009/07/03/maybe-paid-is-the-future-of-online-business/
I'll reproduce the Q&A between Blake and myself here.
My answer...
Is it possible for online content producers to charge a nominal fee for RSS? Yes! In fact, this can be done today, by simply authenticating the RSS URL. Unfortunately, the implementation is extremely problematic.
First, many, possibly most, RSS clients cannot handle authenticated RSS. My Reblinks surely doesn't, mainly because it's a security nightmare. Accidentally publishing credentials could create an undesirable liability.
Second, nothing would stop these credentials from being used by multiple users. This would prove impossible to control for both the publisher and the aggregator. If two users subscribed via Reblinks to the same feed using the same credentials, the publisher would have little clue about this violation, since the aggregator need not behave any differently than if only one user was subscribed. Further, what happens when two users subscribe to an aggregator with different credentials? Do you poll once or twice?
Although feasible, the implementation is uncontrollable. Is there another possible implementation? Certainly! And it's already in wide use.
The solution is to provide a public unauthenticated feed with summary content. This is actively being done by hundreds, if not thousands, of existing publishers. The user is presented with a brief summary and a link that entices them to click thru. When the user clicks thru the link, he is requested to authenticate. Without existing credentials, he pays a subscription fee to view the full content. Of course, the content has to be very compelling if the user is gonna give you his PayPal or Visa numbers.
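A quick sketch of what building such a summary item could look like. The function name, the 30-word cutoff and the example URL are my own choices for illustration. The paywall itself lives behind the link and isn't shown.

```python
import xml.etree.ElementTree as ET


def summary_item(title, link, full_text, max_words=30):
    """Build an RSS <item> carrying a teaser instead of the full content."""
    words = full_text.split()
    summary = " ".join(words[:max_words])
    if len(words) > max_words:
        summary += "..."  # signal that there is more behind the link
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link  # page that asks for credentials
    ET.SubElement(item, "description").text = summary
    return item


item = summary_item("Paid post", "http://example.com/paid-post",
                    "one two three four five", max_words=3)
print(ET.tostring(item, encoding="unicode"))
```

The feed stays public and works in every RSS client, so the credential problems above never reach the aggregator.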
My answer...
Unfortunately, it won't always be two people sharing one subscription. It'll quite often be dozens, possibly hundreds. Paid RSS subscriptions don't seem feasible. But as I mentioned, partial feeds that entice clicks and subscriptions are likely the desired solution.
FriendFeed has released a v2 of their API. Check out Scoble and Dave Winer's conversations. I always considered FriendFeed a nuisance. It seemed to get popular fast and disappear even faster. Now, I rarely see it anymore. Looks like a great API, but I've never understood the usefulness of the website.
http://friendfeed.com/api/documentation
http://blog.friendfeed.com/2009/07/friendfeed-api-v2-real-time-oauth-file.html
http://scobleizer.com/2009/07/20/what-does-loic-think-of-the-new-friendfeed-20-api/
http://friendfeed.com/davew/a7ba6187/friendfeed-api-v2?embed=1