RSS, OPML and the XML platform.
Copyright 2012 World Readable
I thought I'd prepare a blog entry describing the differences between PubSubHubbub and rssCloud. I'm doing this mostly for myself, as I'm currently implementing a desktop client based entirely on PubSubHubbub and rssCloud. My goal is to solve the NAT traversal problem using long polling thru a notification gateway. It's not the optimal solution, but maybe we can add a notification gateway to PubSubHubbub and/or rssCloud and make them work behind NATs and firewalls.
Both protocols are classic publish and subscribe. Publishers have a relationship with a hub. Clients subscribe to a hub. Publishers send updates to the hub. The hub pushes notifications out to the clients. Nobody invented anything here; we've been doing this in computer science for a long time. Let's examine each of the four major interactions within the system: subscribing, unsubscribing, pinging and notification.
Subscribing is when a client tells the hub that it wishes to receive notifications from one of its publishers. The client might be a Web-based RSS aggregator (Google Reader) or a desktop RSS client (FeedDemon). In both cases, the client has polled an RSS or Atom feed and discovered that the feed has PubSubHubbub or rssCloud notification support. The client uses the information within the feed to send a simple HTTP request to the hub, with varying parameters, to set up the subscription. On this point the two protocols are very similar.
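As an illustration of the discovery step, here is a minimal Python sketch that inspects a fetched feed for either mechanism. It assumes PubSubHubbub is advertised with an Atom-namespace `<link rel="hub">` element and rssCloud with the RSS 2.0 `<cloud>` element; the function name `discover_notification_support` is hypothetical, and a production client would need to cope with many more feed variations.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def discover_notification_support(feed_xml: str) -> dict:
    """Return the hub and/or cloud endpoints a feed advertises (hypothetical helper)."""
    root = ET.fromstring(feed_xml)
    found = {}

    # PubSubHubbub: an Atom-namespace <link rel="hub" href="..."/>, either in an
    # Atom feed or embedded in an RSS 2.0 <channel>.
    for link in root.iter(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == "hub":
            found["hub"] = link.get("href")

    # rssCloud: the RSS 2.0 <cloud> element inside <channel>.
    cloud = root.find("channel/cloud")
    if cloud is not None:
        found["cloud"] = {
            "domain": cloud.get("domain"),
            "port": int(cloud.get("port", "80")),
            "path": cloud.get("path"),
            "registerProcedure": cloud.get("registerProcedure"),
            "protocol": cloud.get("protocol"),
        }
    return found
```

The client would then use whichever endpoint it found to send its subscription request.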
There are a couple of small differences. PubSubHubbub only supports its REST API, while rssCloud supports XML-RPC, SOAP and REST. This makes rssCloud slightly more difficult to implement, as you have to account for three possible transports. Another difference is that rssCloud does not let the client specify the target IP address for notifications. Rather, it assumes that the host making the request is also the notification target. You'll see later on that this makes implementing a notification gateway more difficult. The rssCloud protocol may add a parameter for passing the notification target's IP address in the near future.
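The addressing difference shows up directly in the subscription request bodies. This sketch assumes the parameter names from the PubSubHubbub 0.3 draft (`hub.mode`, `hub.topic`, `hub.callback`, `hub.verify`) and from rssCloud's REST `pleaseNotify` call (`notifyProcedure`, `port`, `path`, `protocol`, `url1`); the helper names and URLs are hypothetical, and it only builds form-encoded bodies rather than sending them.

```python
from urllib.parse import urlencode


def pshb_subscribe_body(topic: str, callback: str) -> str:
    # The subscriber names its own notification endpoint explicitly,
    # so the callback may live on a different machine than the requester.
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic,
        "hub.callback": callback,
        "hub.verify": "async",
    })


def rsscloud_please_notify_body(feed_url: str, port: int, path: str) -> str:
    # No host appears here: the cloud combines port and path with the
    # address the request arrived from.
    return urlencode({
        "notifyProcedure": "",
        "port": port,
        "path": path,
        "protocol": "http-post",
        "url1": feed_url,
    })
```

A gateway could subscribe on a client's behalf with the first body simply by putting its own URL in `hub.callback`; with the second body, the notifications go back to whoever sent the request.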
There is one big difference. Because PubSubHubbub allows the subscriber to specify the notification endpoint, there is a greater possibility of malicious hackers or code subscribing a notification endpoint against its will. PubSubHubbub therefore follows up every subscribe and unsubscribe request by verifying with the client that the intent was genuine. This adds complexity to PubSubHubbub, but it is necessary complexity. The rssCloud protocol may include subscriber verification in the near future.
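The verification round trip can be sketched from the subscriber's side. Assuming the PubSubHubbub 0.3 flow, the hub sends a GET to the callback with `hub.mode`, `hub.topic` and a random `hub.challenge`, and the subscriber confirms by echoing the challenge with a 2xx status. The function `handle_verification` and its `pending` set are hypothetical; a real subscriber would also check `hub.verify_token` if it sent one.

```python
from urllib.parse import parse_qs


def handle_verification(query: str, pending: set) -> tuple:
    """Decide how a subscriber answers a hub's verification GET (sketch).

    `pending` holds (mode, topic) pairs this client actually requested.
    Returns (HTTP status, response body).
    """
    params = {key: values[0] for key, values in parse_qs(query).items()}
    mode = params.get("hub.mode")
    topic = params.get("hub.topic")
    challenge = params.get("hub.challenge")

    if challenge and (mode, topic) in pending:
        # Confirm intent by echoing the challenge with a 2xx status.
        return 200, challenge
    # Not something we asked for: refuse, so the hub does not act on it.
    return 404, ""
```

An attacker who subscribes someone else's endpoint gets nowhere, because that endpoint never answers the challenge.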
Unsubscribing is when a client stops receiving notifications. With PubSubHubbub, unsubscribing involves the client sending an unsubscribe request to the hub. With rssCloud, there is no request. Rather, all subscriptions are automatically dropped after 24 hours. I don't think either technique is better than the other; there are advantages and disadvantages to both.
First off, I don't know of anybody's software that is smart enough to unsubscribe when you close your laptop. Second, what happens when my laptop is closed and the hub is trying to send notifications? Are the notifications queued? How many failures before you unsubscribe the misbehaving (not really) client? Neither protocol is airtight, and neither addresses numerous scenarios that arise frequently in homes and offices all across this Internet-enabled planet.
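rssCloud's 24-hour rule means the cloud only needs a timestamp per subscription. Here is a minimal sketch of that bookkeeping, assuming subscriptions are keyed by feed URL and notification endpoint; the class and method names are hypothetical, and time is passed in explicitly so the behaviour is easy to check.

```python
from dataclasses import dataclass, field

LEASE_SECONDS = 24 * 60 * 60  # rssCloud drops subscriptions after 24 hours


@dataclass
class CloudSubscriptions:
    # (feed_url, notify_endpoint) -> time of the last pleaseNotify, in seconds
    renewed_at: dict = field(default_factory=dict)

    def please_notify(self, feed_url: str, endpoint: str, now: float) -> None:
        # A repeat registration simply restarts the 24-hour window.
        self.renewed_at[(feed_url, endpoint)] = now

    def expire(self, now: float) -> list:
        """Remove and return subscriptions not renewed within the lease."""
        stale = [key for key, t in self.renewed_at.items()
                 if now - t >= LEASE_SECONDS]
        for key in stale:
            del self.renewed_at[key]
        return stale

    def subscribers(self, feed_url: str, now: float) -> list:
        # Prune first, then list the endpoints still registered for this feed.
        self.expire(now)
        return [ep for (url, ep) in self.renewed_at if url == feed_url]
```

Note what this does not answer: a closed laptop still sits in the table until its 24 hours run out, and every notification sent to it in the meantime fails.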
The ping component of both services is very similar. Both support a REST ping.
In addition to a basic REST ping, rssCloud allows the publisher to ping the cloud with REST, XML-RPC or SOAP. There doesn't appear to be a discovery mechanism that tells publishers which of the protocols the cloud accepts, but this shouldn't be much of a problem, since discovery can occur via trial and error and the REST ping is likely to be supported by all rssClouds.
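Both REST pings are small form posts. This sketch assumes `hub.mode=publish` with `hub.url` for PubSubHubbub and a single `url` parameter for rssCloud's REST ping, and models the trial-and-error discovery as a loop over transports. `ping_with_fallback` and its `send` hook are hypothetical stand-ins for the actual network calls.

```python
from typing import Callable, Optional
from urllib.parse import urlencode


def pshb_publish_body(topic_url: str) -> str:
    # PubSubHubbub publisher ping: tell the hub which topic changed.
    return urlencode({"hub.mode": "publish", "hub.url": topic_url})


def rsscloud_ping_body(feed_url: str) -> str:
    # rssCloud REST ping: the changed feed's URL as the 'url' parameter.
    return urlencode({"url": feed_url})


def ping_with_fallback(feed_url: str,
                       transports: list,
                       send: Callable[[str, str], bool]) -> Optional[str]:
    """Try each transport in order; return the first that works, else None.

    `send(transport, feed_url)` is a hypothetical hook that performs the
    REST, XML-RPC or SOAP call and reports whether the cloud accepted it.
    """
    for transport in transports:
        if send(transport, feed_url):
            return transport
    return None
```

Listing REST first reflects the guess above that every rssCloud is likely to accept it; a publisher could remember the transport that worked and skip the trial on later pings.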
rssCloud does provide an additional, lightly specified interface that pushes the RSS feed to the cloud, allowing the cloud to host the RSS feed on behalf of the publisher. I highly doubt this would be widely used unless the cloud implements more feed hosting services.
Notification is likely the biggest difference between rssCloud and PubSubHubbub.
rssCloud again allows for REST, XML-RPC and SOAP notifications. This greatly increases the complexity of the cloud. The rssCloud notification is effectively a reverse ping, where the cloud pings the subscriber to tell it to fetch the feed and find out what changed.
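Because the reverse ping carries only the feed's URL, the subscriber has to do the diffing itself. This sketch assumes the `http-post` notification form with a `url` parameter, and compares item `guid`s (falling back to `link`) between two fetches of the same feed; the function names are hypothetical, and it only detects new items, not edited ones.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs


def feed_url_from_notification(body: str) -> str:
    # The reverse ping tells us which feed changed, nothing more.
    return parse_qs(body)["url"][0]


def item_ids(rss_xml: str) -> list:
    """List each item's guid (or link, if there is no guid), in feed order."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("guid") or item.findtext("link")
            for item in root.iter("item")]


def new_items(previous_xml: str, current_xml: str) -> list:
    # Compare the fresh fetch against the last one we stored.
    before = set(item_ids(previous_xml))
    return [key for key in item_ids(current_xml) if key not in before]
```

Every notified subscriber runs this same fetch-and-compare at roughly the same moment, which is the flood of fetches the next paragraph refers to.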
PubSubHubbub implements a much more complex notification. It's not a simple ping, but rather a POSTed XML package containing the updated feed, with only the entries that are new or have been updated. This creates enormous state problems within the hub. What happens when the previous notification failed? Do you send multiple updates in the next one? This could mean sending different notifications to different subscribers, or subscribers missing new and updated entries. On the other hand, this approach avoids flooding the publisher with simultaneous feed fetches from all the subscribers who've been notified of the change. Neither approach is optimal; neither is horrible.
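The state problem is easier to see with a toy hub. This sketch is hypothetical, not taken from any hub implementation: it tracks entries only by id and `updated` value (a real fat ping carries full Atom entries), and it keeps a pending set per subscriber so that anything a failed delivery missed is merged into the next one.

```python
from dataclasses import dataclass, field


@dataclass
class SubscriberState:
    # Entries not yet delivered to this subscriber: entry id -> updated value
    pending: dict = field(default_factory=dict)


@dataclass
class FatPingHub:
    seen: dict = field(default_factory=dict)         # entry id -> last updated value
    subscribers: dict = field(default_factory=dict)  # callback URL -> SubscriberState

    def changed_entries(self, entries: dict) -> dict:
        """Return entries that are new or whose 'updated' value differs."""
        changed = {eid: upd for eid, upd in entries.items()
                   if self.seen.get(eid) != upd}
        self.seen.update(changed)
        return changed

    def on_feed_fetched(self, entries: dict) -> None:
        changed = self.changed_entries(entries)
        for state in self.subscribers.values():
            # Merge with whatever an earlier failed delivery left behind.
            state.pending.update(changed)

    def deliver(self, send) -> None:
        # `send(callback, entries) -> bool` is a hypothetical POST hook.
        for callback, state in self.subscribers.items():
            if state.pending and send(callback, dict(state.pending)):
                state.pending.clear()
```

Even this toy version shows the cost: after one failed delivery, two subscribers of the same feed receive different notifications for the same update.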
Both protocols have a major failing, in that they rely on servers connecting to subscribing clients. If a client exists behind an unfriendly NAT or firewall, then the protocols simply fail. You can implement UPnP and other protocols and break your way thru some NATs and firewalls, but the problem will still exist on a large piece of the Internet pie.
Long polling is the solution to the NAT and firewall problem. Long polling is not the optimal solution to the notification problem because long polling involves holding open connections from the client to the server. This means 10,000 clients will hold 10,000 connections open. Yikes! The real solution is to detect the capabilities of the client and use direct notification where possible, UPnP where possible and long polling where everything else fails. It might be difficult to convince a developer of rssCloud subscribing software to implement UPnP when he can simply resort to long polling from the beginning.
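A notification gateway built on long polling might look roughly like this. The hub or cloud delivers to the gateway, which has a public address; each client behind a NAT keeps one request open, and the gateway answers it as soon as a notification arrives, or with an empty list when the timeout expires. `NotificationGateway`, `notify` and `long_poll` are hypothetical names, and the HTTP layer is left out.

```python
import queue
import threading
from collections import defaultdict


class NotificationGateway:
    """Holds notifications for clients that cannot accept inbound connections."""

    def __init__(self) -> None:
        self._queues = defaultdict(queue.Queue)  # client id -> pending feed URLs
        self._lock = threading.Lock()

    def _queue_for(self, client_id: str) -> queue.Queue:
        with self._lock:
            return self._queues[client_id]

    def notify(self, client_id: str, feed_url: str) -> None:
        # Called when the hub or cloud pushes a notification to the gateway.
        self._queue_for(client_id).put(feed_url)

    def long_poll(self, client_id: str, timeout: float) -> list:
        q = self._queue_for(client_id)
        try:
            first = q.get(timeout=timeout)  # block until data or timeout
        except queue.Empty:
            return []                       # nothing yet; the client polls again
        updates = [first]
        while True:                         # drain anything else already waiting
            try:
                updates.append(q.get_nowait())
            except queue.Empty:
                return updates
```

In a real deployment, every call blocked in `long_poll` corresponds to one open client connection, which is exactly where the 10,000-connections problem comes from.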
One great advantage of PubSubHubbub is that it is tightly specified, with lots of examples and code. rssCloud, on the other hand, is very loosely specified, with pieces of code and text spread across various interlinked documents on many websites.
Where this document is incorrect, please submit corrections in the comments or via email (firstname.lastname@example.org). Thanks!
If you are looking for new web hosting, then Webhostinggeeks.com is the best place to start. Webhostinggeeks is a website dedicated to informing webmasters about the various web hosting services, allowing them to make the best decisions in choosing a Web hosting service. On their homepage, you get the big picture with their Top 10 Web Hosting, Best Web Hosts of 2009. They pick the top ten hosting services and provide detailed reviews of each, including pricing, bandwidth, storage, bonuses, technical support, average user rankings and much more.
Beyond simply the top Web hosts, they also provide reviews of Web hosting services in specific categories; multiple domain hosting, green Web hosting, vps hosting, dedicated server hosting, free domain names, free Yahoo! marketing and free Google Adwords. So, if you have these specific needs, then you can drill down directly to the hosting services that meet your needs.
The web host reviews on Webhostinggeeks.com also provide a section where users can submit their own rankings and reviews, and read those left by others. Some of the web hosts have more than 100 reviews with a lot of great information. Many of the reviews are from review websites, but others are from geeks like you and me.
Webhostinggeeks also provides awards in various other categories; best budget hosting, best blog hosting, best forum hosting, best UNIX hosting, best Windows hosting, best PHP hosting, best email hosting, best ecommerce hosting, best multiple domain hosting, best vps hosting, best reseller hosting and best dedicated hosting.
Don't forget to check out their blog and subscribe to their blog feed. You can also subscribe to their blog using Reblinks to get emails every time the blog is updated. Their blog is updated several days per week with new articles dedicated to helping webmasters make the best decisions when choosing web hosts. The blog entries include industry news, trends, products and discussions.
This is a paid review.
RSS is great. That said, it isn't real-time. With the advent of Twitter, people are now beginning to wonder why RSS needs to poll every hour. Why can't we have the immediacy of Twitter in RSS? Time for a history lesson.
Why is RSS here? It's a pretty simple and stupid protocol based on polling to simulate push. Why didn't we just create a real push protocol? Why? You can't. Push doesn't work very well on the Internet. Some people will point you to email and instant messaging as push technologies that actually work on the Internet. Unfortunately, they are wrong.
Are emails pushed? Yes. They are pushed around the Internet between SMTP gateways and eventually land in an inbox. An inbox. Not your desktop. An inbox sitting on a server somewhere. Then we open our POP3 client and it pulls the emails down to our email client. Or maybe we open a browser and pull the data down to our Web browser. Either way, email isn't only push; it's push with a bit of pull on the end.
Are instant messages pushed? Sometimes. But even Instant Messaging clients fall back to connecting to servers using long polling when push-oriented connections from the server to the client fail.
You can't push. You can only poll. You can only pull. That's why push-based publish and subscribe technology failed in the 90s. That's why polling oriented technologies like RSS ruled the world for the last decade.
For nearly a decade, we were happy with the RSS solution giving us updates as late as an hour after publication. Life was great. Engineers weren't happy. Engineers hate polling. They want push. They want real-time RSS. They invented rssCloud. It died. Yes, rssCloud was invented a long time ago. They invented ping servers. Ping servers died. Then someone created Twitter, a centralized publish and subscribe service for micro-content. Geeks were in awe of the real-time immediacy of Twitter. Geeks were not happy about the centralized nature of Twitter. Geeks want the immediacy of Twitter and the decentralization of RSS.
Some of the engineers slash geeks that wanted real-time RSS worked at Google, and they got together with the team at Google Reader and wrote PubSubHubbub. I think it was released in July of this year, but it may have been earlier. When I first saw PubSubHubbub in early July, I wondered how it was any different than previously failed push technologies (rssCloud specifically). If you read the spec, PubSubHubbub is basically a copy of the rssCloud spec with some additional features meant to optimize, but those features made it more complex and more difficult to implement than rssCloud.
I don't think Dave Winer was too impressed with PubSubHubbub, a knockoff of his pre-existing and failed rssCloud technology. As such, Dave restarted the rssCloud movement, and here we are today: two technologies that are no different than everything that failed before them.
Who will win? Both? Neither? Someone? Somebody else?
If the notifications don't come streaming in, then your rssCloud service failed to automatically detect that the client was offline and queue the notifications. Or maybe the rssCloud didn't automatically detect that the client changed IP addresses. Or possibly there's a NAT at Starbucks that has to be configured to allow connections to the client from the server. Or maybe there's a firewall at Starbucks and your rssCloud service didn't automatically call Starbucks and ask them to disable the firewall. Or possibly your software did call Starbucks and they didn't think the request was reasonable.
If you can get this scenario to work, then tell your son Jesus Hi for me.
Now call up your corporate IT department and ask them to open an incoming port for you, because you need rssCloud to work. If they deny this, don't worry; it's not your software that's broken, it's the rssCloud protocol that doesn't work behind corporate firewalls.
If she does know how to configure the wireless router, then you fail anyway, because the intent of the test was to see if a mundane user behind a NAT could use an rssCloud client, and your wife is a geek, not a mundane.
This will make rssCloud easy to scale, since most users will not be able to use it and only the few capable geeks will put any load on the servers. The designers of rssCloud were thinking about scalability when it was written.
Typed on cellphone, please excuse typos.
Do you remember 3 years ago? We were all pinging several dozen ping servers every time we updated our blogs. We pinged Technorati, PubSub, PinGoat, Ping-O-Matic, etc. I wrote extensively about the Blogosphere Ping infrastructure at the time. Read more at the next link.
I wrote about how it didn't work. Companies that relied on the ping, like Technorati and PubSub, have stagnated and disappeared. That's because ping infrastructure requires big walls of servers, and that costs a lot of money. Unless you have a business model to support this, you have a company that's doomed from the start.
rssCloud reminds me of PinGoat and Ping-O-Matic. You send them a ping and they broadcast it to everyone else. I haven't heard much from either for a while. Does anybody actually use these ping distribution services anymore? I stopped because I realized they didn't work. PinGoat would get 10,000 pings a minute and send 100,000 pings. Imagine that server load. In the end, these services did nothing, because they were too overwhelmed to do anything.
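Reading both PinGoat figures as per-minute rates (the post does not say so explicitly), the fan-out and the outbound rate work out to:

```latex
\text{fan-out} = \frac{100{,}000\ \text{pings out/min}}{10{,}000\ \text{pings in/min}} = 10,
\qquad
\frac{100{,}000\ \text{pings/min}}{60\ \text{s/min}} \approx 1{,}667\ \text{outbound pings/s}
```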
Imagine if one million WordPress blogs started pinging, millions of users started subscribing and unsubscribing, and notifications started getting sent all over the place. It would be scalability hell all over again. In fact, it would be worse. It wouldn't be just some ping servers bashing each other; it would be all the RSS clients all over the blogosphere bashing the crap out of these rssCloud services. Open my laptop; my laptop registers hundreds of feeds. Close my laptop. The notifications fail and these servers start timing out like mad. Open my laptop. Does my laptop re-register? I likely got unsubscribed because of all the timeouts. I open and close my laptop a dozen times per day. No matter what, you have to re-register once per day. rssCloud needs a business model. Badly.
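A client can at least decide what to re-register when it wakes up. This is a hypothetical sketch, assuming the 24-hour rssCloud lease and an arbitrary one-hour renewal margin; `feeds_to_reregister` is an invented name.

```python
LEASE_SECONDS = 24 * 60 * 60  # rssCloud drops subscriptions after 24 hours
RENEW_MARGIN = 60 * 60        # hypothetical: renew an hour before the cloud drops us


def feeds_to_reregister(registered_at: dict, now: float,
                        address_changed: bool) -> list:
    """Pick feeds a waking client should send pleaseNotify for again.

    registered_at maps feed URL -> time of the last successful registration.
    """
    if address_changed:
        # The cloud notifies the address it saw at registration time, so a
        # new network (new Wi-Fi, new NAT) makes every registration stale.
        return sorted(registered_at)
    return sorted(url for url, t in registered_at.items()
                  if now - t >= LEASE_SECONDS - RENEW_MARGIN)
```

What the client cannot tell is whether the cloud already dropped it early because of failed deliveries while the lid was closed, so in practice it may end up re-registering everything on every wake.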
Here's why Scoble is wrong. RSS is the only standardized and distributed way of doing push over HTTP without ridiculous scaling issues. Consider Atom a flavor of RSS for this discussion.
What people hate about RSS is that it's neither real-time, nor is it really push. RSS is really just a polling pull that simulates push. That is, publishers don't send their items to readers, rather the RSS clients query the publishers for updates at some interval.
Everybody wants push. Real push. Not this simulated polling crap. The problem is that HTTP is connection oriented and not very friendly to servers trying to connect to clients. Ten years ago, a server could connect to a client, send a virus and then the East Coast power grid would fail. Hello firewall.
Further, NATs have become a popular way of using one IP address to serve an office building full of clients. How do you address a client whose true IP address is local to his office? It's not possible unless you assign inbound NAT holes, and then someone shuts down the East Coast power grid again. Some companies do allow inbound NAT holes. Most bigcos don't.
Until someone invents another solution, we are stuck with solutions where clients connect to servers. At least at the retail level.
Twitter. Twitter is not the Web. It's based primarily on SMS, although you don't need SMS to use it. But the immediacy of Twitter comes from SMS. The real-time Twitter Web clients are scaling disasters. That's why everyone complains about Twitter failing. In fact, even Twitter over SMS fails, but you don't see it because it's push. They don't push errors. When you are on the Web, they respond to your poll with errors. I get them several times per day. Maybe Dick Costolo can change that. Doubt it. He's good, but he's no god.
So, we have RSS. I still don't see another solution. Thanks Dave!
Typed on phone. Not grammar- or spell-checked. It's also an incomplete thought.
There's great debate in the RSS intellectual community about the merits of Dave's all-new rssCloud. Rogers Cadenhead wrote a piece called There's a Reason RSSCloud failed to Catch On. A necessary read. Then Mark Woodman wrote Is rssCloud All Wet? Another necessary read. Both Rogers and Mark made valid points about why rssCloud cannot succeed. Dave responded with his own rebuttal, 2002 != 2009. Specifically, he says "We had problems, but I've factored in what we learned in 2002 in the 2009 implementation." He doesn't mention anywhere how he conquered the problems of 2002. I also looked thru the Implementor's Guide to rssCloud and couldn't find anything that addresses the issues raised by Rogers and Mark. Maybe Dave can tell us mundanes what we are missing. I don't see it.
Dave Winer tells us that WordPress's millions of blogs now support his rssCloud mechanism. The only client that supposedly supports it is Dave's River2. We'll see if other readers jump on board. I'm considering support in Reblinks.
Dave Winer has stated that FeedBurner failed, with the caveat "Luckily they got Google to give them $100 mill before the house of cards collapsed." Quite a caveat. Even without the $100M, I still don't see failure. FeedBurner is still running. It's still pushing millions of feeds, including many of the most popular blog feeds on the Internet. It's been re-branded as Google FeedBurner, and the FeedBurner Ad Network was merged with AdSense for Feeds. Where's the failure? If FeedBurner is a failure, then what would you call UserLand? Does anybody use that anymore?
When you sell your baby to bigcos like Google, it gets gobbled up into their infrastructure. Plans change. Branding changes. You don't hear much about FeedBurner anymore because it's now just a piece of a bigger machine called Google. I don't think Google has any intention of closing down FeedBurner feeds. Where's the failure?
Interesting timing, considering that Twitter has hired ex-FeedBurner CEO Dick Costolo. Does Dave have something against Mr. Costolo? That's a clever move for Twitter. FeedBurner wasn't Dick's first success. He and his team started SpyOnIt back in the 90s. They were purchased by 724 Solutions for $53M. Dick worked for 724 for a while until his team left and started FeedBurner. I was one of the original employees at 724 Solutions. When I found out we bought SpyOnIt, I was extremely happy. It was an awesome technology. I can't wait to see what Dick does with Twitter. Put it thru the roof.
Reblinks sent 10,000 emails in each of the first three days of this month. WOOT! I also scooped RssEmail.com last week. Hopefully, I'll get my ass in gear and pump up the features. Daily digests. RSS to Twitter. RSS to MetaWeblogAPI. This is gonna be awesome. I just hope Dave Winer will be talking about how Reblinks failed 5 years from now when I sell it to Google for $100M.