RSS, OPML and the XML platform.
Copyright 2003-5 Randy Charles Morin
The topic of the day is Blogspot splogs. I gotta admit, my own RSS reader is giving me more SPAM than anything these days. Many in the blogosphere have pointed the finger at Google and want Blogspot taken down until they can fix the splog problem. This is very short-sighted. You see, if Blogspot wasn't hosting these splogs, then somebody else would be. The reason sploggers have chosen Blogspot is that it's a very popular blogging platform that supports an API. If not Blogspot, then sploggers would host on 21publish or Blogspirit or Blogware or Wordpress. Turning off Blogspot might quell the splogs in the short term, but the long-term plan has to involve the blogosphere search engines figuring out what is worth indexing and what is not. When the blogosphere search engines point their fingers at Blogspot, they are simply playing a blame game. How hard would it be for the search engines to detect these splogs? Not very. It could be automated quite easily. If every entry in a blog contains the exact same content pattern "(<b>(any*)</b><br/>(any*)<br/>)*", then you can be 99% sure it's blog SPAM.
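To show how little code that check needs, here's a rough Python sketch. It is only an illustration under my own assumptions, not what any search engine actually runs: the function names, the `min_entries` threshold and the tolerance for `<br>` versus `<br/>` are all made up for the example, and I use "one or more" repeats instead of the "*" in the pattern above so that an empty entry doesn't count as a match.

```python
import re

# The post's pattern "(<b>(any*)</b><br/>(any*)<br/>)*" as a Python regex.
# Assumptions (not from the post): "+" instead of "*" so an empty entry
# never matches, and both "<br>" and "<br/>" are accepted.
SPLOG_BLOCK = re.compile(
    r"(?:\s*<b>.*?</b>\s*<br\s*/?>.*?<br\s*/?>\s*)+",
    re.IGNORECASE | re.DOTALL,
)


def entry_matches_template(entry_html):
    """True if the whole entry is nothing but repeated bold-title/text blocks."""
    return SPLOG_BLOCK.fullmatch(entry_html) is not None


def looks_like_splog(entries, min_entries=3):
    """Flag a blog only when every entry fits the template.

    min_entries is a hypothetical threshold so that a blog with one or two
    oddly formatted posts is not flagged on thin evidence.
    """
    if len(entries) < min_entries:
        return False
    return all(entry_matches_template(e) for e in entries)


if __name__ == "__main__":
    spam = ["<b>Cheap pills</b><br/>Buy now<br/><b>More</b><br/>Click<br/>"] * 3
    real = spam[:2] + ["<p>Today I wrote about RSS.</p>"]
    print(looks_like_splog(spam))
    print(looks_like_splog(real))
```

Requiring every entry to match is what keeps the false positives down: a real blogger who bolds a title now and then will sooner or later post something that breaks the template.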
At this point, I'm getting pretty tired of the lame blogosphere search engines. I really wish somebody would create one that actually worked. Beyond the constant, easily detected SPAM that escapes them and the finger pointing that follows, none of the blogosphere search engines are all that good at capturing the blogosphere conversation, and an HTTP 500 error isn't exactly out of the ordinary. Enough complaining, here's my report on the state of the blogosphere.
This stuff generally works.
Blogosphere Search Engines
This stuff generally doesn't work.
Bloglines is the best blogosphere search engine at capturing link data, but is absolutely horrible at capturing entries via keyword search and is down a good fraction of the time. The positive is that it's the only blogosphere search engine that reports more than 50% of my inbound links. The negative is that the most common response from Bloglines is "There is a problem with the database. Please try again later" and the keyword search is simply broken. Try this: do a keyword search on Bloglines and make certain to sort by date. Now, scroll thru the entries, paying attention to the dates. Note that they are not sorted by date. Further, the blog matches on common keywords overwhelm the results, making it a chore to page thru to the entry matches. Where's the RSS?
Technorati works for brief periods of time, but is broken more often than not. I don't know how many bugs I've filed with them, and most remain unfixed. Recently, I noticed all my blogs stopped showing up in Technorati altogether. When I checked my profile, all the records of my blogs had been corrupted and I had to reclaim them all. Not the first time. When Technorati's tag and keyword searches are working, they are clearly the best, but unfortunately they work infrequently.
IceRocket is the pleasant surprise in the bunch. IceRocket is one of two search engines that almost always respond in less than one second (the other is Google). In fact, all the other search engines often take ten seconds or more to respond. I find myself using IceRocket more and more, simply because I know I won't be frustrated and they consistently report good results. That said, they are tracking much less than 50% of the blogosphere, which means you still have to complement it with other search engines to find the majority of the results you are looking for. I think IceRocket's biggest problem is that not enough blog hosts are set up to ping IceRocket by default.
Google Blog Search
Google is the little brother that could grow up and become that blogosphere search engine that I always wanted. I can see the promise, but it's still not there yet. Like IceRocket, it responds fast, but tracks much less than 50% of the blogosphere.
Blogpulse, Blogdigger, PubSub
These blog search engines generally don't work. They fail to capture 80% of the data and report more bad data than good.