State of Blogosphere Search, Part III - The RSS Blog
RSS, OPML and the XML platform.
Copyright 2003-5 Randy Charles Morin
Sat, 22 Apr 2006 21:47:21 GMT
State of Blogosphere Search, Part III

Scott Karp of Publishing 2.0 noticed that Technorati results were changing radically, that Dave Winer was nowhere to be found and was being replaced by MSN Spaces blogs. The fear, of course, is that Technorati was compromised by SERP (search engine result page) spammers. Chris Edwards followed up with some actual analysis and determined that it wasn't SERP spam, but rather some lucky MSN Spaces bloggers that snuck through a hole in Technorati's engine, plus other very explainable occurrences. In other words, Technorati is not compromised but rather broken, and this isn't a new thing. Dare Obasanjo noticed a bunch of MSN Spaces in the Technorati index over a month ago. Similar occurrences were noted more than two months ago and, as Domiziano Galia explained in the comments for The RSS Blog, "every blog on MSN Spaces has a box where are shown links to other MSN Spaces blogs. This is not a choice from the blogger, but an automatism of the platform, so that every blog gets free unjustified links."

The problem is that Technorati does not index blog entries, but rather webpages. Technorati often reports referrers to my own blogs where the blogger is simply listing me in his sidebar blogroll. Actually, it wouldn't be so bad if I got one referrer from that sidebar link, but Technorati will often give me a new referrer every time that blogger writes a new entry. So, we have a big problem. Is MSN Spaces gaming the Technorati index? Guess what, they are not. MSN Spaces is marking these automated links with NOFOLLOW attributes, as they are supposed to. I then went to the Technorati 100 and checked whether it was counting links with this attribute and found that it was. Why is Technorati including NOFOLLOW links in their rankings? Technorati drafted a specification of the NOFOLLOW standard more than a year ago. More than six months ago, David Sifry wrote "Early this year, a number of search engines including Technorati adopted the rel='nofollow' microformat." David? Are you sure? 'Cause all the evidence indicates that Technorati is still ignoring NOFOLLOW attributes. David, what happened to "we have been battling the spam situation in a significant way for about 2 months?"
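To make the complaint concrete, here is a minimal sketch of what respecting NOFOLLOW could look like when counting links on a page. It is not Technorati's code, and I have no idea how their indexer is built. The names LinkCollector and countable_links are made up for illustration, and it only uses the HTML parser from the Python standard library. It skips any link whose rel attribute contains the nofollow token, and it counts a repeated link (say, a sidebar blogroll entry) only once per page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: sorts <a href> links by whether rel has nofollow."""

    def __init__(self):
        super().__init__()
        self.followed = []   # links that may count toward rankings
        self.nofollow = []   # links the publisher asked engines to ignore

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)  # attribute names arrive lowercased
        href = attr_map.get("href")
        if not href:
            return
        # rel is a space-separated token list; compare case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def countable_links(html):
    """Return followed links from one page, de-duplicated, in first-seen order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.followed))


if __name__ == "__main__":
    page = (
        '<a href="http://a.example/">A</a>'
        '<a rel="nofollow" href="http://b.example/">B</a>'
        '<a href="http://a.example/">A again</a>'
    )
    print(countable_links(page))
```

De-duplicating within a single page is only half of the sidebar problem. An engine would also have to remember that the same sidebar link was already counted when the blogger publishes the next entry, which this sketch does not attempt.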

While I'm beating up on Technorati, let me also point out that the World Live Web is no longer live and is mostly cluttered with link spam. Let me show you an example. Every morning, I do a search for people linking to me. Here's a screenshot of what I see most mornings.

According to Technorati, Memeorandum linked to me 1 and 2 hours ago, but as you can see from the titles, the links are from several months ago. This is actually better than usual. Usually, the first few links are the overnight link spam attack on Blogspot.

Now, don't get me wrong, Technorati is actually not that much worse than the other blogosphere search engines, although they are clearly now the worst. Similar searches on Feedster give me nothing but smog (blogosphere spam) and posts from my own blogs. Bloglines Citations tends to be good at eliminating the smog, but more often than not, it returns an error or false positives. PubSub too is good at finding results, but it is also good at finding smog and is often broken. BlogPulse has been better, but I've noticed more and more splogs in the results. IceRocket and Google Blog Search have the least smog and generally get me some good results. I'd say they rank #1 and #2, with IceRocket in the #1 spot because Google is the heart of the splogosphere. I actually use IceRocket and Google Blog Search throughout the day, whereas I might hit the others once or twice on the off chance they pick up a referrer not reported by IceRocket or Google.

Well, that's my rant. Guys, pick up the slack!

Previous State of Blogosphere Search articles.

Reader Comments
You seriously do this search every morning instead of just subscribing to the RSS feed for it? I get old memeorandum posts in my search results as well, but I imagine it's just that memeorandum gets a new URL every 5 minutes and it sometimes doesn't get indexed by a search engine until 10 minutes before I check my feeds. Doesn't seem unreasonable to me.
I'm subscribed to 500+ other feeds as well. I might miss something, so I check the search engines as well. Call me anal.

