Google on Well-Formed XML - The RSS Blog
RSS, OPML and the XML platform.
Copyright 2003-5 Randy Charles Morin
Fri, 23 Dec 2005 17:31:49 GMT
Google on Well-Formed XML

Mihai Parparita: Here are the top XML errors that we have encountered when parsing all of the feeds that our users have added to Reader.

% of errors  Error description
15.6%        Input claims to be UTF-8 but contains invalid characters
14.9%        Opening and ending tags mismatch
13.9%        An undefined entity is used (e.g. &nbsp; in an XML document without importing the HTML set)
7.8%         Document expected to begin with a start tag, but no < was found
5.7%         Disallowed control characters present
5.5%         Extra content at the end of the document
4.2%         Unterminated entity reference (missing semicolon)
4.2%         Unquoted attribute value
3.8%         Premature end of data in tag (truncated feed)
3.3%         Naked ampersand (should be represented as &amp;)
2.1%         XML declaration allowed only at the start of the document
1.8%         Namespace prefix is used but not defined
0.75%        Comment not terminated
0.64%        Attribute without value
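
To see a few of these errors first hand, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. This is not how Google Reader checks feeds; the helper name well_formedness_error and the sample snippets are made up for illustration, and the exact error wording depends on the underlying expat parser.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(data: bytes):
    """Return None if data parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Hypothetical feed fragments, each reproducing one error class from the list.
SAMPLES = {
    "well-formed": b"<rss><title>Fish &amp; chips</title></rss>",
    "invalid UTF-8": b'<?xml version="1.0" encoding="UTF-8"?><rss>\xe9</rss>',
    "mismatched tags": b"<rss><title>News</rss>",
    "undefined entity": b"<rss><title>A&nbsp;B</title></rss>",
    "extra content at end": b"<rss></rss><rss></rss>",
    "unquoted attribute": b"<rss version=2.0></rss>",
    "truncated feed": b"<rss><title>News",
    "naked ampersand": b"<rss><title>Fish & chips</title></rss>",
}

if __name__ == "__main__":
    for label, data in SAMPLES.items():
        print(f"{label}: {well_formedness_error(data) or 'OK'}")
```

Only the first sample should come back OK; every other one stops the parser at the first error, which is why a single stray & is enough to make a whole feed unreadable to a strict XML parser.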

Randy: Some interesting data would be the percentage of feeds with ill-formed XML, broken down by generator (Blogger, WordPress, TypePad, MT, etc.). Anybody got that data?

Reader Comments
sorry, obviously I'm a dunce with html, forgot the quotes and the tag.  grrr...  sorry bout' that.
Sorry Robyn, I don't usually allow HTML at all.
