Fri, 23 Dec 2005 17:31:49 GMT
Google on Well-Formed XML

Mihai Parparita: Here are the top XML errors that we have encountered when parsing all of the feeds that our users have added to Reader.

% of errors Error description
15.6% Input claims to be UTF-8 but contains invalid characters.
14.9% Opening and ending tags mismatch
13.9% An undefined entity is used (e.g. &nbsp; in an XML document without importing the HTML set)
7.8% Document expected to begin with a start tag, but no < was found
5.7% Disallowed control characters present
5.5% Extra content at the end of the document
4.2% Unterminated entity reference (missing semi-colon)
4.2% Unquoted attribute value
3.8% Premature end of data in tag (truncated feed)
3.3% Naked ampersand (should be represented as &amp;)
2.1% XML declaration allowed only at the start of the document
1.8% Namespace prefix is used but not defined
0.75% Comment not terminated
0.64% Attribute without value
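
Several of the errors in this list are easy to reproduce. Here is a minimal sketch in Python, using only the standard library's xml.etree.ElementTree. It is not the parser Reader uses, and the function name check_well_formed and the sample documents are made up for illustration. The exact error messages come from the underlying expat parser and may differ between versions.

```python
import xml.etree.ElementTree as ET


def check_well_formed(data: bytes):
    """Return None if data parses as XML, otherwise the parser's error message.

    Hypothetical helper for illustration only.
    """
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Hand-written samples, one per error class from the list above.
samples = {
    "well-formed": b"<feed><title>Fish &amp; chips</title></feed>",
    "invalid UTF-8": b"<?xml version='1.0' encoding='UTF-8'?><feed>\xff</feed>",
    "mismatched tags": b"<feed><title>Hi</feed>",
    "undefined entity": b"<feed>a&nbsp;b</feed>",
    "control character": b"<feed>\x01</feed>",
    "extra content at end": b"<feed/><feed/>",
    "unquoted attribute": b"<feed version=1></feed>",
    "truncated feed": b"<feed><title>Hi",
    "naked ampersand": b"<feed>Fish & chips</feed>",
}

if __name__ == "__main__":
    for name, doc in samples.items():
        error = check_well_formed(doc)
        print(f"{name}: {'ok' if error is None else error}")
```

Only the first sample should parse. A conforming XML parser has to reject every other one outright, which is why a single stray & is enough to break a feed.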

Randy: Some interesting data would be the percentage chance that a feed has ill-formed XML based on the generator (Blogger, Wordpress, Typepad, MT, etc). Anybody got that data?

sorry, obviously I'm a dunce with html, forgot the quotes and the tag.  grrr...  sorry bout' that.
Sorry Robyn, I don't usually allow HTML at all.

