Some one has to say it again…

September 30, 2005 | Rants and Raves | By: Mark VandeWettering

James Robertson doesn’t much like OPML or RSS as file formats, and tells us why:

Ye gods, it’s time someone came out and said something. OPML is a really, really crappy format. Really crappy. I had massive headaches implementing OPML support for import/export in BottomFeeder. Why? Because there’s no real specification. Like everything Dave Winer has ever been involved with, the specs are all in his head, and it’s up to the rest of us to figure out wtf he actually meant. Here’s the “spec” – and look at all the meaningless crap in it (windowRight? Why is there something specifying the number of pixels for the margin?).

I had to add tons of hacks to the OPML support in order to support the export formats of various tools. The problem? Everyone implemented it a little differently, because the spec is incredibly unspecific – about just about everything.

I couldn’t agree more. Take for example Mark Pilgrim’s comments:

I just tested the 59 RSS feeds I subscribe to in my news aggregator; 5 were not well-formed XML. 2 of these were due to unescaped ampersands; 2 were illegal high-bit characters; and then there’s The Register (RSS), which publishes a feed with such a wide variety of problems that it’s typically well-formed only two days each month. (I actually tracked it for a month once to test this. 28 days off; 2 days on.) I also just tested the 100 most recently updated RSS feeds listed on blo.gs (a weblog tracking site); 14 were not well-formed XML.

The reason isn’t just that programmers are lazy (we are, but we also like stuff to work). The fact is that the specification itself is ambiguous and weak enough that nobody really knows what it means. As a result, there are all sorts of flavors of RSS out there, and parsing them is a big hassle.

The promise of XML was that you could ignore the format and manipulate data using standard off-the-shelf tools. But that promise is largely negated by the ambiguity in the specification, which results in ill-formed RSS feeds, which cannot be parsed by standard XML parsers. Since Dave Winer himself managed to get it wrong as late as the date of the above article (probably due to an error that I myself have made, cutting and pasting unsafe text into WordPress), we really can’t say that it’s because people don’t understand the specification unless we are willing to state that Dave himself doesn’t understand the specification.
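To make the failure concrete, here is a minimal sketch in Python of what a standard parser does with one of the problems Pilgrim counted, an unescaped ampersand. The sample feed and the helper name is_well_formed are made up for illustration; the only real API used is the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses as XML, else (False, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # A conforming XML parser must reject the whole document here;
        # it is not allowed to guess what the author meant.
        return False, str(err)
    return True, None


# Hypothetical feed, invented for this example.
GOOD_FEED = """<rss version="2.0"><channel>
<title>Rants &amp; Raves</title>
<item><title>ok</title></item>
</channel></rss>"""

# Same feed with the ampersand left unescaped, as happens when raw
# text is pasted into a template.
BAD_FEED = GOOD_FEED.replace("&amp;", "&")

if __name__ == "__main__":
    print(is_well_formed(GOOD_FEED))
    print(is_well_formed(BAD_FEED))
```

One stray character is enough to make the second feed unreadable to any off-the-shelf XML tool, which is why aggregators end up shipping their own forgiving parsers.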

Here is another small example: there is genuine confusion to this day about support for the enclosure tag. Are you allowed to have more than one per item or not? People do generate them. By default, WordPress creates enclosure links for every mp3 that you link to in a post. It’s probably wrong, but lots of aggregators accept it just fine. Occasionally someone complains and asks for clarification, but no one ever really reaches a definitive answer.
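Since nobody can say for sure, a consumer pretty much has to be defensive about it. The following Python sketch shows one way to do that, collecting every enclosure on an item instead of assuming there is at most one. The sample feed, its URLs, and the function name enclosures_per_item are hypothetical, and this is only one possible reading of the format, not what any particular aggregator does.

```python
import xml.etree.ElementTree as ET


def enclosures_per_item(rss_text):
    """Return one list of enclosure URLs per <item>, in document order."""
    root = ET.fromstring(rss_text)
    result = []
    for item in root.iter("item"):
        # findall returns every direct <enclosure> child, so zero, one,
        # or several enclosures are all handled the same way.
        urls = [enc.get("url") for enc in item.findall("enclosure")]
        result.append(urls)
    return result


# Hypothetical feed: the first item carries two enclosures, the second none.
SAMPLE = """<rss version="2.0"><channel><title>demo</title>
<item><title>one</title>
<enclosure url="http://example.com/a.mp3" length="1" type="audio/mpeg"/>
<enclosure url="http://example.com/b.mp3" length="2" type="audio/mpeg"/>
</item>
<item><title>two</title></item>
</channel></rss>"""

if __name__ == "__main__":
    for urls in enclosures_per_item(SAMPLE):
        print(len(urls), urls)
```

A reader that instead grabs only the first enclosure will silently drop the second mp3, and neither behavior can be called wrong by pointing at the spec.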

Scoble likes to champion first RSS and now OPML under the claim that they are good for users. What would be good for users is for the deficiencies of these formats to be absent, or at least invisible. They are not. They manifest themselves in all sorts of edge cases which prevent interoperability. I’ve spent a great deal of time reading RFCs for various networking protocols and formats, and by comparison the RSS and OPML “specifications” are scribbles on napkins.

Scoble’s attitude reflects what I think of as the Microsoft way: it doesn’t matter what’s underneath as long as what’s on top looks shiny. Sure, it will belch smoke, require servicing by a third party every three thousand miles and occasionally make strange sounds that will puzzle and worry the owner, but look how shiny it is.

I use RSS every day. It does fulfill a need. But it does suck, and we would be better off if we all recognized that and worked toward something better.

Comments

Pingback from tech.memeorandum
Time 10/1/2005 at 1:40 pm

Some one has to say it again… James Robertson doesn’t much like OPML or RSS as file formats, and tells us why: … I couldn’t agree more. Take for example Mark Pilgrim’s comments: … The reason just isn’t that programmers are lazy (we are, but we also like stuff to work).

Pingback from Dare Obasanjo aka Carnage4Life
Time 9/30/2005 at 7:14 pm

More on crappy formats Robert defends OPML. I’ve seen some really poor arguments made as people rushed to bash Dave Winer and OPML but none made me want to join the discussion until this morning. In the post Some one has to say it again… brainwagon writes Take for example Mark Pilgrim’s comments: I just tested the 59 RSS feeds I subscribe to in my news aggregator; 5 were not well-formed XML. 2 of these were due to unescaped ampersands; 2 were illegal high-bit characters;

Comment from Jonathan
Time 10/2/2005 at 1:12 pm

I agree with you as well and I just try to read blogs. I am one of those users that is frustrated w/ sites showing up one day on a blog service and failing the next.
RSS is great when it works and sucks big time when it doesn’t. Get the spec right and not just in someone’s head.

Pingback from OPML: Threat or Menace? » LibraryPlanet.com
Time 2/22/2006 at 1:24 pm

[…] The promise of XML was that you could ignore the format and manipulate data using standard off-the-shelf-tools. But that promise is largely negated by the ambiguity in the specification, which results in ill-formed RSS feeds, which cannot be parsed by standard XML feeds. Since Dave Winer himself managed to get it wrong as late as the date of the above article (probably due to an error that I myself have done, cutting and pasting unsafe text into WordPress) we really can’t say that it’s because people don’t understand the specification unless we are willing to state that Dave himself doesn’t understand the specification. – Mark VandeWettering […]