Saturday, September 13, 2008

DNN News Module Too Strict?

by Phil 'iwonder' Guerra

One thing that is difficult to manage is the aggregation of feeds without standards. Unfortunately, strict enforcement of the RSS v2.0 specification matters more for aggregation than for the display of a single feed. At this point, I believe the DNN News module is being too strict, as most newsfeed sources will never fully comply with any version of the RSS or Atom specs. One of the more common errors is in the date of publication, which is not even a standard element across RSS v2.0, RSS v1.0 (RDF), and Atom feeds. Also, the date formatting is totally out of whack for a lot of feeds; even when something gives GMT as its base, you cannot rely on the developer actually giving you that correctly.

Other troublesome elements are those that are supposed to include an email address in the content, but only provide a name. This is a common error in a lot of popular newsfeeds, but one easily avoided by including the Dublin Core namespace in a feed and using one of these three Dublin Core metadata elements, which do not require an email:

- creator
- publisher
- contributor

There are other DC elements that are useful as well. In fact, many newsfeeds incorporate them as extensions to whatever version of RSS they emit.

The other option is for the DNN module to include the Dublin Core namespace and use one of the above elements in place of the RSS v2.0 author element when an email format fails, and always provide the associated DC element with the email stripped when one is present. In this way the feed validates and can be used without penalty.
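To make that fallback concrete, here is a rough Python sketch of the idea, not the DNN module's actual code. The function name `normalize_author` and the loose `EMAIL_RE` pattern are my own assumptions for illustration; the only thing taken from the specs is the Dublin Core namespace URI and the `creator` element.

```python
import re
import xml.etree.ElementTree as ET

# Dublin Core element namespace, declared in feeds as xmlns:dc.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Deliberately loose email pattern (an assumption): it only decides
# whether <author> looks like it carries an address at all.
EMAIL_RE = re.compile(r"[^@\s<>()]+@[^@\s<>()]+\.[^@\s<>()]+")


def normalize_author(item):
    """Hypothetical helper: rewrite an RSS <item> in place.

    - <author> without an email is replaced by dc:creator.
    - <author> with an email is kept, and a dc:creator holding the
      name with the email stripped is added alongside it.
    """
    author = item.find("author")
    if author is None or not (author.text or "").strip():
        return item

    text = author.text.strip()
    match = EMAIL_RE.search(text)
    if match:
        # Email present: strip it and any wrapping brackets from the name.
        name = (text[:match.start()] + text[match.end():]).strip(" ()<>")
    else:
        # Name only: <author> would fail validation, so drop it.
        name = text
        item.remove(author)

    creator_tag = "{%s}creator" % DC_NS
    if name and item.find(creator_tag) is None:
        ET.SubElement(item, creator_tag).text = name
    return item


item = ET.fromstring("<item><title>Example</title><author>Jane Doe</author></item>")
normalize_author(item)
print(ET.tostring(item, encoding="unicode"))
```

Run against a name-only author, the item ends up with a `dc:creator` element and no `author` element, which is the part a validator usually complains about.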

With the apps I write for desktop, I just take this information without trying to validate it or use it for anything other than displaying it as given. If I can't make sense of a date element in a feed, I find it easier to fashion a date for sorting, or caching, which I take from the time I first bring in the element. It's not going to make a whole lot of difference in the presentation anyway IMHO. The main usage for me of having a pubDate is for sorting, and accuracy in attribution of the news source.
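As a minimal sketch of that fallback (in Python here, purely for illustration), the hypothetical `sort_date` function below tries the RFC 822 style date that RSS v2.0 expects and otherwise uses the time the item was first fetched. It makes no attempt at Atom's date format, and treating a date with no usable zone as UTC is my assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def sort_date(raw_date, first_seen):
    """Hypothetical helper: return a timezone-aware datetime for sorting.

    raw_date   -- the feed's date text as given, or None
    first_seen -- aware datetime recorded when the item was first fetched
    """
    if raw_date:
        try:
            parsed = parsedate_to_datetime(raw_date.strip())
        except (TypeError, ValueError):
            # Unparseable input raises ValueError on newer Pythons
            # and TypeError on older ones.
            parsed = None
        if parsed is not None:
            if parsed.tzinfo is None:
                # No usable zone given: assume UTC (an assumption).
                parsed = parsed.replace(tzinfo=timezone.utc)
            return parsed
    # Missing or nonsensical date: sort by when we first saw the item.
    return first_seen


first_seen = datetime.now(timezone.utc)
print(sort_date("Sat, 13 Sep 2008 10:00:00 GMT", first_seen))
print(sort_date("sometime last week", first_seen))
```

The date shown to the reader can still be the feed's original text, displayed as given; only sorting and caching depend on the value returned here.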

It's mostly a matter of lack of attention by developers writing the feed generators. We see it in the DNN modules as well. The whole point of specs is to keep folks from having to worry about issues with code that is supposed to be compliant. Of course, that's not what we see in the real information world. If auto makers disregarded safety specifications as much as developers do, we'd have pulled the offenders off the market. Unfortunately, there is no similar enforcement in our IS world.

I think going into this update, most didn't realize the number of news sources that were not compliant. If something says it's RSS v2.0, then it should be expected to be a valid feed. There's the trouble: even larger news providers don't offer compliant RSS, so it's something the industry just avoids by coding around it. In the end, that's probably what will have to be done in the module, because aggregation should not cause aggravation.
