RSS/Atom feed consuming lib?

I have a customer (we build their intranet with Rails) that subscribes
to a number of news feeds. They want to make these feeds available on
their intranet, so we need to fetch, parse and publish these feeds
(like an internal web-based feed reader).

Are there any good Ruby libs to do this, preferably ones that support RSS 0.92,
RSS 2.0 and Atom (Atom is more of a nice-to-have than a need-to-have...) and
expose them through a nice common (for all formats) API?

I looked at RAA and RubyForge but didn't find anything that really
piqued my interest (although I might have missed something).

/Marcus

I was unable to find anything that really fit my needs either. I'm in the
process of hacking one together, but it's still a ways from being really
useful. You can check out FeedTools[1]; it seems to have most of the
capabilities you're looking for. I wasn't able to use it for a few reasons,
but maybe it'll be helpful to you.

[1] http://sporkmonger.com/projects/feedtools/

On 10/18/06, Marcus Bristav <marcus.bristav@gmail.com> wrote:

I have a customer (we build their intranet with Rails) that subscribes
to a number of news feeds. They want to make these feeds available on
their intranet, so we need to fetch, parse and publish these feeds
(like an internal web-based feed reader).

Are there any good Ruby libs to do this, preferably ones that support RSS 0.92,
RSS 2.0 and Atom (Atom is more of a nice-to-have than a need-to-have...) and
expose them through a nice common (for all formats) API?

I looked at RAA and RubyForge but didn't find anything that really
piqued my interest (although I might have missed something).

/Marcus

--
===Tanner Burson===
tanner.burson@gmail.com
http://tannerburson.com <---Might even work one day...

Yes, syndication[1] and FeedTools[2] should be two of the better libraries.

[1]: http://rubyforge.org/projects/syndication/
[2]: http://sporkmonger.com/projects/feedtools/

HTH,
Jochen


You may be interested in feed-normalizer; something I pieced together to wrap a few different Atom/RSS parsers. It outputs a normalized
object graph to represent a feed, regardless of the underlying feed format.

It currently wraps the Ruby RSS parser and Lucas Carlson's SimpleRSS, but it can be easily extended to support more parsers. Patches welcome.

http://feed-normalizer.rubyforge.org/
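
To illustrate what a normalized object graph means in practice, here is a minimal sketch in plain Ruby. It is not feed-normalizer's actual code or API: the `Feed` and `Entry` structs, the two adapter methods and the hash keys are all hypothetical, and plain hashes stand in for whatever a real RSS or Atom parser would return.

```ruby
# Hypothetical common shape that calling code sees, whatever the source format.
Entry = Struct.new(:title, :url, :published)
Feed  = Struct.new(:title, :url, :items)

# Adapter for RSS-style data (the hash stands in for a real parser's output).
def feed_from_rss(channel)
  items = channel[:items].map do |item|
    Entry.new(item[:title], item[:link], item[:pubDate])
  end
  Feed.new(channel[:title], channel[:link], items)
end

# Adapter for Atom-style data, which uses different element names.
def feed_from_atom(atom)
  items = atom[:entries].map do |entry|
    Entry.new(entry[:title], entry[:link_href], entry[:updated])
  end
  Feed.new(atom[:title], atom[:link_href], items)
end

rss_feed = feed_from_rss({
  title: "Example RSS", link: "http://example.com/",
  items: [{ title: "Hello", link: "http://example.com/1",
            pubDate: "Wed, 18 Oct 2006 10:00:00 GMT" }]
})

atom_feed = feed_from_atom({
  title: "Example Atom", link_href: "http://example.org/",
  entries: [{ title: "World", link_href: "http://example.org/a",
              updated: "2006-10-18T10:00:00Z" }]
})

# The consuming code no longer cares which format a feed came from.
[rss_feed, atom_feed].each do |feed|
  feed.items.each { |entry| puts "#{feed.title}: #{entry.title} (#{entry.url})" }
end
```

The point of the sketch is only that format-specific names (`pubDate` vs. `updated`, `link` vs. a link's `href`) are mapped once, in the adapters, rather than in every view that displays a feed.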

Hope that helps.

Andy

Thanks for the tips! I've tried feedtools and it seems to work nicely :)

Out of curiosity: Why couldn't you use feedtools?

/Marcus

If you're planning to go through FeedBurner, you can check out the plugin at
http://combustible.rubyforge.org/docs

Still very young, but perhaps you'll find it useful.

Gustav


--
about me:
My greatest achievement was when all the other
kids just learnt to count from 1 to 10,
i was counting (0..9)

   - gustav.paul

I am also working on a performance app that requires feed parsing. The
two that I have tried are feedtools and syndication. First I tried
feedtools for RSS and Atom, but that was too slow, so I switched to
syndication for both RSS and Atom. I found syndication to break on a
high percentage of Atom sites, so in the end, I sent RSS to syndication
and Atom to feedtools and took the corresponding perf hit for Atom
feeds.

I find this approach to be decently robust, but not very elegant. I am
going through > 10k feeds a day of all varieties.
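
A rough sketch of that kind of routing, assuming the format can be guessed from the root element. The `feed_format` heuristic and the `PARSERS` table are hypothetical; the lambdas are stand-ins for the real calls into syndication and feedtools, which are not shown here.

```ruby
# Guess the feed format from the first element name in the document.
# This is a heuristic, not a validator: it skips "<?xml ...?>" and "<!...>"
# because those do not start with a letter after "<".
def feed_format(xml)
  root = xml[/<([A-Za-z][\w:.-]*)/, 1].to_s.downcase
  case root
  when "feed"           then :atom
  when "rss", "rdf:rdf" then :rss
  else :unknown
  end
end

# Hypothetical stand-ins: replace each lambda with a call into the library
# you want to handle that format.
PARSERS = {
  rss:  ->(xml) { { handled_by: :fast_rss_parser,     bytes: xml.bytesize } },
  atom: ->(xml) { { handled_by: :robust_atom_parser,  bytes: xml.bytesize } }
}.freeze

def parse_feed(xml)
  format = feed_format(xml)
  parser = PARSERS.fetch(format) do
    raise ArgumentError, "unrecognized feed format"
  end
  parser.call(xml)
end

rss_doc  = '<?xml version="1.0"?><rss version="2.0"><channel/></rss>'
atom_doc = '<?xml version="1.0"?><feed xmlns="http://www.w3.org/2005/Atom"></feed>'

puts parse_feed(rss_doc)[:handled_by]
puts parse_feed(atom_doc)[:handled_by]
```

Keeping the dispatch in one table means the choice of library per format can change without touching the code that fetches and stores feeds.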

Can someone comment on the robustness of Ruby RSS Parser and Lucas
Carlson's SimpleRSS? I am curious about Andy's feed normalizer.

HTH,
Ray


I'm going to be parsing a LOT of feeds, but only for a few specific
elements. A few quick tests showed feedtools would probably be too slow for
what I need it for. I hadn't seen syndication, so I will definitely be
checking it out, as it seems close to what I need. I could end up using
feedtools for generating feeds, but when it comes to consuming, it's just
got a bit too much overhead for me.

--
===Tanner Burson===
tanner.burson@gmail.com
http://tannerburson.com <---Might even work one day...

Ray Chen wrote:

I am also working on a performance app that requires feed parsing.

As previously mentioned, feed-normalizer aims to produce a 'Feed' object that is independent of the underlying format. This means it will try each parser (in a user-defined order) until it gets back a successful parse and a usable object to interface with.

What this also means is that the *primary* goal of feed-normalizer is to produce the aforementioned Feed object graph. This might mean hitting three parsers before it gets that result, so performance isn't really a consideration.

Of course, you could change the order of parsing so that feed-normalizer uses the fastest parser first, and so on. feed-normalizer currently uses most strict to most liberal as its default order. Right now, this just happens to be fastest parser first, too :)
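
The try-each-parser-in-order idea can be sketched in a few lines of plain Ruby. This is not feed-normalizer's implementation: `parse_with_fallback`, `ParseFailed` and the two toy parsers are hypothetical names invented for the illustration, and the "parsers" only pull out a title with a regex.

```ruby
ParseFailed = Class.new(StandardError)

# Try each parser in the given order and return the first usable result,
# together with the name of the parser that produced it.
def parse_with_fallback(xml, parsers)
  parsers.each do |name, parser|
    begin
      result = parser.call(xml)
      return [name, result] if result
    rescue StandardError
      next # this parser rejected the feed; fall through to the next one
    end
  end
  raise ParseFailed, "no parser could handle the feed"
end

# Toy strict parser: refuses input without a closing </rss> tag.
strict = lambda do |xml|
  raise ArgumentError, "not well-formed" unless xml.include?("</rss>")
  { title: xml[/<title>(.*?)<\/title>/m, 1] }
end

# Toy liberal parser: takes whatever title it can find, or returns nil.
liberal = lambda do |xml|
  title = xml[/<title>(.*?)<\/title>/m, 1]
  title && { title: title }
end

# Hashes keep insertion order, so this is "most strict first".
parsers = { strict: strict, liberal: liberal }

broken = "<rss><channel><title>News</title>" # closing tags missing
name, feed = parse_with_fallback(broken, parsers)
puts "#{name} parser produced #{feed.inspect}"
```

Reordering the `parsers` hash is all it takes to put a faster or more liberal parser first; the cost of the approach is that a badly broken feed pays for every failed attempt before the last parser gets a turn.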

The two that I have tried are feedtools and syndication. First I tried feedtools for RSS and Atom, but that was too slow, so I switched to syndication for both RSS and Atom. I found syndication to break on a high percentage of Atom sites, so in the end, I sent RSS to syndication and Atom to feedtools and took the corresponding perf hit for Atom feeds.

In this case you could create a wrapper for feed-normalizer that interfaces both syndication and feedtools, and tell feed-normalizer which one to use first. I assume you'll probably encounter more RSS than Atom.

I find this approach to be decently robust, but not very elegant. I am going through > 10k feeds a day of all varieties.

Can someone comment on the robustness of Ruby RSS Parser and Lucas Carlson's SimpleRSS? I am curious about Andy's feed normalizer.

I personally have found Ruby's RSS library to be very good at handling RSS feeds that aren't broken :) What that means is the results should be predictable, but the chance of a good parse may be lower.

SimpleRSS, on the other hand, is uber-liberal: if the feed even remotely resembles an RSS or Atom document, you'll probably get a pretty good result back, though there are sometimes small errors.

Bob Aman did an overview of both parsers, somewhere on sporkmonger.com.

Back to performance again; I did some rudimentary benchmarks[1] of both Ruby's RSS as well as SimpleRSS. I think the results of this benchmark really make the point for SimpleRSS being a great 'backup' parser to have when nothing else will parse an ill-formed feed.

And of course, I'm always looking for patches and new parser wrappers for feed-normalizer.

Hope that helps.

Andy

[1] http://blog.andyis.textdriven.com/articles/2006/03/28/parsers-in-the-pool