Hi!
I am proud to announce a new release of rubric, the Ruby-only news
feed aggregator.
Version 0.3 is a major rewrite of rubric. The following is quoted from
the changelog, which is implemented as an RSS news feed available at
http://rubric.rubyforge.org/rubric.rubyforge.org.rss
The most important change since 0.2.1 is that rubric now uses a
database. The database is implemented using an index file and a
database file. The index file contains the date a news item was added to the
database and its MD5 hash. The database file contains the MD5 hash of a news item and the
compressed and base64-encoded (except that line feeds are
replaced by spaces) item.
You may wonder what the MD5 hash is good for. A cryptographic
hash like MD5 is a ‘genetic fingerprint’ of the news item. The
likelihood that two news items have identical MD5 hashes but are
different is smaller than the likelihood that a genetic
fingerprint matches by chance.
This property of MD5 is used by rubric. The database file only
contains the news items that are needed to generate the portal
page (the default hold time for this data is 3 days). The index file,
on the other hand, contains the hashes of the news items to prevent
outdated news items from being added again, which was a
known problem of earlier versions of rubric. The default hold
time for the data in the index file is 4 weeks (28 days).
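To make the two-file layout concrete, here is a minimal Ruby sketch of how the records described above could be produced and how the MD5 hash keeps duplicates out. The helper names (item_hash, encode_item, decode_item, ItemStore), the field order and the single-space separator are assumptions for illustration, not rubric's actual file format.

```ruby
require 'digest/md5'
require 'zlib'

# Hypothetical sketch of the index file / database file layout.
# Field order and separators are assumptions, not rubric's real format.

# MD5 "fingerprint" of a news item (32 hex characters).
def item_hash(item_xml)
  Digest::MD5.hexdigest(item_xml)
end

# Compress, base64-encode, then turn the line feeds that pack('m') inserts
# (one every 60 output characters) into spaces so a record fits on one line.
def encode_item(item_xml)
  [Zlib::Deflate.deflate(item_xml)].pack('m').tr("\n", ' ').rstrip
end

# Reverse of encode_item; assumes the items are UTF-8 text.
def decode_item(field)
  Zlib::Inflate.inflate(field.tr(' ', "\n").unpack1('m')).force_encoding('UTF-8')
end

class ItemStore
  attr_reader :index, :items

  def initialize
    @index = {} # MD5 => Time added   (what the index file would hold)
    @items = {} # MD5 => encoded item (what the database file would hold)
  end

  # Adds an item unless its hash is already known; returns true if added.
  def add(item_xml, now = Time.now)
    h = item_hash(item_xml)
    return false if @index.key?(h)
    @index[h] = now
    @items[h] = encode_item(item_xml)
    true
  end

  # One line per record, as they might be written to the index file.
  def index_lines
    @index.map { |h, t| "#{t.to_i} #{h}" }
  end

  # One line per record, as they might be written to the database file.
  def database_lines
    @items.map { |h, enc| "#{h} #{enc}" }
  end
end
```

Because the encoded item contains spaces, a reader of such a database line would split only on the first space, e.g. line.split(' ', 2).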
New config options Hold_time_items (time in seconds to hold
news items) and Hold_time_hashes (time in seconds to hold
their hashes) have been added.
Config file options Input_limit and Output_limit are no
longer supported.
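Since both hold-time options take seconds, the defaults quoted above work out to 3 × 24 × 60 × 60 = 259200 s for items and 28 × 24 × 60 × 60 = 2419200 s for hashes. The sketch below writes them as Ruby constants (an assumption about the config syntax) and adds a hypothetical prune step showing why the two limits differ; the prune function and its in-memory hashes are illustrative only.

```ruby
# Defaults from the changelog, in seconds. Writing them as Ruby constants
# is an assumption about the config file syntax.
Hold_time_items  = 3 * 24 * 60 * 60   # 3 days  = 259_200 s
Hold_time_hashes = 28 * 24 * 60 * 60  # 28 days = 2_419_200 s

# Hypothetical pruning over in-memory views of the two files:
#   index: MD5 => Time added, items: MD5 => encoded item.
# Assumes every item in `items` also has an entry in `index`.
def prune(index, items, now = Time.now)
  # Item bodies are only needed for the portal page, so they expire first.
  items.delete_if { |h, _| now - index[h] > Hold_time_items }
  # Hashes are kept much longer so old items are still recognised as seen.
  index.delete_if { |_, added| now - added > Hold_time_hashes }
end
```

An item that is 5 days old thus disappears from the portal page, but its hash still blocks it from being re-added until it is 28 days old.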
The portal page now advertises its HTML version as 4.0
Transitional.
The generated HTML now uses description lists instead of tables.
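A minimal sketch of what such a page could look like: the HTML 4.0 Transitional DOCTYPE followed by one dt/dd pair per news item instead of table rows. The item structure ([title, link, description]), the page title and the helper names are hypothetical; rubric's real template may differ.

```ruby
# Public identifier of the HTML 4.0 Transitional DTD.
DOCTYPE = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">'

# Escape characters that are special in HTML text and attribute values.
def escape_html(s)
  s.to_s.gsub(/[&<>"]/, '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;')
end

# items: array of [title, link, description] (hypothetical structure).
def portal_page(items)
  lines = [DOCTYPE, '<html>', '<head><title>rubric</title></head>', '<body>', '<dl>']
  items.each do |title, link, description|
    # Each news item becomes a term (the linked title) and its definition.
    lines << %(<dt><a href="#{escape_html(link)}">#{escape_html(title)}</a></dt>)
    lines << "<dd>#{escape_html(description)}</dd>"
  end
  lines.push('</dl>', '</body>', '</html>')
  lines.join("\n")
end
```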
Support for CDATA has been added.
Support for broken CDATA has been added (because some feeds
use it without knowing what it really means).
This feed now uses HTML and CDATA.
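The changelog does not say what counts as "broken" CDATA. Purely for illustration, this sketch assumes it means a CDATA section whose markers were entity-escaped (&lt;![CDATA[ ... ]]&gt;) and shows how both that and a well-formed section could be unwrapped. The regexes and the unwrap_cdata helper are hypothetical, not rubric's parser.

```ruby
# Well-formed case: the whole element text is one CDATA section.
CDATA = /\A\s*<!\[CDATA\[(.*?)\]\]>\s*\z/m
# Assumed "broken" case: the CDATA markers themselves were entity-escaped.
ESCAPED_CDATA = /\A\s*&lt;!\[CDATA\[(.*?)\]\]&gt;\s*\z/m

# Returns the raw content of CDATA-wrapped element text,
# or the text unchanged when no CDATA wrapper is found.
def unwrap_cdata(text)
  match = CDATA.match(text) || ESCAPED_CDATA.match(text)
  match ? match[1] : text
end
```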
rubric now accepts items with missing titles.
Parsing of RSS feeds has been improved.
File names for local copies of feeds have changed to base64-encoded
representations of the feed URLs (exceptions: ‘=’ is
transformed into ‘_’, ‘/’ into ‘-’, and line breaks are removed)
with an added ‘.xml’ suffix, so that every feed has a unique
filename.
rss_fetcher now comes with a version number.
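The naming rule can be sketched in a few lines of Ruby. local_filename follows the description above; url_from_filename is a hypothetical inverse (not mentioned in the changelog) included only to show that the mapping loses no information.

```ruby
# Naming rule as described: base64 of the URL, '=' -> '_', '/' -> '-',
# line breaks removed, '.xml' appended.
def local_filename(url)
  [url].pack('m').delete("\n").gsub(/[=\/]/, '=' => '_', '/' => '-') + '.xml'
end

# Hypothetical inverse, useful for checking the scheme.
def url_from_filename(name)
  name.delete_suffix('.xml').gsub(/[-_]/, '_' => '=', '-' => '/').unpack1('m')
end
```

The substitution is safe because ‘_’ and ‘-’ never occur in standard base64 output, so the mapping stays reversible and two different URLs cannot end up with the same file name, while ‘/’ (a path separator) no longer appears in it.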
Instead of ‘host’, ‘path’, and ‘port’, you can now use ‘url’.
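A single URL carries the same information as the old three settings. This sketch uses Ruby's standard uri library to recover them; split_feed_url is a hypothetical helper, and it assumes an http or https URL (request_uri is only defined for those).

```ruby
require 'uri'

# Splits a feed URL into the pieces the old 'host', 'path' and 'port'
# settings held. Assumes an http:// or https:// URL.
def split_feed_url(url)
  uri = URI.parse(url)
  # request_uri is the path plus any query string, i.e. what an HTTP GET sends.
  [uri.host, uri.request_uri, uri.port]
end

host, path, port = split_feed_url('http://rubric.rubyforge.org/rubric.rubyforge.org.rss')
puts "host=#{host} path=#{path} port=#{port}"
```

This prints host=rubric.rubyforge.org path=/rubric.rubyforge.org.rss port=80; the port falls back to the scheme's default when the URL does not name one.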
BTW: rubric is what I had in mind when I recently posted my data
compression questions.
Now then[1],
Josef ‘Jupp’ SCHUGT
[1] Does one really use that? In German it would be ‘nun denn’ and in
Japanese ‘sore de wa’.
–
http://oss.erdfunkstelle.de/ruby/ - German comp.lang.ruby-FAQ
http://rubyforge.org/users/jupp/ - Ruby projects at Rubyforge
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Germany 2004: To boldly spy where no GESTAPO / STASI has spied before