[ANN] Release of rubric 0.3

(Josef 'Jupp' SCHUGT) #1


I am proud to announce a new release of rubric, the Ruby-only news
feed aggregator.

Version 0.3 is a major rewrite of rubric. Quoting the changelog, which
is implemented as an RSS news feed that is available at


The most important change since 0.2.1 is that rubric now uses a
database. The database is implemented using an index file and a
database file.

The index file contains the date a news item was added to the
database and its MD5 hash.

The database file contains the MD5 hash of a news item and the
item itself, compressed and base64-encoded (except that line
feeds are replaced by spaces).
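
To illustrate the record format described above, here is a minimal sketch. It is not rubric's actual code: the method names, the use of zlib for compression, and the "hash, space, payload" line layout are my assumptions.

```ruby
require 'digest/md5'
require 'zlib'
require 'base64'

# Build one database line: MD5 hash, a space, then the compressed and
# base64-encoded item with line feeds replaced by spaces.
# (Hypothetical layout; the real file format may differ.)
def encode_record(item)
  hash = Digest::MD5.hexdigest(item)
  payload = Base64.encode64(Zlib::Deflate.deflate(item)).tr("\n", ' ').strip
  "#{hash} #{payload}"
end

# Reverse the steps above and return [hash, item].
def decode_record(line)
  hash, payload = line.split(' ', 2)
  # Turn the spaces back into line feeds before decoding.
  item = Zlib::Inflate.inflate(Base64.decode64(payload.tr(' ', "\n")))
  # Assumption: items are UTF-8 text.
  [hash, item.force_encoding(Encoding::UTF_8)]
end
```

Replacing the line feeds keeps every record on a single line, so the file can be read line by line.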

You may wonder what the MD5 hash is good for. A cryptographic
hash like MD5 is a ‘genetic fingerprint’ of the news item. The
likelihood that two news items have identical MD5 hashes but are
different is smaller than the likelihood that two genetic
fingerprints match by chance.

This property of MD5 is used by rubric. The database file only
contains the news items that are needed to generate the portal
page (default hold time for this data is 3 days). The index file
on the other hand contains the hashes of the news items so that
outdated news items are not added again, which was a known
problem of earlier versions of rubric. The default hold
time for the data in the index file is 4 weeks (28 days).
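
The interplay of the two hold times can be sketched as follows. This is an in-memory illustration only; the class and method names are hypothetical and the real program works on files.

```ruby
require 'digest/md5'

DAY = 24 * 60 * 60
HOLD_TIME_ITEMS  = 3 * DAY    # default hold time for news items
HOLD_TIME_HASHES = 28 * DAY   # default hold time for their hashes

# Hypothetical in-memory stand-in for the index file and database file.
class NewsStore
  def initialize
    @index = {}   # MD5 hash => time the item was added
    @items = {}   # MD5 hash => news item
  end

  # Returns false if the item's hash is still in the index.
  def add(item, now = Time.now)
    hash = Digest::MD5.hexdigest(item)
    return false if @index.key?(hash)
    @index[hash] = now
    @items[hash] = item
    true
  end

  # Drop items first (they expire sooner), then the hashes.
  def expire(now = Time.now)
    @items.delete_if { |hash, _item| now - @index[hash] > HOLD_TIME_ITEMS }
    @index.delete_if { |_hash, added| now - added > HOLD_TIME_HASHES }
  end

  def items
    @items.values
  end
end
```

Because the hash outlives the item, a news item that has already dropped off the portal page is still recognized as known for the rest of the four weeks.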

  • New config options Hold_time_items (time in seconds to hold
    news items) and Hold_time_hashes (time in seconds to hold
    their hashes) have been added.

  • Config file options Input_limit and Output_limit are no
    longer supported.

  • The portal page now advertises the HTML version it uses as 4.0.

  • HTML now uses description lists in place of tables.

  • Support for CDATA has been added.

  • Support for broken CDATA has been added (because some feeds
    use it without knowing what it really means).

  • This feed now uses HTML and CDATA.

  • rubric now accepts items with missing titles.

  • Parsing of RSS feeds has been improved.

  • File names for local copies of feeds are now base64-encoded
    representations of the feed URLs (exceptions: ‘=’ is transformed
    into ‘_’ and ‘/’ into ‘-’, and line breaks are removed) with an
    added ‘.xml’ suffix, so that every feed has a unique file name.

  • rss_fetcher now comes with a version number.

  • Instead of ‘host’, ‘path’, and ‘port’, you now can use ‘url’.
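
The file-name scheme for local feed copies mentioned in the list above can be sketched like this. The method name is hypothetical and rss_fetcher's own code may look different.

```ruby
require 'base64'

# Derive a file name from a feed URL: base64-encode the URL, remove the
# line breaks, map '=' to '_' and '/' to '-', then append '.xml'.
def feed_file_name(url)
  Base64.encode64(url).delete("\n").tr('=/', '_-') + '.xml'
end

puts feed_file_name('http://www.ruby-lang.org/en/index.rdf')
```

Neither ‘_’ nor ‘-’ is part of the standard base64 alphabet, so the mapping can be reversed and two different URLs never end up with the same file name.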

BTW: rubric is what I had in mind when I recently posted my data
compression questions.

Now then[1],

Josef ‘Jupp’ SCHUGT

[1] Does one really use that? In German it would be ‘nun denn’ and in
Japanese ‘sore de wa’.


http://oss.erdfunkstelle.de/ruby/ - German comp.lang.ruby-FAQ
http://rubyforge.org/users/jupp/ - Ruby projects at Rubyforge
Germany 2004: To boldly spy where no GESTAPO / STASI has spied before

(Josef 'Jupp' SCHUGT) #2


Quite some time has passed since the last release of rubric. The
reason is that quite a lot has changed.

rubric is a program that creates an HTML portal page from local
copies of RSS feeds that are downloaded by rss_fetcher (included).

Author: Josef ‘Jupp’ Schugt jupp@rubyforge.org
Homepage: http://rubric.rubyforge.org/
Download: http://rubyforge.org/projects/rubric/
RSS-Feed: http://rubric.rubyforge.org/rubric.rubyforge.org.rss
License: GPL 2.0 or later

rubric and rss_fetcher now accept multiple file names so that you can
update several feed collections with just one call. Both programs
also accept a regular expression. Only feeds with URLs matching the
regular expression are updated.

rss_fetcher [--help|-h] [--only <regular_expression>]
[-o <regular_expression>] [<file_name> …]

--help, -h
    Show this help

--only, -o
    Only download feeds with URLs matching <regular_expression>

The configuration file <file_name> defaults to $HOME/.rubric/default

rubric [--help|-h] [--only <regular_expression>]
[-o <regular_expression>] [<file_name> …]

--help, -h
    Show this help

--only, -o
    Only import feeds with URLs matching <regular_expression>

The configuration file <file_name> defaults to $HOME/.rubric/default
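
For the curious, command-line handling along these lines could be done with Ruby's standard optparse library. This is only a sketch of the behaviour described above, not the code shipped with rubric; parse_args and select_feeds are hypothetical names.

```ruby
require 'optparse'

# Parse the options shown in the usage text and return [options, files].
def parse_args(argv)
  options = { only: nil, help: false }
  parser = OptionParser.new do |opts|
    opts.on('-h', '--help') { options[:help] = true }
    opts.on('-o', '--only REGEXP') { |re| options[:only] = Regexp.new(re) }
  end
  files = parser.parse(argv)   # everything that is not an option
  # Fall back to the default configuration file.
  files = [File.join(ENV.fetch('HOME', '.'), '.rubric', 'default')] if files.empty?
  [options, files]
end

# Keep only the feeds whose URL matches the regular expression (if any).
def select_feeds(urls, only)
  only ? urls.select { |url| url =~ only } : urls
end
```

With this sketch, parse_args(['-o', 'ruby-lang', 'work', 'private']) yields the two file names plus a Regexp that select_feeds can apply to each collection.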

The config file format has changed from Ruby to YAML. This has mainly
been done to reduce security issues resulting from the config file
being a Ruby script, but it also makes the config file shorter.
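
To show why YAML is the safer choice, here is a small sketch that reads a configuration as plain data. The layout is invented for illustration: the key ‘Feeds’ is hypothetical, and I am only assuming that the Hold_time options keep their names in the new format. See the sample config file for the real thing.

```ruby
require 'yaml'

# Hypothetical configuration; the real sample config file may differ.
SAMPLE_CONFIG = <<~CONFIG
  Hold_time_items: 3 days
  Hold_time_hashes: 4 weeks
  Feeds:
    - url: http://www.ruby-lang.org/en/index.rdf
    - url: http://www.linuxgazette.com/node/feed
CONFIG

# safe_load only builds plain data (hashes, arrays, strings, numbers);
# unlike loading a Ruby script, nothing in the file gets executed.
config = YAML.safe_load(SAMPLE_CONFIG)

config['Feeds'].each { |feed| puts feed['url'] }
```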

The way of specifying hold times has been greatly improved. You can
specify

n seconds as
    n, n s, n sec, n secs, n second, and n seconds.

n minutes as
    n m, n min, n mins, n minute, and n minutes.

n hours as
    n h, n hour, and n hours.

n weeks as
    n w, n week, and n weeks.

You can still use multiplication, so that the following values are
equivalent:

  • 1 week
  • 7 days
  • 7 * 24 hours
  • 7 * 24 * 60 minutes
  • 7 * 24 * 60 * 60 seconds

Addition is also possible and can be done in two ways:

  • 1 week + 2 days + 3 hours + 5 minutes + 7 seconds
  • 1 week, 2 days, 3 hours, 5 minutes, 7 seconds

Note that while the samples only show integers, you can use floats as
well.
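
A parser for the hold time notation described above might look roughly like this. It is a sketch, not rubric's implementation: the names are hypothetical, and since the spellings for days are not listed above, I only assume ‘day’ and ‘days’.

```ruby
# Seconds per unit, keyed by every accepted spelling.
UNIT_SECONDS = {
  %w[s sec secs second seconds] => 1,
  %w[m min mins minute minutes] => 60,
  %w[h hour hours]              => 60 * 60,
  %w[day days]                  => 24 * 60 * 60,   # assumed spellings
  %w[w week weeks]              => 7 * 24 * 60 * 60
}.flat_map { |names, secs| names.map { |name| [name, secs] } }.to_h

# Convert a hold time such as "1 week, 2 days" or "7 * 24 hours" to seconds.
def parse_hold_time(text)
  # Terms are added; they are separated by '+' or ','.
  text.split(/[+,]/).sum do |term|
    factors = term.split('*').map(&:strip)
    raise ArgumentError, "empty term in #{text.inspect}" if factors.empty?
    # Factors within a term are multiplied.
    factors.inject(1) do |product, factor|
      match = /\A(\d+(?:\.\d+)?)\s*([a-z]*)\z/i.match(factor)
      raise ArgumentError, "cannot parse #{factor.inspect}" unless match
      unit = match[2].downcase
      scale = unit.empty? ? 1 : UNIT_SECONDS[unit]
      raise ArgumentError, "unknown unit #{unit.inspect}" unless scale
      product * Float(match[1]) * scale
    end
  end
end
```

A bare number counts as seconds, and the result is a float so that values like ‘1.5 hours’ work.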

Updated data in sample config file. Among other changes it now
includes http://www.ruby-lang.org/en/index.rdf.

Rubric now correctly renders feeds like http://www.swr3.de/rdf-feed/
that contain pre-formatted text with line ends that are not encoded
as CR+LF (Windows) but using only one of these characters (LF on
Unix, CR on classic Mac OS).
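
One simple way to cope with such line ends is to normalize them before rendering. This is a sketch with a hypothetical method name; rubric may handle it differently.

```ruby
# Turn CR+LF and lone CR into LF so that all line ends look the same.
def normalize_line_ends(text)
  text.gsub(/\r\n?/, "\n")
end
```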

Feeds like http://www.oekosmos.de/article/rssheadlines provide
http://www.oekosmos.de as channel URL but an empty channel title. In
such cases rubric now replaces the channel title with the channel URL
without the protocol part - in the given case with www.oekosmos.de.
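
The fallback just described can be sketched as follows. The method name is hypothetical, and I am assuming that a title consisting only of whitespace counts as empty.

```ruby
# Use the channel URL without its protocol part when the title is empty.
def channel_title(title, url)
  return title unless title.nil? || title.strip.empty?
  url.sub(%r{\A[a-z][a-z0-9+.-]*://}i, '')
end

puts channel_title('', 'http://www.oekosmos.de')
```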

Rubric now correctly renders the Linux Gazette feed that is available
at http://www.linuxgazette.com/node/feed

Rubric now correctly renders the feed that is available at

The old host, path, and port scheme is no longer supported; use url
instead.

Josef ‘Jupp’ SCHUGT

