[ANN] Metadata 0.3

On 9/15/07, Konrad Meyer <konrad@tylerc.org> wrote:
> Quoth Ilmari Heikkinen:
> > > Hmm, am I not seeing it (just using 'mdh -p') or can metadata.rb extract
> > > stuff like artist, title, album, track, and whatnot from ogg/flac?
> >
> > It should at least. If you're having trouble, lemme know
> >
> Yeah, I'm having some trouble. I have latest metadata (0.2).
>
> [snip]
>
> Any ideas?

Yeah, I failed at using git. Jeez. Sorry about that.
Here's 0.3, it oughta work:

tarball: http://dark.fhtr.org/repos/metadata/metadata-0.3.tar.gz
git: http://dark.fhtr.org/repos/metadata

On 9/15/07, darren kirby <bulliver@badcomputer.org> wrote:
> Hi Ilmari!
>
> Just wanted to mention that despite the name, wmainfo will parse anything
> wrapped in an ASF audio/video container format[0], so, you could use it to
> parse wmv movies as well if your user didn't have mplayer installed.
>
> [0] Advanced Systems Format - Wikipedia

Thanks for the pointer!
I made it merge the wmainfo output to the mplayer output for wmv and asf.

Description
-----------

  This package `Metadata' comes with a library called `metadata' and
  a small program called `mdh'.

  The library probes files for their metadata (e.g. jpeg dimensions
  and camera make, mp3 artist, pdf word count) and returns the metadata
  as a Hash.

  Mdh can print out file metadata as YAML and package the metadata
  with the file.

  This package has many dependencies since there is no single universal
  metadata header format that all files use. Blame resource forks, filename
  extensions, bags of bytes and mimetypes.

Usage
-----

  # print out metadata header
  mdh -p myfile.jpg

  # create myfile.jpg.mdh, which consists of metadata header + myfile.jpg
  mdh myfile.jpg

  # print out metadata header from mdh file
  mdh -e -p myfile.jpg.mdh

  # strip out metadata header from mdh file and save it to myfile.jpg
  mdh -e myfile.jpg.mdh

  Metadata.extract('myfile.jpg')
  Metadata.extract_text('myfile.jpg')
  Pathname.new("myfile.jpg").metadata
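
To give a feel for the "Hash in, YAML out" idea, here is a minimal sketch. It does not call the metadata library at all: the hash is a hand-written stand-in whose keys and values are copied from the `mdh -p` output quoted further down this thread, and the grouping step is only an illustration, not something mdh is known to do.

```ruby
require 'yaml'

# Stand-in for the kind of Hash the library is described as returning.
# Keys and values are copied from mdh output quoted in this thread.
sample = {
  'File.Size'        => 4618665,
  'Audio.Samplerate' => 44100,
  'Audio.Codec'      => 'vrbs',
  'File.Format'      => 'video/x-theora+ogg'
}

# Dump the hash as a YAML document, roughly what printing a header involves.
header = sample.to_yaml
puts header

# Illustration only: group the dotted keys by their prefix (File, Audio).
grouped = sample.group_by { |key, _| key.split('.').first }
grouped.each do |section, pairs|
  names = pairs.map { |key, _| key.split('.', 2).last }
  puts "#{section}: #{names.join(', ')}"
end
```

Because the header is plain YAML, any YAML parser can read it back into the same hash.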

List of supported formats
-------------------------

  Audio:
    Successfully tested with:
      mp3, flac, ogg, wav
    Should also work:
      wma, m4a

  Video:
    What you manage to make mplayer play, which can be just about anything.
    Then again, missing title and author data, etc. (do videos even have those?)
    Successfully tested with:
      wmv, mov, divx, xvid, flv, ogm, mpg

  Images:
    Should handle pretty much anything (apart from XCF and ORF.)
    Successfully tested with:
      jpeg, png, gif, nef, dng, crw, pef, psd

  Documents:
    Successfully tested with:
      pdf, ppt, odp, sxi, ps, ps.gz, html, txt
    Should work:
    - OpenOffice docs work to some degree (personally, I'm using unoconv to
      convert OO docs to temp PDFs for the text & dimensions extraction, so
      those bits of data are missing.)
    - MS Office docs to some degree (ppt at least, doc and xls should work too;
      dimensions are missing because of the temp PDF thing above.)

  Others:
    Whatever extract spits out on the five or six bits of metadata I'm using
    from it. Archive contents at least.

Requirements
------------

  * Ruby 1.8

  * Tons of metadata extraction programs and libs,
    list of gems:
      flacinfo-rb
      wmainfo-rb
      MP4info
    list of debian packages:
      dcraw
      libimlib2-ruby
      extract
      libimage-exiftool-perl
      poppler-utils
      mplayer
      html2text
      imagemagick
      unhtml
      pstotext
      antiword
      catdoc
      shared-mime-info
      vorbis-tools

  * You do want to install the latest versions of dcraw and
    shared-mime-info to be able to handle camera raw images.
    http://cybercom.net/~dcoffin/dcraw/
    http://freedesktop.org/wiki/Software/shared-mime-info

  * Python + chardet library
    http://chardet.feedparser.org/

Install
-------

  Decompress the archive and enter its top directory.
  Then type:

   ($ su)
    # ruby setup.rb

  This simple step installs the program under the default
  location of Ruby libraries. You can also install files into
  your favorite directory by supplying setup.rb some options.
  Try "ruby setup.rb --help".

License
-------

  Ruby's

--
Ilmari Heikkinen <ilmari.heikkinen gmail com>

Any chance you could wrap this up as a gem? It's not something I care
strongly about, and I don't know how complicated the process is, but I think
it would help ease installation for some users.

--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/

Er, I'm still not getting information out of ogg files:

  $ mdh -p ~/music/bowling_for_soup_-_1985.ogg
  ---
  Video.Duration: 192.78
  Audio.Samplerate: 44100
  Audio.Bitrate: 192.0
  Image.DimensionUnit: px
  Video.Codec: ""
  File.Size: 4618665
  Audio.Codec: vrbs
  File.Modified: 2007-01-03T22:10:11-08:00
  File.Format: video/x-theora+ogg

  $ mplayer ~/music/bowling_for_soup_-_1985.ogg
  ...
  Clip info:
   Genre: Pop
   Name: 1985
   Artist: Bowling for Soup
   Creation Date: 2004
   Album: A Hangover You Don't Deserve
   Track: 03

Thanks for your quick responses!

--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/

On 9/15/07, Konrad Meyer <konrad@tylerc.org> wrote:
> Er, I'm still not getting information out of ogg files:
>
>   $ mdh -p ~/music/bowling_for_soup_-_1985.ogg
>   ---
>   Video.Duration: 192.78
>   Audio.Samplerate: 44100
>   Audio.Bitrate: 192.0
>   Image.DimensionUnit: px
>   Video.Codec: ""
>   File.Size: 4618665
>   Audio.Codec: vrbs
>   File.Modified: 2007-01-03T22:10:11-08:00
>   File.Format: video/x-theora+ogg

^- That's the problem there. It thinks it's a video file.

<technical blather>
Why? Probably because I hacked the mimetype guesser to _not_ assume
things based on the filename extension, and the shared-mime-info db
assumes that the guesser _is_ assuming things based on the filename
extension.

Which is something I'd rather not do with downloaded files (which, by
their very nature, have wild disparities between the extension and the
real mimetype.) And the header content-type is often totally wrong or
doesn't match shared-mime-info's naming (e.g.
application/octet-stream vs. image/gif, audio/x-mp3 vs. audio/mpeg,
video/divx vs. video/x-msvideo, video/x-ms-asf vs. video/vnd.ms-asf...)

And this magic-over-extension sometimes leads to me getting generic
lesser-magic guesses instead of more specific filename extension
guesses (e.g. zip instead of OO document.) So, I have a list of
generic formats that defer to the extension rather than rely on
the lesser-magic.

Anyhow, it's ugly, hacky magic.
Just like the rest of mimetype guessing.
</technical blather>
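
The magic-first, extension-as-fallback idea described above can be sketched roughly like this. Everything here is an assumption made for illustration: the three tables are tiny hand-written stand-ins, not the shared-mime-info database and not the list metadata actually uses, and `guess_by_magic` / `guess_mimetype` are hypothetical names.

```ruby
# Generic types whose magic match says little about the real format.
# Assumed list for this sketch, not the one metadata ships with.
GENERIC_TYPES = ['application/zip', 'application/octet-stream']

# Hypothetical extension table; a real one would come from shared-mime-info.
EXTENSION_TYPES = {
  '.odt' => 'application/vnd.oasis.opendocument.text',
  '.gif' => 'image/gif'
}

# Tiny stand-in for a magic database: leading bytes => mimetype.
MAGIC_TYPES = {
  'GIF8'       => 'image/gif',
  "PK\x03\x04" => 'application/zip'
}

# Guess from file contents only; fall back to the most generic type.
def guess_by_magic(head)
  MAGIC_TYPES.each { |magic, type| return type if head.start_with?(magic) }
  'application/octet-stream'
end

# Trust the magic guess unless it is generic; only then consult the extension.
def guess_mimetype(filename, head)
  by_magic = guess_by_magic(head)
  return by_magic unless GENERIC_TYPES.include?(by_magic)
  EXTENSION_TYPES.fetch(File.extname(filename).downcase, by_magic)
end

puts guess_mimetype('cat.jpg', 'GIF89a')              # contents win over a wrong extension
puts guess_mimetype('report.odt', "PK\x03\x04rest")   # generic zip defers to the extension
puts guess_mimetype('mystery.bin', "\x00\x01")        # nothing known, stays generic
```

The point of the ordering is that a mislabeled download is still identified by its contents, while container formats such as zip-based documents get the more specific type their extension suggests.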

But! Fixing this instance of the problem in the next thirty seconds.
... There!

And now, adding ogginfo metadata to video/x-theora+ogg.

Ok, try this:

http://dark.fhtr.org/repos/metadata/metadata-0.4.tar.gz

> Thanks for your quick responses!

Thanks for the bug reports! They really help in making this thing
more robust.

--
Ilmari Heikkinen

Oh, nice, mplayer does give out metadata fields. I better augment
the mplayer info parser to grab those :)

0.5 here we come!
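
A rough sketch of what grabbing those fields could look like, assuming the parser gets mplayer's plain-text output as a string. The sample text is the "Clip info" block quoted in this thread; `parse_clip_info` and the `CLIP_KEYS` mapping are hypothetical, and only fields whose dotted key names already appear in this thread are mapped (Track and Creation Date are skipped rather than given invented names).

```ruby
# Sample text copied from the mplayer output quoted in this thread.
MPLAYER_OUTPUT = <<~TEXT
  Clip info:
   Genre: Pop
   Name: 1985
   Artist: Bowling for Soup
   Creation Date: 2004
   Album: A Hangover You Don't Deserve
   Track: 03
TEXT

# Hypothetical mapping from mplayer field names to dotted metadata keys.
CLIP_KEYS = {
  'Name'   => 'Audio.Title',
  'Artist' => 'Audio.Artist',
  'Album'  => 'Audio.Album',
  'Genre'  => 'Audio.Genre'
}

# Collect the indented "Field: value" lines that follow "Clip info:".
def parse_clip_info(output)
  info = {}
  in_clip = false
  output.each_line do |line|
    if line =~ /\AClip info:/
      in_clip = true
    elsif in_clip && line =~ /\A\s+([^:]+):\s*(.*?)\s*\z/
      key = CLIP_KEYS[$1]
      info[key] = $2 if key
    elsif in_clip
      break # the first non-indented line ends the block
    end
  end
  info
end

clip = parse_clip_info(MPLAYER_OUTPUT)
clip.each { |key, value| puts "#{key}: #{value}" }
```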

Quoth Ilmari Heikkinen:
> Oh, nice, mplayer does give out metadata fields. I better augment
> the mplayer info parser to grab those :)
>
> 0.5 here we come!

Another bug (Sorry :D):
  $ mdh -p ~/music/Limp\ Bizkit\ -\ Rollin\'\ \(edited\).ogg
  sh: -c: line 0: syntax error near unexpected token `('
  sh: -c: line 0: `ogginfo '/home/konrad/music/Limp Bizkit - Rollin\'
    (edited).ogg''

(Last line was broken up to email length.) You're already escaping single
quotes for the shell, need to escape start-parens and end-parens as well.
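
A side note on why the command breaks: in POSIX sh a backslash inside single quotes is literal, so the `\'` in the middle closes the quoted string early and the parentheses end up unquoted. Two ways around it, sketched below with the path from the error message. This is a suggestion rather than what metadata does internally, and the array form of `IO.popen` needs a Ruby newer than 1.8.

```ruby
require 'shellwords'

# The file name from the error message above.
path = "/home/konrad/music/Limp Bizkit - Rollin' (edited).ogg"

# Option 1: escape the argument before interpolating it into a command line.
command = "ogginfo #{Shellwords.escape(path)}"
puts command

# Option 2: pass the arguments as an array so no shell parses them at all.
# Left commented out because it needs ogginfo installed.
# output = IO.popen(['ogginfo', path]) { |io| io.read }

# Round-trip check: splitting the escaped form the way a shell would
# gives back the original path as a single argument.
round_trip = Shellwords.split(Shellwords.escape(path))
puts round_trip == [path]
```

Option 2 is usually the more robust choice, since there is nothing left to escape.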

Thanks,

--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/

Also:
For mp3 id3v2 tags, the binary string "\xCB\x99\xC5\xA3" is being inserted
at the front of all the string fields.

  $ mdh -p ~/music/Snoop\ Dogg\ -\ Gin\ \&\ Juice.mp3
  ---
  Audio.Album: "\xCB\x99\xC5\xA3Death Row's Snoop Doggy Dogg Greatest Hits
    (2001)"
  ...
  Audio.Genre: "\xCB\x99\xC5\xA3Hip-Hop"
  Audio.Title: "\xCB\x99\xC5\xA3Gin & Juice"
  ...
  Audio.Artist: "\xCB\x99\xC5\xA3Snoop Dogg"

I *think* this is an id3v2 thing. Also, it happens in more than one file and
amaroK sees the tags "correctly", so I'm thinking it's on the metadata's
end. Thanks!
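
One possible explanation, offered as a guess and not confirmed anywhere in this thread: ID3v2 text frames are often stored as UTF-16 with a byte order mark (the bytes FF FE), and if those two bytes are misread as ISO-8859-2 and converted to UTF-8, they come out as exactly the four bytes shown above. The sketch below reproduces that, then decodes a made-up frame payload the intended way.

```ruby
# UTF-16 little-endian byte order mark, common at the start of
# ID3v2 text frames.
bom = "\xFF\xFE".b

# Misread the two BOM bytes as ISO-8859-2 and convert them to UTF-8.
mangled = bom.dup.force_encoding('ISO-8859-2').encode('UTF-8')
puts mangled.bytes.map { |byte| format('\\x%02X', byte) }.join

# A made-up frame payload ("Gin" in UTF-16LE with BOM), decoded the
# intended way: drop the BOM, then transcode from UTF-16LE.
raw = "\xFF\xFEG\x00i\x00n\x00".b
text = raw.byteslice(2..-1).force_encoding('UTF-16LE').encode('UTF-8')
puts text
```

If that is what is happening, checking for a leading BOM before any charset guessing would avoid the stray prefix.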
--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/