[ANN] Metadata 0.5

> Any chance you could wrap this up as a gem?

I already have a gemspec file, but gem screws up bin/chardet by
plastering it with #!/usr/bin/ruby boilerplate (it's a python file).

And I don't know how to turn it off.

> Another bug (Sorry :D):
>   $ mdh -p ~/music/Limp\ Bizkit\ -\ Rollin\'\ \(edited\).ogg
>   sh: -c: line 0: syntax error near unexpected token `('
>   sh: -c: line 0: `ogginfo '/home/konrad/music/Limp Bizkit - Rollin\'
>     (edited).ogg''
>
> (Last line was broken up to email length.) You're already escaping single
> quotes for the shell, need to escape start-parens and end-parens as well.

Argh, amateurish mistake on my part, thanks for catching that. Fixed,
if in a bit of an over-engineered way (creating a safely named link to the
file). It's probably impossible to safely pass a filename like
"-f -i -l -e -z" to a shell command that doesn't support "--" in any other
way, though.
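
A note on the escaping problem: one way to sidestep shell quoting altogether is to hand the command to Ruby as an argument array, so no sh is involved, and to prefix dash-leading names with ./ so that tools without "--" don't read them as flags. What follows is a minimal sketch, assuming a current Ruby rather than the 1.8 this release targets. dash_safe, tool_argv and tool_command are names made up for illustration and are not part of metadata.

```ruby
require 'shellwords'

# Hypothetical helper: a relative name starting with "-" gets a "./"
# prefix so it can't be mistaken for an option; anything else is
# returned unchanged.
def dash_safe(path)
  path.start_with?('-') ? File.join('.', path) : path
end

# Hypothetical helper: build an argv array. Passing an array to
# IO.popen or system bypasses sh, so quotes and parens need no escaping.
def tool_argv(tool, path)
  [tool, dash_safe(path)]
end

# If a single command string is unavoidable, Shellwords.escape
# backslash-escapes every shell metacharacter, not just single quotes.
def tool_command(tool, path)
  "#{tool} #{Shellwords.escape(dash_safe(path))}"
end

name = "Limp Bizkit - Rollin' (edited).ogg"
puts tool_command('ogginfo', name)
# Array form, no shell involved (not run here, ogginfo may be missing):
#   IO.popen(tool_argv('ogginfo', name)) { |io| io.read }
```

The "./" prefix only helps for relative names; absolute paths never start with a dash in the first place.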

> Also:
> For mp3 id3v2 tags, the binary string "\xCB\x99\xC5\xA3" is being inserted
> at the front of all the string fields.
>
> [snip]
>
> I *think* this is an id3v2 thing. Also, it happens in more than one file and
> amaroK sees the tags "correctly", so I'm thinking it's on the metadata's
> end. Thanks!

Right you are. Fixed. No idea what was causing it. Moved to
using id3lib for the tags (it extracts embedded album art as well!) and
mplayer for the rest of the metadata.

Here we go, 0.5:

tarball: http://dark.fhtr.org/repos/metadata/metadata-0.5.tar.gz
git: http://dark.fhtr.org/repos/metadata

Description
-----------

  This package `Metadata' comes with a library called `metadata' and
  a small program called `mdh'.

  The library probes files for their metadata (e.g. jpeg dimensions
  and camera make, mp3 artist, pdf word count) and returns the metadata
  as a Hash.

  Mdh can print out file metadata as YAML and package the metadata
  with the file.

  This package has many dependencies since there is no single universal
  metadata header format that all files use. Blame resource forks, filename
  extensions, bags of bytes and mimetypes.

Usage
-----

  # print out metadata header
  mdh -p myfile.jpg

  # create myfile.jpg.mdh, which consists of metadata header + myfile.jpg
  mdh myfile.jpg

  # print out metadata header from mdh file
  mdh -e -p myfile.jpg.mdh

  # strip out metadata header from mdh file and save it to myfile.jpg
  mdh -e myfile.jpg.mdh

  # print out list of flags
  mdh -h

Metadata.extract('myfile.jpg')
Metadata.extract_text('myfile.jpg')
Pathname.new("myfile.jpg").metadata

List of supported formats
-------------------------

  Audio:
    Whatever you manage to make mplayer play.
    Plus FLAC, m4a and wma handled specially.
    Successfully tested with:
      mp3, flac, ogg, wav
    Should also work:
      wma, m4a

  Video:
    Whatever you manage to make mplayer play.
    Successfully tested with:
      wmv, mov, divx, xvid, flv, ogm, mpg

  Images:
    Should handle pretty much anything (apart from XCF and ORF.)
    Successfully tested with:
      jpeg, png, gif, nef, dng, crw, pef, psd

  Documents:
    Successfully tested with:
      pdf, ppt, odp, sxi, ps, ps.gz, html, txt
    Should work:
    - OpenOffice docs work to some degree (personally, I'm using unoconv to
      convert OO docs to temp PDFs for the text & dimensions extraction, so
      those bits of data are missing.)
    - MS Office docs to some degree (ppt at least, doc and xls should work too,
      dimensions missing due to the above temp PDF -thing.)

  Others:
    Whatever extract spits out on the five or six bits of metadata I'm using
    from it. Archive contents at least.

Requirements
------------

  * Ruby 1.8

  * Tons of metadata extraction programs and libs,
    list of gems:
      flacinfo-rb
      wmainfo-rb
      MP4info
      id3lib-ruby
    list of debian packages:
      dcraw
      libimlib2-ruby
      extract
      libimage-exiftool-perl
      poppler-utils
      mplayer
      html2text
      imagemagick
      unhtml
      pstotext
      antiword
      catdoc
      shared-mime-info

  * You do want to install the latest versions of dcraw and
    shared-mime-info to be able to handle camera raw images.
    http://cybercom.net/~dcoffin/dcraw/
    http://freedesktop.org/wiki/Software/shared-mime-info

  * Python + chardet library
    http://chardet.feedparser.org/

Install
-------

  De-compress archive and enter its top directory.
  Then type:

   ($ su)
    # ruby setup.rb

  This simple step installs the program under the default
  location for Ruby libraries. You can also install the files into
  a directory of your choice by passing options to setup.rb.
  Try "ruby setup.rb --help".

License
-------

  Ruby's

--
Ilmari Heikkinen <ilmari.heikkinen gmail com>

Thanks, trying it out now. (I'm basically running it on every file
in my collection and running back to you when I get errors. :D)

--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/

Another bug, here we go:

  undefined method `audio_x_vorbis_ogg' for Metadata:Module
  undefined method `audio_x_vorbis_ogg' for Metadata:Module
  /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:365:in `video_x_theora_ogg'
  /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:142:in `__send__'
  /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:142:in `extract'

The code that seems to be failing is:

  def video_x_theora_ogg(filename, charset)
    h = video(filename, charset)
    wma = audio_x_vorbis_ogg(filename, charset)
    %w(
      Artist Title Album Genre ReleaseDate TrackNo VariableBitrate
    ).each{|t|
      h['Video.'+t] = wma['Audio.'+t]
    }
    h
  end

This makes sense, as audio_x_vorbis_ogg() doesn't exist anywhere else. :D
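
The backtrace suggests that extract turns the MIME type into a method name and calls it through __send__. Here is a sketch of that pattern with a respond_to? fallback, so that a missing handler degrades to a generic one instead of raising. The MiniMeta module, its handlers and the fallback order are all invented for illustration; this is not metadata's actual code.

```ruby
# Hypothetical stand-in for the dispatch seen in the backtrace
# (extract -> __send__ -> video_x_theora_ogg). Not the real module.
module MiniMeta
  module_function

  # "video/x-theora+ogg" -> "video_x_theora_ogg"
  def handler_name(mimetype)
    mimetype.downcase.gsub(/[^a-z0-9]+/, '_')
  end

  # Try the exact handler, then the major type ("audio"), then give up
  # with an empty Hash instead of a NoMethodError.
  def extract(filename, mimetype)
    [handler_name(mimetype), mimetype.split('/').first].each do |m|
      return __send__(m, filename) if respond_to?(m)
    end
    {}
  end

  # Generic audio handler, used when no specific one exists.
  def audio(filename)
    { 'Audio.Handler' => 'generic', 'File' => filename }
  end

  def video_x_theora_ogg(filename)
    { 'Video.Handler' => 'theora', 'File' => filename }
  end
end
```

With this shape, a call for audio/x-vorbis+ogg lands in the generic audio handler, because no audio_x_vorbis_ogg method is defined.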

--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/

Ilmari Heikkinen wrote:
> tarball: http://dark.fhtr.org/repos/metadata/metadata-0.5.tar.gz
> git: http://dark.fhtr.org/repos/metadata

These links don't work. Is there somewhere else I can find this project?
Is it available as a gem?

> Description
> -----------
>
>   This package `Metadata' comes with a library called `metadata' and
>   a small program called `mdh'.
>
>   The library probes files for their metadata (e.g. jpeg dimensions
>   and camera make, mp3 artist, pdf word count) and returns the metadata
>   as a Hash.
>
>   Mdh can print out file metadata as YAML and package the metadata
>   with the file.
>
>   This package has many dependencies since there is no single universal
>   metadata header format that all files use. Blame resource forks, filename
>   extensions, bags of bytes and mimetypes.

Thanks.

--
Posted via http://www.ruby-forum.com/.

Fixed. And 0.6 :)
http://dark.fhtr.org/repos/metadata/metadata-0.6.tar.gz

On 9/16/07, Konrad Meyer <konrad@tylerc.org> wrote:
> Another bug, here we go:
>
>   undefined method `audio_x_vorbis_ogg' for Metadata:Module
>
> This makes sense, as audio_x_vorbis_ogg() doesn't exist anywhere else. :D

Ooh, here's another: :)

/usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:735:in `unlink': No such file or directory - _tmp_metadata_temp_22720__604265598_1189946590.27022 (Errno::ENOENT)
        from /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:735:in `secure_filename'
        from /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:590:in `extract_extract_info'
        from /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:153:in `extract'

Not sure what that is, and frankly atm my brain is a bit too weak to think
about it. But you should be fresh and able to solve that.

--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/

Apparently temporary hardlinks weren't such a hot idea after all. Nuts.

Ok, now escaping filename by default, only trying to "ln rescue cp" for
filenames starting with a dash. Running it against my downloads-dir
presently, been working ok thus far. YMMV of course :)

http://dark.fhtr.org/repos/metadata/metadata-0.7.tar.gz
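
The "ln rescue cp" idea could look roughly like this. It is only a sketch of the approach described, not the code in 0.7: with_safe_name is a hypothetical helper, and placing the link inside a throwaway directory from Dir.mktmpdir is an assumption, chosen so that cleanup happens in one place instead of through a separate unlink call.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical sketch of "ln rescue cp": expose the file under a tame
# name inside a fresh temp dir, yield that path, and clean up afterwards.
def with_safe_name(path)
  Dir.mktmpdir('metadata_tmp') do |dir|
    safe = File.join(dir, 'file' + File.extname(path))
    begin
      FileUtils.ln(path, safe)   # hard link: cheap, same filesystem only
    rescue SystemCallError
      FileUtils.cp(path, safe)   # fall back to a copy if linking fails
    end
    yield safe
  end                            # mktmpdir removes the dir and its contents
end
```

A caller would wrap the external command in the block, for example with_safe_name(fn) { |safe| ... run the tool on safe ... }, and the original file is never touched.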

The title tag isn't being parsed out of oggs:

  $ mdh -p music/korn_-_clown.ogg
  Video.TrackNo: 16
  Video.Artist: Korn
  Video.Genre: Hard Rock
  Video.Album: Greatest Hits Vol. 1

vs mplayer:

  Ogg file format detected.
  Clip info:
   Genre: Hard Rock
   Name: Clown
   Artist: Korn
   Album: Greatest Hits Vol. 1
   Track: 16

Cheers,

--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/

Ah, it uses Name instead of Title. Thanks!
Added it and made 0.8.

http://dark.fhtr.org/repos/metadata/metadata-0.8.tar.gz

Now I wonder what other synonyms mplayer uses...
I'd really appreciate it if you could run the following over
your media library and tell what field names it spews out:

find $MEDIA_LIBRARY_DIR -type f | \
mplayer -identify -ao null -vo null -frames 0 -playlist - | \
grep ID_CLIP_INFO_NAME | sed 's/^.*=//' | sort | uniq

(replace $MEDIA_LIBRARY_DIR with the directory name)
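
For pairing the field names with their values, something like the following might do. It assumes mplayer -identify prints numbered ID_CLIP_INFO_NAMEn / ID_CLIP_INFO_VALUEn lines; only the NAME prefix appears in this thread, so treat the VALUE lines and the sample text as assumptions. parse_clip_info is a made-up name.

```ruby
# Pair up ID_CLIP_INFO_NAMEn / ID_CLIP_INFO_VALUEn lines into a Hash.
# The line format is an assumption about mplayer's -identify output.
def parse_clip_info(identify_output)
  names = {}
  values = {}
  identify_output.each_line do |line|
    case line.chomp
    when /\AID_CLIP_INFO_NAME(\d+)=(.*)\z/  then names[$1]  = $2
    when /\AID_CLIP_INFO_VALUE(\d+)=(.*)\z/ then values[$1] = $2
    end
  end
  # Join on the numeric suffix; a name without a value maps to nil.
  names.each_with_object({}) { |(i, n), h| h[n] = values[i] }
end

# Hand-written sample in the assumed format, not captured output.
sample = <<~OUT
  ID_CLIP_INFO_NAME0=Genre
  ID_CLIP_INFO_VALUE0=Hard Rock
  ID_CLIP_INFO_NAME1=Name
  ID_CLIP_INFO_VALUE1=Clown
  ID_CLIP_INFO_N=2
OUT
p parse_clip_info(sample)
```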

Thanks again,

On 9/17/07, Konrad Meyer <konrad@tylerc.org> wrote:
> The title tag isn't being parsed out of oggs:
>
> [snip]

--
Ilmari Heikkinen

I'd love to run that but mplayer dies rather early on on some of my files.
Also, seems like we have another bug (not sure what kind of file it's on,
sorry):

  undefined method `empty?' for 40:Fixnum
  undefined method `empty?' for 40:Fixnum
  /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:765:in `enc_utf8'
  /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:352:in `video'
  /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:142:in `__send__'
  /usr/lib/ruby/site_ruby/1.8/metadata/extract.rb:142:in `extract'

I'd guess one of the libraries you're using for parsing is giving back 40
as a genre or track number (a bit high, but might be tagged wrong) and it
needs to be converted to a string before you can use it.
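
If that guess is right, coercing the value to a String before testing it would be enough. A tiny sketch: enc_utf8_sketch is a hypothetical stand-in, since the real enc_utf8 isn't shown in this thread, and the charset conversion it presumably performs is left out.

```ruby
# Hypothetical stand-in for enc_utf8 from the backtrace. Coerce to String
# first so Integer tag values (a numeric genre or track number) don't
# raise NoMethodError on empty?. Charset conversion is omitted here.
def enc_utf8_sketch(value)
  s = value.to_s
  return nil if s.empty?
  s
end
```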

Thanks!

--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/

Quoth Ilmari Heikkinen:
> find $MEDIA_LIBRARY_DIR -type f | \
> mplayer -identify -ao null -vo null -frames 0 -playlist - | \
> grep ID_CLIP_INFO_NAME | sed 's/^.*=//' | sort | uniq
>
> (replace $MEDIA_LIBRARY_DIR with the directory name)
>

> I'd love to run that but mplayer dies rather early on on some of my files.

Hmm, here's a ruby version that should work through those:

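# Run mplayer once per file, so one bad file can't stop the whole scan.
media_dir = "music"
seen_names = {}   # field names already printed
mpc = "mplayer -identify -ao null -vo null -frames 0 -playlist - 2>/dev/null"
Dir["#{media_dir}/**/*"].each{|fn|
  if File.file?(fn)
    IO.popen(mpc, "r+"){|mp|
      begin
        # feed the filename to mplayer as a one-line playlist on stdin
        mp.puts fn
        mp.close_write
        tags = mp.read.strip.split("\n").grep(/^ID_CLIP_INFO_NAME/)
        names = tags.map{|t| t.split("=", 2)[1] }
        names.each{|n|
          # print each new field name once
          seen_names[n] ||= (puts n; true)
        }
      rescue
        # skip files that mplayer chokes on
      end
    }
  end
}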

> I'd guess one of the libraries you're using for parsing is giving back 40
> as a genre or track number (a bit high, but might be tagged wrong) and it
> needs to be converted to a string before you can use it.

Good catch, thanks. Fixed.
http://dark.fhtr.org/repos/metadata/metadata-0.9.tar.gz

Just FYI -- I've been running your script since about 6.5 hours ago, I'll
reply again when it's actually done. So far no problems.

HTH,

--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/

Big library, eh :)
If it's still running, I doubt it's going to do much more...

What has it printed out?

Thanks!

On 9/18/07, Konrad Meyer <konrad@tylerc.org> wrote:
> Just FYI -- I've been running your script since about 6.5 hours ago, I'll
> reply again when it's actually done. So far no problems.

--
Ilmari Heikkinen

Well, I'm having it run over all of them, then print the output. But
sometimes mplayer just starts eating 100% cpu and doesn't exit, so the
script gets stuck on the one song.
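
A per-file time limit would keep a spinning mplayer from stalling the whole run. Below is a sketch using Ruby's Timeout with the array form of IO.popen; run_with_timeout is a made-up helper, and the ten-second limit in the usage comment is an arbitrary assumption.

```ruby
require 'timeout'

# Hypothetical helper: run argv without a shell and return its stdout,
# or nil if it hasn't finished within `seconds` (the child is killed).
def run_with_timeout(argv, seconds)
  IO.popen(argv, 'r', err: File::NULL) do |io|
    begin
      Timeout.timeout(seconds) { io.read }
    rescue Timeout::Error
      Process.kill('KILL', io.pid)   # stop the stuck child
      nil
    end
  end                                # popen reaps the child on block exit
end

# Usage idea (not run here, mplayer may not be installed):
#   out = run_with_timeout(
#     ['mplayer', '-identify', '-ao', 'null', '-vo', 'null',
#      '-frames', '0', fn], 10)
#   next if out.nil?   # skip files where mplayer hung
```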

--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/

As a matter of fact (I only had to kill mplayer 5-7 times):
  Genre
  Name
  Artist
  Creation Date
  Album
  Track
  Title
  Year
  Comment
  name
  author
  Comments

HTH,

--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/

Ok, now it successfully runs over my entire collection. Yay! Now it's time
to fix all those tag-less songs.

Thanks much,
--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/

Alright, thanks a lot! I was missing 'name' and 'author', the rest I
had already.
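
Folding those synonyms into one key per field can be done with a small lookup table. In this sketch only Name -> Title is confirmed earlier in the thread; grouping author with Artist and Comments with Comment is a guess, and CLIP_INFO_SYNONYMS, canonical_field and normalize_clip_info are invented names.

```ruby
# Assumed mapping from mplayer's clip-info field names to one canonical
# key each. Only Name -> Title is confirmed; the rest are guesses.
CLIP_INFO_SYNONYMS = {
  'name'     => 'Title',
  'title'    => 'Title',
  'author'   => 'Artist',
  'artist'   => 'Artist',
  'comment'  => 'Comment',
  'comments' => 'Comment'
}.freeze

# Look the name up case-insensitively; unknown names pass through as-is.
def canonical_field(name)
  key = name.strip
  CLIP_INFO_SYNONYMS.fetch(key.downcase, key)
end

# Takes [name, value] pairs; the first value wins if two synonyms collide.
def normalize_clip_info(pairs)
  pairs.each_with_object({}) do |(k, v), h|
    h[canonical_field(k)] ||= v
  end
end
```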

Alright, glad to help.

--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/