Compression (besides Huffman) and Ruby

Hi!

I don’t want to start a lenghty thread about wether that makes sense
or not. My question is only if such a thing is known to exist. I did
find Huffman but besides that rei, nada, rien, zero, nichts. Anything
else (note that the point is plain Ruby, no extension). Otherwise I
will try to implement such an algorithm on my own.

Josef ‘Jupp’ SCHUGT

···


http://oss.erdfunkstelle.de/ruby/ - German comp.lang.ruby-FAQ
http://rubyforge.org/users/jupp/ - Ruby projects at Rubyforge
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Germany 2004: To boldly spy where no GESTAPO / STASI has spied before

Well, here’s a really bad RLE encoder/decoder. :wink:

#!/usr/bin/ruby

$last=nil
$count=0

def compress
while gets
$_.each_byte { |c|
if c == $last
$count += 1
else
puts “#{$count}:#{$last}”;
$last = c
$count = 1
end
}
end
end

def decompress
while gets
~ /(\d+):(\d+)/
$1.to_i.times { s = ’ '; s[0] = $2.to_i; print s }
end
end

if ARGV.shift == ‘-d’
decompress
else
compress
end

Usually I just do:

require ‘zlib’

Since it comes with ruby. =)

···

On Tuesday 06 January 2004 6:06 pm, Josef ‘Jupp’ SCHUGT wrote:

Hi!

I don’t want to start a lenghty thread about wether that makes sense
or not. My question is only if such a thing is known to exist. I did
find Huffman but besides that rei, nada, rien, zero, nichts. Anything
else (note that the point is plain Ruby, no extension). Otherwise I
will try to implement such an algorithm on my own.


Wesley J. Landaker - wjl@icecavern.net
OpenPGP FP: 4135 2A3B 4726 ACC5 9094 0097 F0A9 8A4C 4CD6 E3D2

Josef ‘Jupp’ SCHUGT wrote:

Hi!

I don’t want to start a lenghty thread about wether that makes sense
or not. My question is only if such a thing is known to exist. I did
find Huffman but besides that rei, nada, rien, zero, nichts. Anything
else (note that the point is plain Ruby, no extension). Otherwise I
will try to implement such an algorithm on my own.

Josef ‘Jupp’ SCHUGT

I’ve played with the idea of implementing zlib in pure Ruby…
unfortunately, I’ve never got beyond that. I’ve got about 30% of a
working tar implementation, but that’s not really compression. I’d be
very interested in a pure ruby zlib or bzip2 implementation, if you
decide to write one. :wink:

  • Jamis
···


Jamis Buck
jgb3@email.byu.edu

ruby -h | ruby -e ‘a=;readlines.join.scan(/-(.)[e|Kk(\S*)|le.l(…)e|#!(\S*)/) {|r| a << r.compact.first };puts “\n>#{a.join(%q/ /)}<\n\n”’

On Linux, at least, zlib appears to be compiled into “standard” Ruby. For
the PragProg distribution for Windows, zlib is also present.

-austin

···


austin ziegler * austin@halostatue.ca * Toronto, ON, Canada
software designer * pragmatic programmer * 2004.01.07
* 10.31.03

I’ve got the bare bones of a Ruby implementation of adaptive arithmetic
encoding, using the Rational module. But it’s r-e-a-l-l-y s-l-o-w. :slight_smile:

-Mark

···

On Wed, Jan 07, 2004 at 11:37:24AM +0900, Jamis Buck wrote:

I’ve played with the idea of implementing zlib in pure Ruby…
unfortunately, I’ve never got beyond that. I’ve got about 30% of a
working tar implementation, but that’s not really compression. I’d be
very interested in a pure ruby zlib or bzip2 implementation, if you
decide to write one. :wink:

Austin Ziegler wrote:

On Linux, at least, zlib appears to be compiled into “standard” Ruby. For
the PragProg distribution for Windows, zlib is also present.

-austin

austin ziegler * austin@halostatue.ca * Toronto, ON, Canada
software designer * pragmatic programmer * 2004.01.07
* 10.31.03

Thanks. It is true, that with 1.8.x and zlib, there shouldn’t be a real
necessity for implementing the zlib algorithm. I wasn’t aware that it
came standard with 1.8.

However, (and I’m speaking more broadly than of just compression
algorithms now) in general it is not possible to expect specific
functionality to be distributed standard with the interpreter. If we
wait for that, we’ll wind up with a multi-megabyte runtime that becomes
much more expensive (bandwidth-wise) to distribute. For such things, it
may be desirable to have a pure-ruby implementation of the
functionality that was desired (be it compression, gui toolkits, etc.,
etc.).

My two cents. I’ll shut up now.

···


Jamis Buck
jgb3@email.byu.edu

ruby -h | ruby -e ‘a=;readlines.join.scan(/-(.)[e|Kk(\S*)|le.l(…)e|#!(\S*)/) {|r| a << r.compact.first };puts “\n>#{a.join(%q/ /)}<\n\n”’

Hi!

  • Wesley J Landaker:

require ‘zlib’

Since it comes with ruby. =)

Wasn’t aware that 1.8 has zlib binding. watashi wa baka desu :->

Josef ‘Jupp’ SCHUGT

···


http://oss.erdfunkstelle.de/ruby/ - German comp.lang.ruby-FAQ
http://rubyforge.org/users/jupp/ - Ruby projects at Rubyforge
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Germany 2004: To boldly spy where no GESTAPO / STASI has spied before

Mark J. Reed wrote:

I’ve got the bare bones of a Ruby implementation of adaptive arithmetic
encoding, using the Rational module. But it’s r-e-a-l-l-y s-l-o-w. :slight_smile:

-Mark

Heh. :slight_smile: Well, the point of doing it in Ruby isn’t to win races, I
guess. It’s more to make these things portable. I’m sure a zlib or
bzip2 algorithm implemented in Ruby would be dog slow, too, but the
convenience provided by having them available directly, on any platform,
has to be taken into account.

Is your code in the RAA, perchance? I’d be interested in taking a peek.

···


Jamis Buck
jgb3@email.byu.edu

ruby -h | ruby -e ‘a=;readlines.join.scan(/-(.)[e|Kk(\S*)|le.l(…)e|#!(\S*)/) {|r| a << r.compact.first };puts “\n>#{a.join(%q/ /)}<\n\n”’

Josef ‘Jupp’ SCHUGT wrote:

Hi!

  • Wesley J Landaker:

require ‘zlib’

Since it comes with ruby. =)

Wasn’t aware that 1.8 has zlib binding. watashi wa baka desu :->

                                       ^^^^^^^^^^^^^^^^^^^^

Not if you know how to say that :wink:

“Jamis Buck” jgb3@email.byu.edu schrieb im Newsbeitrag
news:3FFB8A22.7010600@email.byu.edu…

Mark J. Reed wrote:

I’ve got the bare bones of a Ruby implementation of adaptive arithmetic
encoding, using the Rational module. But it’s r-e-a-l-l-y s-l-o-w. :slight_smile:

-Mark

Heh. :slight_smile: Well, the point of doing it in Ruby isn’t to win races, I
guess. It’s more to make these things portable. I’m sure a zlib or
bzip2 algorithm implemented in Ruby would be dog slow, too, but the
convenience provided by having them available directly, on any platform,
has to be taken into account.

But wouldn’t it bettern then to implement it in C (or use an existing
implementation)? I mean, C is portable.

Cheers

robert

This doesn’t make any sense. When it’s possible to have a ansi c compliant
program (the ruby interpreter) which can be compiled on/for a (unix environment
on a) foreign platform, it’ll also be possible to compile zlib or whatsoever
there. Probably it’s more likely for such libs to be portable than ruby
itself.

I dunno where you people think ruby is running on (aside of your mind :).

Regards,

-Martin

···

On Wed, Jan 07, 2004 at 01:19:07PM +0900, Jamis Buck wrote:

(… quote snipped …)
Heh. :slight_smile: Well, the point of doing it in Ruby isn’t to win races, I
guess. It’s more to make these things portable. I’m sure a zlib or
bzip2 algorithm implemented in Ruby would be dog slow, too, but the
convenience provided by having them available directly, on any platform,
has to be taken into account.

Robert Klemme wrote:

But wouldn’t it bettern then to implement it in C (or use an existing
implementation)? I mean, C is portable.

Cheers

robert

Very true. Generally, C is portable, and in general, such things
should be (and are) written in C. However, C is not portable when a
module is not available as a pre-compiled binary for a particular
platform, and the user does not have access to a C compiler (or is not a
programmer and is not confident compiling their own module). In such
cases, the value of simply dropping the module into the Ruby path is
quite high.

Please don’t misunderstand me. I am not saying that every C module ever
written for Ruby should have been written in Ruby. That would be
ludicrous. I’m simply saying that for some things (compression
algorithms, for instance), there is some value in having them available
both as a C modules, and as pure Ruby modules.

Consider the case of self-extracting Ruby modules. You simply
distribute the pure-ruby version of the compression library with your
own code, and then anyone can extract your module, without having to go
through some (potentially elaborate) installation scheme. Before I
finally broke down and installed both wxWindows and FOX, I would avoid
programs that used them because I’ve always found it to be a pain to
install those systems (partly because, in addition to the systems
themselves, you then had to install bindings for your language of
choice). (Until I started using Gentoo, anyway, but that’s beside the
point.)

Anyway, I hope that has clarified my position. I’m not trying to start
a flame war on the merits of C modules versus pure Ruby modules (or even
wxWindows and FOX–those were just examples from my own personal
experience), so please don’t be too vehement with any opposing opinions. :slight_smile:

···


Jamis Buck
jgb3@email.byu.edu

ruby -h | ruby -e ‘a=;readlines.join.scan(/-(.)[e|Kk(\S*)|le.l(…)e|#!(\S*)/) {|r| a << r.compact.first };puts “\n>#{a.join(%q/ /)}<\n\n”’

“Jamis Buck” jgb3@email.byu.edu schrieb im Newsbeitrag
news:3FFC0099.2080408@email.byu.edu…

Robert Klemme wrote:

But wouldn’t it bettern then to implement it in C (or use an existing
implementation)? I mean, C is portable.

Very true. Generally, C is portable, and in general, such things
should be (and are) written in C. However, C is not portable when a
module is not available as a pre-compiled binary for a particular
platform, and the user does not have access to a C compiler (or is not a
programmer and is not confident compiling their own module). In such
cases, the value of simply dropping the module into the Ruby path is
quite high.

Another reasonable conclusion is to include a ruby module for such a basic
thing as (de)compression into the std distribution. At least the cygwin
distribution includes Zlib:

irb(main):001:0> require ‘zlib’
=> true
irb(main):002:0> ObjectSpace.each_object( Class ) {|cl| p cl if
/zip|zlib/i =~ cl.name }
Zlib::GzipReader
Zlib::GzipWriter
Zlib::GzipFile::LengthError
Zlib::GzipFile::CRCError
Zlib::GzipFile::NoFooter
Zlib::GzipFile::Error
Zlib::GzipFile
Zlib::Inflate
Zlib::Deflate
Zlib::ZStream
Zlib::VersionError
Zlib::BufError
Zlib::MemError
Zlib::StreamError
Zlib::DataError
Zlib::NeedDict
Zlib::StreamEnd
Zlib::Error
=> 358
irb(main):003:0>

Does that help? :slight_smile:

Anyway, I hope that has clarified my position. I’m not trying to start
a flame war on the merits of C modules versus pure Ruby modules (or even
wxWindows and FOX–those were just examples from my own personal
experience), so please don’t be too vehement with any opposing opinions.
:slight_smile:

No offense taken.

Kind regards

robert

Jamis Buck jgb3@email.byu.edu writes:

Please don’t misunderstand me. I am not saying that every C module ever
written for Ruby should have been written in Ruby. That would be
ludicrous. I’m simply saying that for some things (compression
algorithms, for instance), there is some value in having them available
both as a C modules, and as pure Ruby modules.

A good point. That’s what some Smalltalks do, but in an even more useful
way. Primitives are sometimes implemented in C (or at least in the VM
itself, rather than in pure Smalltalk). If the primitive fails or does not
exist, then alternate Smalltalk code can be run.

For example, this is from Squeak’s Object class:

instVarAt: index
“Primitive. Answer a fixed variable in an object. The numbering of the
variables corresponds to the named instance variables. Fail if the
index is not an Integer or is not the index of a fixed variable. Essential.
See Object documentation whatIsAPrimitive.”

<primitive: 73>
"Access beyond fixed variables."
^self basicAt: index - self class instSize

The Squeak Object class has a method called “whatIsAPrimitive” whose entire
purpose is to act as documentation. Here is a snippet from the comment:

When the Smalltalk interpreter begins to execute a method which
specifies a primitive response, it tries to perform the primitive
action and to return a result. If the routine in the interpreter for
this primitive is successful, it will return a value and the
expressions in the method will not be evaluated. If the primitive
routine is not successful, the primitive 'fails', and the Smalltalk
expressions in the method are executed instead. These expressions are
evaluated as though the primitive routine had not been called.

Jim

···


Jim Menard, jimm@io.com, http://www.io.com/~jimm/
“T-shirts are the computer industry’s only persistent objects.”
– Seen in R. L. Peskin’s .sig