Python generators to ruby closures, help

can anyone help me decode the following python example using
generators into The Ruby Way using closures?

thanks,
-z

http://c2.com/cgi/wiki?GeneratorsInPython

You can emulate the unix-style shell piping with generators.

    from __future__ import generators

    class GenPipe?:
        def __init__(self, generator):
            self.gen = generator #sanity check omitted
        def __or__(self,nextPipe):
            self.gen=nextPipe._run(self.gen)
            return self
        def __iter__(self):
            return self.gen

    class PipeFilter?:
        def __init__(self):
            self._followingPipes=[]
        def __or__(self,nextPipe):
            self._followingPipes+=(nextPipe,)
            return self
        def _run(self,gen):
            gen=self.run(gen)
            while self._followingPipes:
                gen=self._followingPipes[0].run(gen)
                self._followingPipes.pop(0)
            return gen
        def run(self,gen):
            raise NotImplementedError?

    class Test(PipeFilter?):
        def __init__(self,test):
            PipeFilter?.__init__(self)
            self.test=test
        def run(self,gen):
            for x in gen:
                if self.test(x):
                    yield x

    class Quit(PipeFilter?):
        def __init__(self,test):
            PipeFilter?.__init__(self)
            self.test=test
        def run(self,gen):
            for x in gen:
                if self.test(x):
                    raise StopIteration?
                else:
                    yield x

    class Count(PipeFilter?):
        def __init__(self,n):
            PipeFilter?.__init__(self)
            self.n=n
        def run(self,gen):
            for x in gen:
                yield x
                self.n -= 1
                if self.n == 0 :
                    break

    def Ints():
        n = 0
        while 1:
            yield n
            n += 1

    def Files( start ):
        import os
        for file in os.listdir( start ):
            file = os.path.join( start, file )
            if os.path.isfile( file ): yield file
            elif os.path.isdir(file):
                for more in Files( file ):
                    yield more

    >>> odd = Test(lambda x:x % 2)
    >>> notDivBy3 = Test(lambda x: x % 3 )
    >>> oddNotDivBy3 = odd|notDivBy3
    >>> firstTen= Count(10)
    >>> result=GenPipe?(Ints())|firstTen|oddNotDivBy3
    >>> for each in result:
    >>> print each

    >>> oddLowerThanTen=odd|Quit(lambda y:y>=10)
    >>> p3=GenPipe?(Ints())|oddLowerThanTen|notdiv3

    >>> isPyFile=Test(lambda s:s.split('.')[-1].lower()=='py')
    >>> tenPyFiles=GenPipe?(Files('/'))|isPyFile|Count(10)
    >>> for each in tenPyFiles:
    >>> print each

-- JuneKim

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/21900

···

Subject: [ruby-talk:21900] Re: Python generators
From: Will Conant <WillC webmiles.com>
Date: Tue, 2 Oct 2001 08:18:10 +0900

Generators are a nifty feature for Python 2.2, but they are a solution to
Python problems... problems that Ruby doesn't have.

In Ruby, it is trivial to pass a closure that interacts with the scope that
created it to another function. Thus, callbacks in Ruby are sufficiently
useful to solve the problems that Python generators were designed to solve.

The example that I read somewhere on the Python site was the tokenizer
problem. The original approach to the tokenizer problem was to have a
function that took two parameters as input: the text source and a function
pointer to a callback that processed the tokens. The problem with this
approach was that the callback function, designed by the user, didn't have a
very good way of keeping state between calls. It could use globals, but that
would suck.

The other approach was to write a tokenizer object that kept its state and
had a "nextToken" function that looked for the next token in the stream and
returned it to the caller. This solution hoisted the state problem off on
the tokenizer instead of the user. Still a problem especially considering
the difficulty of unrolling very complex loops of logic.

For Python, where closures aren't that easy to use, the generator is the
ideal solution. This way, you write your tokenizer function as if were to
use a callback so you don't have to unroll your loop. However, your function
gets wrapped up in an iterator object, so the user uses it as if it were
implemented as a Tokenizer object with a "nextToken" function.

In Ruby, you could probably emulate this behavior directly, but you really
wouldn't need to. Instead, you would just write your tokenizer to use a
callback, and your callback method would be a local closure with local state
persistant between calls. Problem solved.

Anyway, that's the story. Generators are a whitespace friendly solution to a
few of the problems that Ruby's super-duper closure mechanism solves just
fine.

-Will Conant

zuzu wrote:

can anyone help me decode the following python example using
generators into The Ruby Way using closures?

I'm sure the Ruby version must be human-readable... maybe you can describe in short words, what the Python example tries to achive?

You might have a look at generator.rb which ships with Ruby and implements generators or "streams" with continuations.

Regards,

   Michael

zuzu wrote:

can anyone help me decode the following python example using
generators into The Ruby Way using closures?

Okay, here is it :wink:

One problem is, that we can't detect the usage of "break" inside the proc, as we're using proc's and not blocks. My workaround is, that a return value of nil of a proc.call breaks the generator at this position (so make sure to use false and not nil, e.g. "val.nil?" instead just "val"!).

Regards,

   Michael

···

#######
require 'generator'
require 'find'

alias filter lambda

class Generator
   def |(filter)
     self.class.new do |g|
       self.each { |elem|
         val = filter.call(elem)
         break if val.nil?
         g.yield(elem) if val
       }
     end
   end

   def inspect
     to_a.inspect
   end
end

class Proc
   def |(otherProc)
     proc {|*a|
       x, y = self.call(*a), otherProc.call(*a)
       break nil if x.nil? or y.nil?
       x & y
     }
   end
end

def count(n)
   filter {|i|
     break if n == 0
     n -= 1
     i
   }
end

def quit(&block)
   filter {|*a|
     break if block.call(*a)
     true
   }
end

def gen(&block)
   Generator.new(&block)
end

def ints
   gen {|g|
     n = 0
     loop do
       g.yield n
       n += 1
     end
   }
end

##### Test

odd = filter {|x| x % 2 == 1}
notDivBy3 = filter {|x| x % 3 != 0}
oddNotDivBy3 = odd|notDivBy3
firstTen = count(10)
result = ints | firstTen | oddNotDivBy3
p result

oddLowerThanTen = odd | quit {|y| y >= 10}
p3 = ints | oddLowerThanTen | notDivBy3
p p3

isRubyFile = filter {|file| File.extname(file).downcase == '.rb'}
files = gen {|g| Find.find('.') {|file| g.yield file} }
tenRubyFiles = files | isRubyFile | count(10)
p tenRubyFiles

zuzu wrote:
> can anyone help me decode the following python example using
> generators into The Ruby Way using closures?

I'm sure the Ruby version must be human-readable... maybe you can
describe in short words, what the Python example tries to achive?

actually i'm not much of a python programmer; stopped learning it once
i sunk my teeth into ruby. so other than the generic "to implement
unix-style pipes and filters", i'm not really sure *how* that was
working (and hoping someone here was python-ruby bilingual).

You might have a look at generator.rb which ships with Ruby and
implements generators or "streams" with continuations.

however, armed with this newly acquired knowledge, perhaps that's all
i needed to know. thanks!

Regards,

   Michael

peace,
-z

···

On Fri, 27 Aug 2004 04:14:42 +0900, Michael Neumann <mneumann@ntecs.de> wrote:

And possibly at pipe.rb

martin

···

Michael Neumann <mneumann@ntecs.de> wrote:

zuzu wrote:
> can anyone help me decode the following python example using
> generators into The Ruby Way using closures?

I'm sure the Ruby version must be human-readable... maybe you can
describe in short words, what the Python example tries to achive?

You might have a look at generator.rb which ships with Ruby and
implements generators or "streams" with continuations.

word.

http://www.ruby-doc.org/stdlib/libdoc/generator/rdoc/classes/Generator.html

am i correct in assuming that "makes an internal iterator external" is
just another way of saying "turns a subroutine into a continuation",
"using yield rather than return"? or aka, a function to continuation
translator?

-z

···

On Thu, 26 Aug 2004 15:42:31 -0400, zuzu <sean.zuzu@gmail.com> wrote:

On Fri, 27 Aug 2004 04:14:42 +0900, Michael Neumann <mneumann@ntecs.de> wrote:
> zuzu wrote:
> > can anyone help me decode the following python example using
> > generators into The Ruby Way using closures?
>
> I'm sure the Ruby version must be human-readable... maybe you can
> describe in short words, what the Python example tries to achive?

actually i'm not much of a python programmer; stopped learning it once
i sunk my teeth into ruby. so other than the generic "to implement
unix-style pipes and filters", i'm not really sure *how* that was
working (and hoping someone here was python-ruby bilingual).

> You might have a look at generator.rb which ships with Ruby and
> implements generators or "streams" with continuations.

however, armed with this newly acquired knowledge, perhaps that's all
i needed to know. thanks!

> Regards,
>
> Michael

peace,
-z

this looks quite straight-forward, thank you. as with Generator, i've
long suspected this would involve some inherit tweaking of Enumerable,
and all the implementations of this genre seem to do just that.

in the quote below, this is basically "lazy evaluation", right?
(which i think is "by definition" with asynchronous continuations
rather than synchronous returns.)

Using #pipe has potential performance advantages. The iteration

e.collect { |x| x.m }.each { |y| ... }

can be rewritten as

e.pipe(:m).each { |y| ... }

which doesn't generate an intermediate array, and uses a send instead
of a proc call. Of course, it could also be written as

e.each { |x| y = x.m ... }

but that may be undesirable for readability or because the block is to
be taken from a proc contained in a variable:

pr = proc { ... }
e.pipe(:m).each &pr

···

On Fri, 27 Aug 2004 05:20:54 +0900, Martin DeMello <martindemello@yahoo.com> wrote:

Michael Neumann <mneumann@ntecs.de> wrote:
> zuzu wrote:
> > can anyone help me decode the following python example using
> > generators into The Ruby Way using closures?
>
> I'm sure the Ruby version must be human-readable... maybe you can
> describe in short words, what the Python example tries to achive?
>
> You might have a look at generator.rb which ships with Ruby and
> implements generators or "streams" with continuations.

And possibly at pipe.rb

martin

Excerpts (reformatted) from zuzu's mail of 26 Aug 2004 (EDT):

http://www.ruby-doc.org/stdlib/libdoc/generator/rdoc/classes/Generator.html

am i correct in assuming that "makes an internal iterator external" is
just another way of saying "turns a subroutine into a continuation",
"using yield rather than return"? or aka, a function to continuation
translator?

I don't think that's right.T he goal of Generator is something much more
specific: to turn an internal iterator, i.e. a method that takes a block
argument, into an external iterator, i.e. an object with explicit #next?
and #next methods.

Think about trying to interlace two Enumerables, e.g. turn ["a","b","c"]
and [1,2,3] into ["a",1,"b",2,"c",3]. You can't use Enumerable#each (an
internal iterator) to do this in a natural way. Instead, you'd use
Generator to create an object for each Enumerable and explicitly have a
loop that called #next alternately on both objects.

Ruby's internal iterators are far, far friendlier to use than, say,
Java's external iterators (writing for loops really is a pain) but there
are some problems they just don't yield (hah hah) natural solutions to.

Sorry if that isn't what you were asking. I don't know whether this is
related to Python's "generator" objects at all.

···

--
William <wmorgan-ruby-talk@masanjin.net>

zuzu wrote:

zuzu wrote:

can anyone help me decode the following python example using
generators into The Ruby Way using closures?

I'm sure the Ruby version must be human-readable... maybe you can
describe in short words, what the Python example tries to achive?

You might have a look at generator.rb which ships with Ruby and
implements generators or "streams" with continuations.

And possibly at pipe.rb

martin

this looks quite straight-forward, thank you. as with Generator, i've
long suspected this would involve some inherit tweaking of Enumerable,
and all the implementations of this genre seem to do just that.

in the quote below, this is basically "lazy evaluation", right? (which i think is "by definition" with asynchronous continuations
rather than synchronous returns.)

I guess it's lazy evaluation, in the sense that no intermediate array is generated. The upstream iteration step happens only when the downstream iteration step asks for the next object.

Note that this #pipe method has the advantage of not having the (constant time) overhead of creating a continuation, but it has the disadvantage of being an internal (did I get that right?) iterator, just like #each or any other of the usual Enumerable methods.

Oddly, I wrote this code several years ago, but have never used it for some reason. I have to thank Martin for even remembering that it existed.

···

On Fri, 27 Aug 2004 05:20:54 +0900, Martin DeMello > <martindemello@yahoo.com> wrote:

Michael Neumann <mneumann@ntecs.de> wrote:

really my big question is how to use ruby for concurrent execution of
unix-style pipes and filters chaining objects together as in the
actors model (flow-based programming, aka dataflow). by definition of
the actors model, this requires the use of continuations and bounded
buffers (partial ordering) rather than threads; the idea being that
you never have to manually manage (dead)locks much as a GC means never
manually managing memory. (even the best programmers tend to suck at
handcrafting memory management and thread locks; 99% of the time it's
best left to the computer.)

i've got data being generated all over the place. logfiles, web
pages, file transfers, encryption/decryption of data, compression of
data, etc. etc. some objects will generate data, with their outputs
named by their methods. other objects will input data, filter it, and
output the results, also named (as "ports) with methods. in the end
some objects will consume that data, such as a framebuffer to display
a gui or perhaps an IP:port on the internet to transfer files to. i
would like ultimately to simply pipe these objects together and have
them concurrently process their dataflow. this is basically what the
actors model is all about. (it's also what unix streams / pipes &
filters are all about.)

hence my interest in generators and streams in ruby.

does that present a clearer contextual picture?

peace,
-z

···

On Fri, 27 Aug 2004 05:42:05 +0900, William Morgan <wmorgan-ruby-talk@masanjin.net> wrote:

Excerpts (reformatted) from zuzu's mail of 26 Aug 2004 (EDT):
> http://www.ruby-doc.org/stdlib/libdoc/generator/rdoc/classes/Generator.html
>
> am i correct in assuming that "makes an internal iterator external" is
> just another way of saying "turns a subroutine into a continuation",
> "using yield rather than return"? or aka, a function to continuation
> translator?

I don't think that's right.T he goal of Generator is something much more
specific: to turn an internal iterator, i.e. a method that takes a block
argument, into an external iterator, i.e. an object with explicit #next?
and #next methods.

Think about trying to interlace two Enumerables, e.g. turn ["a","b","c"]
and [1,2,3] into ["a",1,"b",2,"c",3]. You can't use Enumerable#each (an
internal iterator) to do this in a natural way. Instead, you'd use
Generator to create an object for each Enumerable and explicitly have a
loop that called #next alternately on both objects.

Ruby's internal iterators are far, far friendlier to use than, say,
Java's external iterators (writing for loops really is a pain) but there
are some problems they just don't yield (hah hah) natural solutions to.

Sorry if that isn't what you were asking. I don't know whether this is
related to Python's "generator" objects at all.

--
William <wmorgan-ruby-talk@masanjin.net>

William Morgan wrote:

Think about trying to interlace two Enumerables, e.g. turn ["a","b","c"]
and [1,2,3] into ["a",1,"b",2,"c",3]. You can't use Enumerable#each (an
internal iterator) to do this in a natural way.

But you can use Enumerable#zip -- but you're still right in that internal iterators aren't the most natural solution for everything.

Here's the #zip solution which shows that you can use it for iterating over multiple Enumerables in parallel:

irb(main):002:0> ["a", "b", "c"].zip([1, 2, 3]) do |(string, number)|
irb(main):003:1* puts "#{string} => #{number}"
irb(main):004:1> end
a => 1
b => 2
c => 3

What this won't let you do however is skipping elements from only one of the Enumerables.

Regards,
Florian Gross

Excerpts (reformatted) from zuzu's mail of 26 Aug 2004 (EDT):

hence my interest in generators and streams in ruby.

does that present a clearer contextual picture?

Yes. I don't think you want the Generator class at all then. As the
email you quoted said, Python generators are a solution to a problem
that Ruby doesn't have. Generators have a place in Ruby but this ain't
it.

From what I can tell, all the functionality they build up in the Python
example is just so that they can do thing that in Ruby can be done
neatly with iterators, like this:

  odd = proc { |x| (x % 3) == 0 }
  div_by_3 = proc { |x| (x % 3) == 0 }
  result = (1 .. 10).reject(odd).reject(div_by_3).each { |x| print x }

which has a "pipey" feel to it already, and with no preexisting work.

If you really want all the syntactic sugar that they have, I would say
something like this:

  class Pipe
    def initialize(&proc)
      @proc = proc
    end

    attr_reader :proc

    def +(other)
      Pipe.new { |x| self.proc || other.proc }
    end
  end

  module Enumerable
    def drain(pipe)
      self.reject { |x| pipe.proc }
    end
  end

Then you can do cool stuff like this:

  odd = Pipe.new { |x| (x % 2) == 0 }
  not_div_3 = Pipe.new { |x| (x % 3) == 0 }
  gt_5 = Pipe.new { |x| x <= 5 }

  p = odd + not_div_3 + gt_5
  (1 .. 20).drain p # => [7, 11, 13, 17, 19]

  is_rb_file = Pipe.new { |s| s !~ /\.rb$/ }
  first_ten = proc do # here we have to use a closure
    i = 0
    Pipe.new { (i += 1) > 10 }
  end.call

  Dir.entries(".").drain is_rb_file + first_ten # => the right thing

... which replicates the Python functionality, but without being quite
so ugly. I personally find "positive pipes" a little easier than
"negative pipes", but I've kept with their schemes for now.

HTH.

···

--
William <wmorgan-ruby-talk@masanjin.net>

William Morgan wrote:

Yes. I don't think you want the Generator class at all then. As the
email you quoted said, Python generators are a solution to a problem
that Ruby doesn't have. Generators have a place in Ruby but this ain't
it.

From what I can tell, all the functionality they build up in the Python
example is just so that they can do thing that in Ruby can be done
neatly with iterators, like this:

As William pointed out, enumerator pipes can be implemented in Ruby
without resorting to continuations, and that the current enumerator
syntax is very pipe-like in many ways.

However, one thing the Python version handles that enumerator chaining
does not is an infinite enumerator. In the python examples, we have
the following construct:

    Ints() | firstTen | oddNotDivBy3

Ints() is a generator that will generate all the integers (given
enough time and space). The next filter "firstTen" only takes the
first 10 of those numbers. Since the generator chain only takes a
number from a generator when it is needed, the Ints() generator is
never asked for any more than 10 number.

We can do the same with Ruby enumerators. Imagine a OddNumberFilter
object that implements a each method like this ...

   class OddNumberFilter
     include Enumerable
     attr_accessor :source
     def each
       @source.each { |n| yield(n) if (n % 2) != 0 }
     end
   end

It uses a source enumerable to generate numbers and only yields when
it finds an odd number. We can use it like this ...

    odds_only = OddNumberFilter.new
    odds_only.source = (1..10)
    odds_only.to_a # => [1, 3, 5, 7, 9]

Since a filter can take _any_ enumerable as a source, it can even take
other filters. Let's generalize our OddNumberFilter class to a
CondFilter class that takes a condition and only passes items that
pass the condition.

   class CondFilter
     include Enumerable
     attr_accessor :source
     def initialize(&block)
       @block = block
     end
     def each
       @source.each { |it| yield(it) if @block.call(it) }
     end
   end

Now we create a chain of filters ...

   odds_only = CondFilter.new { |n| (n % 2) != 0 }
   odds_only.source = (1..10)

   divisible_by_three = CondFilter.new { |n| (n % 3) == 0 }
   divisible_by_three.source = odds_only

   divisible_by_three.to_a #=> [3, 9]

It works, but it is ugly. So lets create a little syntactic sugar to
get this to work ...

   module Enumerable
     def |(filter)
       filter.source = self
       filter
     end
   end

   odds_only = CondFilter.new { |n| (n%2) != 0 }
   divisible_by_three = CondFilter.new { |n| (n%3) == 0 }

   pipe = (1..10) | odds_only | divisible_by_three
   pipe.to_a # => [3, 9]

With this foundation, handling infinite enumerations are easy ...

   class Ints
     include Enumerable
     def initialize
       @n = 0
     end
     def each
       loop do
         yield @n
         @n += 1
       end
     end
   end

And a class to take only so many items ...

   class Take
     include Enumerable
     attr_accessor :source

     def initialize(limit)
       @limit = limit
     end
     def each
       n = 0
       @source.each do |it|
         n += 1
         break if n > @limit
         yield it
       end
     end
   end

   ( Ints.new | Take.new(5) ).to_a # => [0, 1, 2, 3, 4]

From this point the File generator and the other stuff in the Python
example should be obvious.

Here's a fun puzzle. Using filters, create a Sieve of Erasothanes
filter that, given an input of Ints.new, will generate a series of
prime numbers. In other words, the code ...

   ( Ints.new | Primes.new | Take.new(100) ).to_a

will create an array of the first 100 prime numbers.

I've attached full examples and test suites for all the the above
(although factored a bit differently), so don't peek if you want to work
out the Primes puzzle ...

enumpipe.rb (1.77 KB)

testenumpipe.rb (2.49 KB)

···

--
-- Jim Weirich jim@weirichhouse.org http://onestepback.org
-----------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)