Thoughts on generators

I’ve been looking through the generator threads in the archive, and
they all seem to centre around the problem of converting an internal
iterator to an external iterator. This would be a wonderful thing, I
admit, but so far there’s been no time and space efficient way to do
it (as far as I’ve seen, you either need to dump the enumerable to an
array, or use call/cc).

Leaving aside the general problem for the minute, it might be useful
to see where we can efficiently write

g = SimpleGenerator.new(a) # should be space-efficient
i = g.next # should be O(1) and cheap

and where the generator iterates over the sequence nondestructively,
so that it is threadsafe. One could use g.val rather than g.next,
depending on how you think of it.

SimpleGenerator.new would most likely need to be a factory that
instantiated more specific per-type generators. Some easy examples for
generator-generating objects:

  • arrays and other structures that support a[i] where i is an integer
    (initialize sets @i = -1, and .next does an @i+=1, @array[@i].dup)
  • functions on integers (.next returns f(@i))
  • sequences where f[i+1] = g(f[i], f[i-1], …) and which don’t depend
    on too large a history (the Fibonacci sequence, e.g., or a random
    number generator)
  • generator-only structures like unbounded lists and circular arrays

Another nice possibility is built-in support for parallel iteration on
generatable structures:

gen = SimpleGenerator.new (a,b,c,d) {|a,b,c,d| [[a,b],c+d]]}

where gen.next returns [a.next,b.next,c.next,d.next] if no block is
given, and passes it into the block if it is. The arguments could be
automatically converted to generators if they were of some generatable
type, which allows freely mixing generatables and anything else
responding to .next. Of course, this runs into the same problems as
zip does when one of the generators runs out before the others do.

Further ideas:

Composition:

  • g1 + g2 returns a generator that iterates over first g1 and then g2
  • g1 || g2 iterates in parallel over g1 and g2 (confusable with or?
    better name?
    perhaps [g1,g2] would be clearer, but we can overload
    SimpleGenerator#|| and not just plain [], so it’d be
    gen = SimpleGenerator.create([g1+g2, g3+g4]) versus
    gen = (g1+g2)||(g3+g4)

Filtering: have an optional filter function (via .filter), so that
next calls itself repeatedly until the filter succeeds.

#each: converting an external to an internal iterator is fairly
trivial, though there would, of course, be issues with circular and
infinite generators.

Implementation shouldn’t be too hard - I’ll put something up on the
wiki when I get a free moment. Comments? Has anyone done something
like this already?

martin

I’ve been looking through the generator threads in the archive, and
they all seem to centre around the problem of converting an internal
iterator to an external iterator. This would be a wonderful thing, I
admit, but so far there’s been no time and space efficient way to do
it (as far as I’ve seen, you either need to dump the enumerable to an
array, or use call/cc).

[snip long interesting post]

Martin,

I’ve only halfway followed this thread, as
I’m not 100% sure in what sense people are
using the term “generator”… if it’s in a
pure Python sense, well, my knowledge of
Python is very limited.

Why don’t you read the post I made
back in August and see if you think
there’s any connection…

The thread is entitled “Super-iterator? (long)”
and starts at http://ruby-talk.org/46337

I was attempting to solve three or four
problems at once (some of which may be
non-problems to other people). One of
these was the lack of “on-demand” iteration
which as I understand it may be the same
as generation.

Let me know your thoughts…

Thanks,
Hal Fulton

···

----- Original Message -----
From: “Martin DeMello” martindemello@yahoo.com
Newsgroups: comp.lang.ruby
To: “ruby-talk ML” ruby-talk@ruby-lang.org
Sent: Tuesday, November 19, 2002 6:49 AM
Subject: Thoughts on generators

If the behaviour of SimpleGenerator.new(a) depends on the type of a –
e.g. is polymorphic with respect to a – then it should really be a
method defined by a.

E.g. your example would be reformulated as:

g = a.generator
i = g.next

Cheers,
Nat.

···

On Tue, 2002-11-19 at 12:49, Martin DeMello wrote:

…it might be useful to see where we can efficiently write

g = SimpleGenerator.new(a) # should be space-efficient
i = g.next # should be O(1) and cheap

and where the generator iterates over the sequence nondestructively,
so that it is threadsafe. One could use g.val rather than g.next,
depending on how you think of it.

SimpleGenerator.new would most likely need to be a factory that
instantiated more specific per-type generators. Some easy examples for
generator-generating objects:

  • arrays and other structures that support a[i] where i is an integer
    (initialize sets @i = -1, and .next does an @i+=1, @array[@i].dup)
  • functions on integers (.next returns f(@i))
  • sequences where f[i+1] = g(f[i], f[i-1], …) and which don’t depend
    on too large a history (the Fibonacci sequence, e.g., or a random
    number generator)
  • generator-only structures like unbounded lists and circular arrays


Nat Pryce nat.pryce@b13media.com
B13media

[snip]

Has anyone done something like this already?
[snip]

i have been using this. the iterator is compliments of Pit Capitain.

this doesn’t work with threads but i learned a lot from it…

eg.

a = [‘fee’,‘fie’,‘foe’,‘fum’]
h = {0 => ‘fee’, 1 => ‘fie’, 2 => ‘fum’, 3 => ‘fum’}

ai = a.iterator
hi = h.iterator

while ((n = ai.next)) do
print n.inspect, ’ ’
end; print “\n”

>> “fee” “fie” “foe” “fum”

while ((n = hi.next)) do
print n.inspect, ’ ’
end; print “\n”

>> [0, “fee”] [1, “fie”] [2, “fum”] [3, “fum”]

each a, h do |elem, kv|
print elem.inspect, ’ ', kv.inspect, ’ ’
end; print “\n”

>> “fee” [0, “fee”] “fie” [1, “fie”] “foe” [2, “fum”] “fum” [3, “fum”]

source.

--------cut--------
module Enumerable
class Iterator

public

attr_reader :enumerable, :has_next

def initialize enumerable, end_value = nil, &end_block
  @enumerable = enumerable
  @end_value = end_value
  @end_block = end_block
  initialize_fetch_block
end

def next
  @has_next ? fetch_next_element : fetch_end_value
end

def rewind
  initialize_fetch_block
  self
end

protected

def initialize_fetch_block
  callcc do |@after_fetch|
@has_next = true
@enumerable.each do |@next_element|
  callcc do |@next_fetch| @after_fetch.call end
end
@has_next = false
@next_fetch = nil
@after_fetch.call
  end
  @after_fetch = nil
end

def fetch_next_element
  result = @next_element
  callcc do |@after_fetch| @next_fetch.call end
  @after_fetch = nil
  result
end

def fetch_end_value
  @end_block ? @end_block.call : @end_value
end

end

def iterator
@iterator = Iterator.new self
end
end

module Kernel
def each(*enumerables, &block)
iterators = enumerables.collect{|e| e.iterator}
while true
args = iterators.collect{|i| i.next}
if args.detect{|arg| arg}
block.call *args
else
return enumerables
end
end
end
end
--------cut--------

-ara

···

On 19 Nov 2002, Martin DeMello wrote:

====================================

Ara Howard
NOAA Forecast Systems Laboratory
Information and Technology Services
Data Systems Group
R/FST 325 Broadway
Boulder, CO 80305-3328
Email: ahoward@fsl.noaa.gov
Phone: 303-497-7238
Fax: 303-497-7259
====================================

Martin,

I’ve only halfway followed this thread, as
I’m not 100% sure in what sense people are
using the term “generator”… if it’s in a
pure Python sense, well, my knowledge of
Python is very limited.

Mine too - I just had a quick look at python generators, and they seem
to be semantically closer to the call/cc solutions proposed in ruby.
Unless I’ve misunderstood it entirely, it’s essentially an rubyish
internal iterator, but with ‘yield’ suspending the each method until
‘.next’ is called. My generators are closer in spirit to python’s
iterators.

Why don’t you read the post I made
back in August and see if you think
there’s any connection…

The thread is entitled “Super-iterator? (long)”
and starts at http://ruby-talk.org/46337

I was attempting to solve three or four
problems at once (some of which may be
non-problems to other people). One of
these was the lack of “on-demand” iteration
which as I understand it may be the same
as generation.

It’s not really the same thing. This splits into two parts:

  1. Define a set of useful and interesting methods on objects that
    implement .next (and .end?) - this would be a mixin like Enumerable
  2. Define a set of constructors that return nextable objects (which I’m
    calling generators) based on various other objects.

Nextable objects aren’t realy a subset or superset of Enumerables, so
while a generator can serve as an iterator over some objects, it’s not a
universal solution to the external iterator problem. OTOH it allows for
other interesting cases, like recurrence relations and (somewhat
primitive) unidirectional streams.

The other neat thing is that each can be built atop next, so you get
Enumerable practically for free. Looking at some of your super-iterator
motivations:

  1. Sometimes I want to know whether I’m in the first (or last)
    iteration of an iterator block.

Since the generator is an object, you can add first? and last? methods
to it.

  1. Sometimes I want to do things “alternately” (i.e., for every
    other iteration).

def evens
while !end?
yield next(2)[0]
end
end

  1. I’m a tiny bit displeased by the difference between each and
    each_with_index. I’d like them to be done by one thing.

Not sure what you mean here. What would the one thing do?

  1. Sometimes I want to change an item in an array, for instance.
    each won’t allow this; I have to use each_with_index and manually
    index into the array.

Hm - I suppose an accessor would be useful, in cases where it was
applicable. I’ll have to think about that.

  1. Sometimes an ordinary iterator seems inadequate; I want something
    that can grab the next entry on demand, without waiting for the
    loop to run around again.

Which is what .next is all about - it just won’t be applicable in the
general case. Of course, you could always pass in obj.to_a, though that would
create an intermediate array.

martin

···

Hal E. Fulton hal9000@hypermetrics.com wrote:

Right you are. Something like

class Range
def generator
RangeGenerator.new(self)
end
end

for each class, or is there a better way to do it? One problem though is
that this doesn’t let us create generators for arbitrary classes
(frinstance, anything that supports self[int] could yield a generator),
but I could keep SimpleGenerator.new around as well.

martin

···

Nat Pryce nat.pryce@b13media.com wrote:

If the behaviour of SimpleGenerator.new(a) depends on the type of a –
e.g. is polymorphic with respect to a – then it should really be a
method defined by a.

E.g. your example would be reformulated as:

g = a.generator
i = g.next