Iterator in Ruby

What's the data structure in Ruby for an Iterator?
What I mean is that an Iterator is lazy and presents only one element at a time.
(This is the default data type for file IO in Scala.)
I want to use it to read very big files, streams, etc.
Maybe I have misunderstood the use of Ruby's data structures. Please point
that out.

Thanks

There are two data structures you can iterate over. One is Array, which is equivalent to a list in Python; the other is Hash, a collection of key-value pairs, which is equivalent to a dictionary in Python.

If you are a beginner, you can learn about arrays and hashes in I Love Ruby: Get started with the greatest programming language made for humans.

Now, coming to files: I am not sure whether there is a way to deal with big files, say files whose size exceeds RAM. If you are doing such things, Hadoop's HDFS is the best way to go. But you can learn the basics of files in the same book.

I should warn you that I am giving links from a Ruby book I wrote, so it's very biased; you should search the internet more to learn better.

- Karthikeyan A K

These two types aren't what I was looking for.

scala> val fh = Source.fromFile(file).getLines()
val fh: Iterator[String] = <iterator>

scala> fh.foreach(println)
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.6 LTS"

scala> fh.foreach(println)

As you can see, the file handle is an Iterator (in Scala). After traversing it with
foreach, it is empty.
As I said, an iterator is lazy and presents only one element at a time.

So array.map is impossible for big input, but iterator.map is
possible.
That's why I am asking the question.

Thanks.


(Apologies for top-posting... Apparently I haven't set up Thunderbird correctly...)

Ruby uses Enumerable. If something is going to be treated as what you're referring to as an "Iterator", its class would include Enumerable and define an #each method.

Including Enumerable gives you a few things "for free" (like map), and it also lets you call #lazy on an instance, which returns an Enumerator::Lazy whose methods (map, select and so on) are evaluated lazily.
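
To make that concrete, here is a minimal sketch. The class name Countdown is made up for illustration; only Enumerable, #each, #map, #lazy and #first are real Ruby.

```ruby
# A hypothetical class that becomes iterable by including Enumerable
# and defining #each. Countdown is an illustration, not a core class.
class Countdown
  include Enumerable

  def initialize(start)
    @start = start
  end

  # Enumerable builds map, select, first, lazy, ... on top of #each.
  def each
    return to_enum(:each) unless block_given?

    @start.downto(1) { |n| yield n }
  end
end

countdown = Countdown.new(5)

# Eager: map walks every element and builds a whole Array.
p countdown.map { |n| n * 2 }                 # [10, 8, 6, 4, 2]

# Lazy: only as many elements as first(2) asks for are computed.
p countdown.lazy.map { |n| n * 2 }.first(2)   # [10, 8]
```

Returning to_enum when no block is given is a common convention, so that the class behaves like the built-in collections when #each is called without a block.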

Hope that helps.

-James


It's also worth pointing out that it's up to the original class to
decide whether its iterator streams (lazily) or generates an
in-memory array first. For example, a method like File#each_line
could maintain a cursor and walk the file as it is iterated over, or it
could slurp the entire file into memory first. The Enumerable module
provides a lot of helpful utility methods, but the class that
includes it has to define its own #each, which it can do however it
wants.

The Enumerator::Lazy class (via the Enumerable#lazy method James
mentioned above) provides an explicit lazy/eager interface, but
doesn't necessarily enforce streaming iteration. It's useful for
reaching into infinite iterations, for example.
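
A rough sketch of that distinction, using two made-up classes (StreamingSource and SlurpingSource are hypothetical names, and the counter exists only to show how much work each one does):

```ruby
# Two hypothetical sources of the same 1,000 values. Both include Enumerable,
# but only one of them produces values on demand.
class StreamingSource
  include Enumerable
  attr_reader :produced

  def initialize
    @produced = 0
  end

  # Produces each value only when the consumer asks for the next one.
  def each
    1.upto(1000) do |n|
      @produced += 1
      yield n
    end
  end
end

class SlurpingSource
  include Enumerable
  attr_reader :produced

  def initialize
    @produced = 0
  end

  # Builds the whole collection in memory before yielding anything.
  def each(&block)
    all = (1..1000).map do |n|
      @produced += 1
      n
    end
    all.each(&block)
  end
end

streaming = StreamingSource.new
slurping = SlurpingSource.new

# The same lazy pipeline on both sources.
streaming.lazy.map { |n| n * 2 }.first(2)
slurping.lazy.map { |n| n * 2 }.first(2)

puts streaming.produced   # 2
puts slurping.produced    # 1000
```

Calling #lazy keeps the map step from running on more than two elements in both cases, but it cannot stop SlurpingSource#each from materialising everything first.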

Cheers


--
  Matthew Kerwin
  https://matthew.kerwin.net.au/

Hi Adriel,

> What's the data structure in ruby for Iterator?
> What I meant is ,the Iterator is lazy and presents only one at each time.

The equivalent to a Scala / Java / GoF *Iterator* or a .NET
*Enumerator* in Ruby is the `Enumerator` class
(class Enumerator - RDoc Documentation).

An `Enumerator` yields values one-by-one.

Many methods that iterate over a collection will return an
`Enumerator` when you call them without a block, e.g.
`Enumerable#map`.
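
A small sketch of that protocol, using only core methods (the helper name `shout` is made up for illustration):

```ruby
# Hypothetical helper: upcases words and numbers them starting from 1.
# map without a block returns an Enumerator, which can be chained with
# Enumerator#with_index before the block is finally supplied.
def shout(words)
  words.map.with_index(1) { |word, i| "#{i}. #{word.upcase}" }
end

p %w[a b c].map.class      # Enumerator
p %w[a b c].each.class     # Enumerator
p shout(%w[a b c])         # ["1. A", "2. B", "3. C"]
```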

Note that while `Enumerator` mixes in `Enumerable` and thus provides
all the methods you are used to, most of these methods are *strict*
(i.e. they iterate over the whole `Enumerator`), and the ones that
produce collections, such as `map` and `select`, return
`Array`s. So, they are not useful for infinite or very large
`Enumerator`s.

That's what `Enumerator::Lazy`
(class Enumerator::Lazy - RDoc Documentation) is for: it overrides
many of the methods in `Enumerable` with lazy versions. You can
construct an `Enumerator::Lazy` by calling `Enumerable#lazy` on any
`Enumerable` object, including non-lazy `Enumerator`s.
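
As a sketch of the difference, here is an infinite `Enumerator` (the method name `naturals` is made up for this example; `Enumerator.new`, `#lazy`, `#select`, `#map` and `#first` are core Ruby):

```ruby
# Hypothetical helper: an infinite Enumerator over 1, 2, 3, ...
# The block only runs as far as consumers pull values out of it.
def naturals
  Enumerator.new do |yielder|
    n = 1
    loop do
      yielder << n
      n += 1
    end
  end
end

# naturals.map { |n| n * n } would never return: Enumerable#map is strict
# and tries to build an Array of every element.

# With #lazy, select and map each return another Enumerator::Lazy, and
# nothing is computed until first(3) asks for values.
squares_of_evens = naturals.lazy.select(&:even?).map { |n| n * n }
p squares_of_evens.class      # Enumerator::Lazy
p squares_of_evens.first(3)   # [4, 16, 36]
```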

> (This is the default data type for file IO in scala)
> I want it to read very big files and streams etc.

In Ruby, you can iterate over large files line by line using
`IO::foreach` (class IO - RDoc Documentation), which takes the name
of the file to read; for a stream that is already open, use
`IO#each_line` instead. I
mentioned above that many iteration methods conform to the protocol
that not passing a block means you want an `Enumerator`, and
`IO::foreach` is no different. If you want an `Enumerator` which
iterates over a file line-by-line, you can use:

io_iterator = IO.foreach(path_to_file)

And if you want that iterator to be lazy, just add the `Enumerable#lazy` method:

lazy_io_iterator = IO.foreach(path_to_file).lazy
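
Put together, a lazy pipeline over a file might look like the sketch below. It assumes the input is given as a file path; `first_matches` is a made-up helper name, and `File.foreach` is the same method as `IO.foreach`, inherited by `File`. The temporary file only stands in for a big one.

```ruby
require "tempfile"

# Hypothetical helper: returns up to `limit` lines that contain `word`.
# The file is read line by line, and reading stops once enough lines match.
def first_matches(path, word, limit)
  # Without a block, File.foreach returns an Enumerator; nothing is read yet.
  lines = File.foreach(path)
  lines.lazy
       .select { |line| line.include?(word) }
       .map(&:chomp)
       .first(limit)
end

Tempfile.create("distrib") do |tmp|
  tmp.write("DISTRIB_ID=Ubuntu\nDISTRIB_RELEASE=18.04\nDISTRIB_CODENAME=bionic\n")
  tmp.flush
  p first_matches(tmp.path, "RELEASE", 1)   # ["DISTRIB_RELEASE=18.04"]
end
```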

Cheers!


Ruby generally uses internal iteration with Enumerable[1] but also
supports external iteration with Enumerator[2].

https://journal.stuffwithstuff.com/2013/01/13/iteration-inside-and-out/

# Open file for reading
f = File.open('/usr/share/dict/words', 'r')
=> #<File:/usr/share/dict/words>

# Get an enumerator that reads one line at a time
i = f.each
=> #<Enumerator: #<File:/usr/share/dict/words>:each>

# Iterate 3 times (lines)
puts i.take(3)
  A
  A's
  AMD
=> nil

# Iterate 3 more times (lines)
puts i.take(3)
  AMD's
  AOL
  AOL's
=> nil

# Iterate 3 more times (lines)
puts i.take(3)
  AWS
  AWS's
  Aachen
=> nil

# Iterate the rest of the file, passing a line at a time to the block
f.each{|l| print l.size }
  98106868810795768465781081091110120688106879810681012...
=> #<File:/usr/share/dict/words>

# Try to iterate 3 more times, but there are no more lines to be read
puts i.take(3)
=> nil

[1] Module: Enumerable (Ruby 3.1.0)
[2] Class: Enumerator (Ruby 3.1.0)
[3] Class: Enumerator::Lazy (Ruby 3.1.0)
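
The session above uses take; external iteration in the stricter sense, pulling one value at a time, is done with Enumerator#next. A minimal sketch, where `first_lines` is a made-up helper and a temporary file stands in for the word list:

```ruby
require "tempfile"

# Hypothetical helper: returns the first `count` lines of the file at `path`
# by pulling them one at a time with Enumerator#next.
def first_lines(path, count)
  File.open(path) do |file|
    lines = file.each_line   # no block given, so this is an Enumerator
    result = []
    begin
      count.times { result << lines.next.chomp }
    rescue StopIteration
      # The file had fewer than `count` lines; return what was read.
    end
    result
  end
end

Tempfile.create("words") do |tmp|
  tmp.write("A\nA's\nAMD\n")
  tmp.flush
  p first_lines(tmp.path, 2)   # ["A", "A's"]
  p first_lines(tmp.path, 5)   # ["A", "A's", "AMD"]
end
```

Enumerator#next raises StopIteration when the source is exhausted, which is why the helper rescues it.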


Thanks, Jörg W. This clarified my question.

Regards.


The subject could have been expressed so beautifully! I started reading with passion!
