[ANN] Rio 0.3.3

For your perusal -- Rio 0.3.3

== Overview

Rio is a Ruby I/O convenience class wrapping much of the functionality
of IO, File and Dir. Rio also uses Pathname, FileUtils, Tempfile,
StringIO, OpenURI, Zlib, and CSV to provide similar functionality using
a simple consistent interface. In addition to forwarding the interfaces
provided by IO, File, and Dir to an appropriate object, Rio provides
a "grande" interface that allows many common application-level I/O and
file-system tasks to be expressed succinctly.

== New for version 0.3.3
* Expanded support and documentation for CSV files
  Examples:
  * Copy, changing the separator to a semicolon
     rio('comma.csv').csv > rio('semicolon.csv').csv(';')
  * Iterate through a file with each line parsed into an array
     rio('afile.csv').csv { |array_of_fields| ...}
  * Create an array of arrays of selected fields
     array_of_arrays = rio('afile.csv').csv.columns(1..3,7).to_a
  * Create a tab-separated file of accounts in a UNIX passwd file,
    listing only the username, uid, and realname fields
     rio('/etc/passwd').csv(':').columns(0,2,4) > rio('rpt').csv("\t")
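
For comparison, the passwd example can be approximated with nothing but the standard csv library. This is only a sketch of the same transformation, not how Rio implements it; the helper name passwd_report and the sample data are made up for illustration.

```ruby
require 'csv'

# Made-up sample in passwd format (not read from a real /etc/passwd).
PASSWD_SAMPLE = <<~DATA
  root:x:0:0:root:/root:/bin/bash
  daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
DATA

# Parse colon-separated input, keep fields 0, 2 and 4 (username, uid,
# realname), and emit them tab-separated.
def passwd_report(text)
  CSV.generate(col_sep: "\t") do |out|
    CSV.parse(text, col_sep: ':') do |fields|
      out << fields.values_at(0, 2, 4)
    end
  end
end

puts passwd_report(PASSWD_SAMPLE)
```

The Rio one-liner expresses the same pipeline (parse with one separator, pick columns, write with another) in a single expression.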

Project:: http://rubyforge.org/projects/rio/
Documentation:: http://rio.rubyforge.org/
Bugs:: http://rubyforge.org/tracker/?group_id=821

This is great. Thanks for this incredibly useful lib.

Sascha Ebach

This is a _very_ nice library. Great docs too. I will use this heavily.
Thanks for the work-

-Ezra Zygmuntowicz
WebMaster
Yakima Herald-Republic Newspaper
ezra@yakima-herald.com
509-577-7732

On Aug 19, 2005, at 6:16 PM, Sascha Ebach wrote:

This is great. Thanks for this incredibly useful lib.

Sascha Ebach

Hi

On 8/19/05, Ezra Zygmuntowicz <ezra@yakima-herald.com> wrote:

> > This is great. Thanks for this incredibly useful lib.
> >
> > Sascha Ebach
>
> This is a _very_ nice library. Great docs too. I will use this heavily.
> Thanks for the work-

Yes, very nice. When rio copies directories, how does it handle
soft links? Do they stay as links?

--
Jim Freeze

Dave Burt wrote:

I was going to ask for rio("cmdio:foo") and friends
rio("stdin:") etc., but I tried it and it works... just needs to be
documented in INTRO.

...

It would also be nice to have a more reader-friendly way to create Rios of
these different types. You could accept a Symbol as the first argument:
rio(:fd, 2)
rio(:string, my_string)
rio(:stdin)
rio(:stdio)
rio(:cmdio, "sort")

You're a damn genius, aren't you :-)

The reason 'stdin:', 'stdout:', etc. are not documented is that
they are not currently part of the published interface. I also
noticed that if you move the colon to the beginning and remove
the quotes, you have a symbol. I have not decided whether to have
the string form, the symbol form, or both be part of the
documented interface.

The string form is not actually an alternative syntax.
It is core to Rio.
rio(?-)
is immediately converted to
rio('stdio:')

Internally all resources are converted to a URL-like syntax,
an IORL if you like. The fully qualified IORL for stderr is
rio:stderr:
Rio, of course, assumes the 'rio:' part if it is missing.

You may also have noticed that if you drop the 'r', you have
io:stderr:
a readable, language-neutral way of addressing I/O within a
program. Similar to localhost in a URL, io:stdin: always refers
to stdin of the current process.
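
As a rough illustration of the optional 'rio:' prefix, here is a sketch that splits a pathless IORL-style string into a scheme and a remainder. The helper parse_iorl is hypothetical and is not part of Rio; it only handles the pathless forms mentioned in this thread, not plain file paths.

```ruby
# Hypothetical helper, not part of Rio: split a pathless IORL-style string
# into its scheme and remainder, treating a leading "rio:" as optional.
def parse_iorl(spec)
  body = spec.sub(/\Ario:/, '')
  scheme, rest = body.split(':', 2)
  { scheme: scheme, rest: rest.to_s, qualified: "rio:#{body}" }
end

p parse_iorl('stderr:')
p parse_iorl('rio:stderr:')
p parse_iorl('cmdio:sort')
```

With this reading, 'stderr:' and 'rio:stderr:' name the same resource, and the fully qualified form can always be rebuilt from the short one.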

My ideas in this area are not fully hashed out -- hence the
lack of documentation.

Cheers,
-Christopher

Dave Burt wrote:

* Rename Rio#copy to Rio#copy_to

* Add aliases:
  * append_to >>
  * copy_from <
  * append_from <<

This change has been checked into CVS, and will be in the next release.

* Change + and =~ to use to_str instead of to_s.

This change has been checked into CVS, and will be in the next release.
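
One reading of the to_str change, sketched with a toy class: an operator that uses to_str accepts only string-like arguments, while one that uses to_s would silently convert anything. ToyPath is hypothetical and is not Rio code.

```ruby
# Toy stand-in for a path-like object; not Rio's implementation.
class ToyPath
  def initialize(path)
    @path = path
  end

  def to_s
    @path
  end

  # Accept only string-like arguments (those that implement to_str),
  # rather than converting anything at all with to_s.
  def +(other)
    unless other.respond_to?(:to_str)
      raise TypeError, "no implicit conversion of #{other.class} into String"
    end
    ToyPath.new(@path + other.to_str)
  end
end

puts(ToyPath.new('dir/') + 'file.txt')

begin
  ToyPath.new('dir/') + 42
rescue TypeError => e
  puts "rejected: #{e.message}"
end
```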

* Add the following Pathname functionality:
  * mountpoint?
  * root?
  * realpath
  * cleanpath

These will be included, but I have not finished evaluating their
integration into Rio.

* Instead of the no* methods, you could add a wrapper ("except" or "skip"...
or both) which could be passed into methods like so:
io(foo).recurse(except(".svn")) {||...}.

When I was pondering a scheme for selection and rejection of files,
lines, etc., I decided from the start that one thing I would
absolutely avoid is the introduction of a sub-language. I have seen
too many libraries do this only to become unusable to the casual
user. Rio's grande selection methods accept a Proc to deal with the
most general case and they accept every possible parameter type I
could dream up to handle most common cases. In my opinion, as readable
as 'except' is, it falls into the category of a sub-language.
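
To make the "accept many parameter types plus a Proc" idea concrete, here is a small sketch of a selector that dispatches on the type of each argument. The helper select_lines is hypothetical and does not reproduce Rio's actual selection methods or their full set of accepted types.

```ruby
# Hypothetical helper, not Rio's API: keep the lines matched by any selector.
# A Regexp tests the text, an Integer or Range tests the zero-based line
# number, and a Proc is called with the line.
def select_lines(lines, *selectors)
  matched = lines.each_with_index.select do |line, index|
    selectors.any? do |sel|
      case sel
      when Regexp then sel.match?(line)
      when Integer, Range then sel === index
      when Proc then sel.call(line)
      else raise ArgumentError, "unsupported selector: #{sel.inspect}"
      end
    end
  end
  matched.map(&:first)
end

lines = %w[alpha beta gamma delta]
p select_lines(lines, 0, /mm/, ->(l) { l.start_with?('d') })
```

Every selector here is an ordinary Ruby object, so nothing beyond the language itself has to be learned, which is the point being made against a wrapper such as 'except'.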

Also, there is a bug in io1.rel(io2) when io1 is absolute and io2 is
relative.

I will look into this. Thanks. A specific example would be helpful.

>> * each_filename (iterates over path components: "a/b/c" -> "a", "b",
>> "c")
> rio('a/b/c').filenames.each ?

That seems to fit the Rio way of doing things better. I assume that would
imply you can give a block straight to #filenames and it will pass it to
#each.

Rio#split returns an array of filenames, as described. This is not the
same as what you propose. Were #filenames a Rio configuration method,
it would take a block directly. This seems like a solid proposal. I
need to give this a little more thought, but at this point I am
inclined to dump #split in favor of #filenames.
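
For reference, this is how the standard library's Pathname#each_filename behaves, both with a block and as an Enumerator. It shows only the stdlib method the request was modelled on, not any Rio method.

```ruby
require 'pathname'

# Pathname#each_filename yields each component of the path in turn.
Pathname.new('a/b/c').each_filename { |name| puts name }

# Without a block it returns an Enumerator, so the components can be
# collected or chained like any other Enumerable.
components = Pathname.new('a/b/c').each_filename.to_a
p components
```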

>> * If Rio provided a superset of IO's functionality, you could use Rio
>> objects in place of IO objects, duck-type style. Same with File, Dir and
>> Pathname.
> The section "Using A Rio as an IO (or File or Dir)" in Rio::Doc::INTRO
> discusses this, with an illustration using a Rio with yaml.

I still think it would be fairly easy to have Rio able to replace IO, Dir,
File, Pathname in _all_ situations. Maybe this extra clutter would be best
placed in a mixin.

Ignoring the *big* problem with #<<, as described in RIO::Doc::INTRO,
Rio is close enough that it would be a shame not to allow a Rio to be
used anywhere an IO is. You have won me over on this one. Ditto File
and Dir.

Off the top of my head, what would you think of an emulation mode
rio(...).likeio
or somesuch?
Describe how you see a mixin working.

Pathname is another matter. IO, File and Dir are integral parts of
Ruby. Pathname is not. I am not convinced that providing emulation
of legacy libraries such as Pathname should be a priority.

* Add pipes. The following should be possible:
io(foo) | io(bar) | io(baz)

I still love this idea. After revisiting it, I remembered why I had
put it aside before. Its behaviour is obvious in some cases, i.e. where
'foo' is a file, 'bar' is a cmdio and 'baz' is a file, or where they
are all cmdios. It becomes murky to me what the behaviour should be
if, for instance, they are all files.

To generalize, for duplex streams, piping is clear, but for non-duplex
streams, what exactly does piping mean?

One possibility that comes to mind is making it simply a copy-to in
those cases. That would make possible the following:

# tee the output of 'acmd' to a file and to stdout.
rio(?-,'acmd') | rio('output_file') | rio(?-)
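
The copy-to reading can be sketched with a toy wrapper around StringIO, where `a | b` copies a into b and returns b so the chain continues. ToyStream is hypothetical; it is not Rio and it ignores real duplex streams entirely.

```ruby
require 'stringio'

# Toy wrapper, not Rio: `a | b` copies everything in a to b and returns b,
# so a chain writes the same data to every stage (the "copy-to" reading).
class ToyStream
  attr_reader :io

  def initialize(io = StringIO.new)
    @io = io
  end

  def |(other)
    io.rewind
    other.io.write(io.read)
    other
  end
end

source = ToyStream.new(StringIO.new("line 1\nline 2\n"))
file   = ToyStream.new
screen = ToyStream.new

last = source | file | screen
puts file.io.string
puts last.io.string
```

Because `|` is left-associative, the chain evaluates as (source | file) | screen, which is what gives the tee effect.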

Another possibility is to simply raise an error unless one of the
participants is a duplex stream.

Another possibility is that I don't understand what you are proposing.

This is too elegant to not be included in Rio. I need help defining
what its behaviour should be.

Cheers,
-Christopher

(Sorry about the blank post prior to this)

Christopher wrote:

The reason 'stdin:', 'stdout:', etc. are not documented is that
they are not currently part of the published interface. I also
noticed that if you move the colon to the beginning and remove
the quotes, you have a symbol. I have not decided whether to have
the string form, the symbol form, or both be part of the
documented interface.

I like both, and would like to have the option to use either. I prefer the
symbol, but having the string be a key part of the concept of a Rio object,
like a URL protocol, and showing up as such in #inspect, isn't a bad idea. I
don't like the character/Fixnum; the brevity is good, but a single-character
prefix in the string itself is shorter... maybe makes Rio less flexible,
though - rio("!sort") for a cmdio?

Cheers,
Dave

Christopher wrote:

When I was pondering a scheme for selection and rejection of files,
lines, etc. I decided from the start, that one thing I would
absolutely avoid is the introduction of a sub-language. I have seen
too many libraries do this only to become unusable to the casual
user. Rio's grande selection methods accept a Proc to deal with the
most general case and they accept every possible parameter type I
could dream up to handle most common cases. In my opinion, as readable
as 'except' is, it falls into the category of a sub-language.

Also, there is a bug in io1.rel(io2) when io1 is absolute and io2 is
relative.

I will look into this. Thanks. A specific example would be helpful.

irb(main):002:0> rio('/usr/bin/ruby').rel('foo/bar')
=> #<Rio:0x1620220:"path:../../usr/bin/ruby" (Path::Reset)>

I'm not sure what the correct answer is; probably an exception would be
appropriate. Otherwise, if the parameter is relative, you might want to
evaluate it as an absolute path from the working directory.

Ignoring the *big* problem with #<<, as described in RIO::Doc::INTRO,
Rio is close enough that it would be a shame not to allow a Rio to be
used anywhere an IO is. You have won me over on this one. Ditto File
and Dir

Off the top of my head, what would you think of an emulation mode
rio(...).likeio
or somesuch?
Describe how you see a mixin working.

I was thinking of a mixin that mainly adds aliases and wrappers to
functionality that's already in Rio. It might override #<< to be
IO-compatible (probably retaining the Rio semantics for rio << rio). You
could mix it in like this:
r = rio(foo)
r.extend(Rio::IO)
some_method_which_wants_an_io(r)

You could add a #to_io method:
class Rio # I'm not sure if this is the right class to put it in - what is Base?
  def to_io
    clone.extend(Rio::IO)
  end
end

I'm not sure what the ethics are of having a to_x method that only returns a
duck-X, but if it does fully quack right, I don't mind.
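
The clone-and-extend idea can be tried out with toy classes. In this sketch ToyResource stands in for a Rio and IOLike for the proposed Rio::IO mixin; both names are hypothetical and none of this is Rio code.

```ruby
# Stand-in for the proposed mixin: adds an IO-style << on top of the
# methods the host object already has.
module IOLike
  # Like IO#<<, return the receiver so calls can be chained.
  def <<(obj)
    append(obj.to_s)
  end
end

# Stand-in for a Rio.
class ToyResource
  attr_reader :chunks

  def initialize
    @chunks = []
  end

  def append(text)
    @chunks << text
    self
  end

  # Return a copy that also answers to the IO-style methods; the receiver
  # itself is left without them. clone is shallow, so both objects share
  # the same underlying chunks array.
  def to_io
    clone.extend(IOLike)
  end
end

resource = ToyResource.new
io = resource.to_io
io << 'a' << 'b'

p resource.chunks
p resource.respond_to?(:<<)
p io.respond_to?(:<<)
```

The original object keeps its own semantics for any operator the mixin overrides, which is what makes this approach attractive for the #<< conflict.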

Pathname is another matter. IO, File and Dir are integral parts of
Ruby. Pathname is not. I am not convinced that providing emulation
of legacy libraries such as Pathname should be a priority.

Fair enough. I don't get Pathname 100%, either (in particular, why is
Pathname#chdir deprecated?)

* Add pipes. The following should be possible:
io(foo) | io(bar) | io(baz)

I still love this idea. After revisiting it, I remembered why I had
put it aside before. Its behaviour is obvious in some cases ie. where
'foo' is a file, and 'bar' is a cmdio and 'baz' is a file, or if they
are all cmdios. It becomes murky to me what the behaviour should be
if, for instance, they are all files.

To generalize, for duplex streams, piping is clear, but for non-duplex
streams, what exactly does piping mean?

One possibility that comes to mind is making it simply a copy-to in
those cases. That would make possible the following:

# tee the output of 'acmd' to a file and to stdout.
rio(?-,'acmd') | rio('output_file') | rio(?-)

Another possibility, is to simply raise an error unless one of the
participants is a duplex-stream.

Initially I preferred the error, but the tee makes sense, even if it's a
little less than obvious. An input-only stream is still going to produce an
error, isn't it?

Another possibility, is that I don't understand what you are proposing.

This is too elegant to not be included in Rio. I need help defining
what its behaviour should be.

If you like it, and you understand *nix shell process-chaining pipes, then
you understand.

Cheers,
Dave

Dave Burt wrote:

Also, there is a bug in io1.rel(io2) when io1 is absolute and io2 is
relative.

I will look into this. Thanks. A specific example would be helpful.

irb(main):002:0> rio('/usr/bin/ruby').rel('foo/bar')
=> #<Rio:0x1620220:"path:../../usr/bin/ruby" (Path::Reset)>

I'm not sure what the correct answer is; probably an exception would be
appropriate. Otherwise, if the parameter is relative, you might want to
evaluate it as an absolute path from the working directory.

Your example seems wrong, but is actually correct. Suppose you ran
the code in the directory /baz.

Rio does, in fact, do exactly what you suggest -- it converts the
relative path foo/bar to /baz/foo/bar. Nonexistent paths are assumed
to be files, and so the path to /usr/bin/ruby relative to the
directory /baz/foo is ../../usr/bin/ruby.

I do not necessarily defend this behaviour, but it is consistent with
the Ruby standard library class URI. Rio#rel uses URI#route_from to
calculate relative paths, and that is what the URI class does:

irb(main):004:0* URI('http://localhost/usr/bin/ruby').route_from('http://localhost/baz/foo/bar')
=> #<URI::Generic:0x813057c URL:../../usr/bin/ruby>

Rio had been checking to see if foo/bar is a directory and,
if so, producing ../../../usr/bin/ruby. Somewhere along the
way that got lost. This is a bug.

Nonexistent paths can be forced to be treated as directories by ending
them with a slash.

irb(main):007:0> rio('/usr/bin/ruby').rel('foo/bar/')
=> #<Rio:0x829db3c:"path:../../../usr/bin/ruby" (Path::Reset)>

Again, this is done because this is what URI does.

irb(main):005:0> URI('http://localhost/usr/bin/ruby').route_from('http://localhost/baz/foo/bar/')
=> #<URI::Generic:0x812dfa8 URL:../../../usr/bin/ruby>

I am open to suggestions in this area.
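
One point of comparison is the standard library's Pathname#relative_path_from, which always treats the base as a directory, whereas URI#route_from only does so when the base ends with a slash. The sketch below puts the two side by side; the helper relative_routes is hypothetical, and localhost is just a placeholder host.

```ruby
require 'uri'
require 'pathname'

# Compute the route to +target+ from +base+ (both absolute paths) with
# URI#route_from and with Pathname#relative_path_from.
def relative_routes(target, base)
  {
    uri: URI("http://localhost#{target}").route_from("http://localhost#{base}").to_s,
    pathname: Pathname.new(target).relative_path_from(Pathname.new(base)).to_s
  }
end

p relative_routes('/usr/bin/ruby', '/baz/foo/bar')
p relative_routes('/usr/bin/ruby', '/baz/foo/bar/')
```

The URI result changes with the trailing slash, while the Pathname result does not, so which one is "right" depends on whether a base without a trailing slash should be read as a file or as a directory.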

Cheers,
-Christopher

Dave Burt wrote:

I like both, and would like to have the option to use either.

You now have both -- and more. You can now create pathless Rios in
the following ways (using strio as an example):
  str = "Hello World\n"

  rio(?",str)
  rio(:strio,str)
  rio("strio:",str)
  rio.strio(str)
  RIO.strio(str)
  RIO::Rio.strio(str)
  RIO::Rio.new(char_or_symbol_or_string_as_above,str)
  RIO::rio(char_or_symbol_or_string_as_above,str)
  RIO::Rio.rio(char_or_symbol_or_string_as_above,str)

I will most likely drop some of these in the next release, based on the
feedback I get.
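
How a character, a symbol and a "scheme:" string might all be reduced to the same scheme name can be sketched as follows. The helper scheme_for is hypothetical and not Rio's code; the shortcut table only lists characters that appear in examples in this thread. Note that on Ruby 1.9 and later a character literal such as ?- is a one-character String rather than a Fixnum, which is what this sketch assumes.

```ruby
# Shortcut characters and the schemes they stand for, as seen in the
# examples in this thread. The helper itself is hypothetical.
SHORTCUTS = {
  '-' => 'stdio',
  '=' => 'stderr',
  '#' => 'fd',
  '"' => 'strio',
  '?' => 'temp'
}.freeze

# Reduce a character, Symbol, or "scheme:" String to a bare scheme name.
def scheme_for(designator)
  case designator
  when Symbol
    designator.to_s
  when /\A\w+:\z/
    designator.chomp(':')
  when String
    SHORTCUTS.fetch(designator) do
      raise ArgumentError, "unknown designator: #{designator.inspect}"
    end
  else
    raise ArgumentError, "unsupported designator: #{designator.inspect}"
  end
end

p scheme_for(?-)
p scheme_for(:strio)
p scheme_for('strio:')
```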

I prefer the
symbol, but having the string be a key part of the concept of a Rio object,
like a URL protocol, and showing up as such in #inspect, isn't a bad idea.

In each of the following the part in quotes after the second colon
is the IORL.

irb(main):005:0> rio(?=)
=> #<Rio:0x81334f8:"stderr:" (Stream::Open)>
irb(main):006:0> rio(?#,2)
=> #<Rio:0x81299d0:"fd:2" (Stream::Open)>
irb(main):007:0> rio($stderr)
=> #<Rio:0x812346c:"sysio:0x08076f1c" (Stream::Open)>
irb(main):008:0> rio('foo/bar')
=> #<Rio:0x811d898:"path:foo/bar" (Path::Reset)>
irb(main):009:0> rio('/tmp')
=> #<Rio:0x8126088:"file:///tmp" (Path::Reset)>
irb(main):010:0> rio('http://localhost/')
=> #<Rio:0x81234c0:"http://localhost/" (HTTP::Stream::Open)>
irb(main):011:0> rio(??,'zippy','/tmp')
=> #<Rio:0x82a5fa0:"temp:/tmp/zippy" (Temp::Reset)>

I don't like the character/Fixnum; the brevity is good, but a single-character
prefix in the string itself is shorter... maybe makes Rio less flexible,
though - rio("!sort") for a cmdio?

Originally I was using single-character strings like rio('-') for stdio:,
as the super brief version -- which is longer than rio(?-).
At the time I felt that the single character was cleaner (is that stdio,
or a file named '-'?).

The embedded character works in some places, and certainly has precedent,
but what do you propose for strio, stdio and the others?

I would like to have this issue decided by the next release.