FileString - request for comments

Hi there

I just put FileString on github: http://github.com/apeiros/filestring
FileString is a class that wraps a path on the filesystem (a file) and provides an exact copy of the String API. This means you can code as if you had a String and your file on the disk gets manipulated just "magically".
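To make the idea concrete, here is a deliberately naive sketch of a file-backed object that answers to the String API. `NaiveFileString` is a hypothetical name used only for illustration; it is not FileString's code, and unlike a real implementation it rereads and rewrites the whole file on every call.

```ruby
# Hypothetical, much simplified stand-in for the idea; NOT FileString's code.
class NaiveFileString
  def initialize(path)
    @path = path
  end

  # The file's current contents as an ordinary String.
  def to_s
    File.read(@path)
  end

  # Forward every String method to the file's contents and write the
  # contents back if the call changed them (e.g. <<, []=, gsub!).
  # Methods already defined on Object (such as ==) are not forwarded.
  def method_missing(name, *args, &block)
    return super unless String.method_defined?(name)

    data   = to_s
    before = data.dup
    result = data.public_send(name, *args, &block)
    File.write(@path, data) unless data == before
    result
  end

  def respond_to_missing?(name, include_private = false)
    String.method_defined?(name) || super
  end
end
```

With this, `NaiveFileString.new(path) << "more text"` appends to the file on disk, and `fs[0, 5] = "HOWDY"` rewrites its first five characters.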

The library is very young (just over 24 hours old), so please use it with care.

I'd appreciate any kind of comment.

Regards
Stefan

···


Interesting choice to use a String. I used Tie::File a couple of times in Perl code. It works as an Array instead:

James Edward Gray II

···

On Nov 8, 2009, at 7:47 PM, apeiros@gmx.net wrote:

I just put FileString on github: GitHub - apeiros/filestring: Treat files like plain normal strings
FileString is a class that wraps a path on the filesystem (a file) and provides an exact copy of the String API. This means you can code as if you had a String and your file on the disk gets manipulated just "magically".

James Edward Gray II wrote:

···

On Nov 8, 2009, at 7:47 PM, apeiros@gmx.net wrote:

I just put FileString on github: GitHub - apeiros/filestring: Treat files like plain normal strings
FileString is a class that wraps a path on the filesystem (a file) and provides an exact copy of the String API. This means you can code as if you had a String and your file on the disk gets manipulated just "magically".

Interesting choice to use a String. I used Tie::File a couple of times in Perl code. It works as an Array instead:

Tie::File - Access the lines of a disk file via a Perl array - metacpan.org

James Edward Gray II

What would the advantage over mmap[1] be? FileString is pure ruby (right?) and hence more portable, but probably mmap is much more efficient? Any other tradeoffs?

[1] http://moulon.inra.fr/ruby/mmap.html; looks like this project of Guy Decoux's has been recently adopted by knu: http://github.com/knu/ruby-mmap.

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

-------- Original Message --------

Date: Mon, 9 Nov 2009 12:37:17 +0900
From: James Edward Gray II <james@graysoftinc.com>
To: ruby-talk@ruby-lang.org
Subject: Re: FileString - request for comments

> I just put FileString on github: GitHub - apeiros/filestring: Treat files like plain normal strings
> FileString is a class that wraps a path on the filesystem (a file)
> and provides an exact copy of the String API. This means you can
> code as if you had a String and your file on the disk gets
> manipulated just "magically".

Interesting choice to use a String. I used Tie::File a couple of
times in Perl code. It works as an Array instead:

Tie::File - Access the lines of a disk file via a Perl array - metacpan.org

James Edward Gray II

Somebody I know already implemented a TieFile in ruby, the repository is at http://killerfox.protection-fault.ch/gitrepo/tie_file.git

Personally, I don't tend to think of a file as an array. I'd use Tie::File if I needed a persistent array, so there the problem comes "the other way round". With FileString I explicitly want to deal with a File, but not with an IO-like API (of course you could also approach it as "I need a persistent String", but that wasn't and isn't the case for me).

Regards
Stefan

···

On Nov 8, 2009, at 7:47 PM, apeiros@gmx.net wrote:


Hi Joel

What would the advantage over mmap[1] be? FileString is pure ruby
(right?) and hence more portable, but probably mmap is much more
efficient? Any other tradeoffs?

Interesting, I looked for an existing solution and didn't find mmap. Yes, FileString is pure Ruby and should therefore run on all Ruby implementations. And yes, I'd expect mmap to be more efficient. It'd be interesting to combine the two (if that's at all possible).
In a quick test it seems FileString is more complete too, e.g. Mmap doesn't have #replace (which should be trivial to add). But Mmap can tie just a part of the file.

http://github.com/knu/ruby-mmap

Thanks for the link

Regards
Stefan

···


It would probably be fairly trivial for you to directly support mmap at the OS level using Ruby/DL, Ruby-FFI or even syscall (although that's ugly and fragile). Take a look at some of my Plumber's Guide presentations at the link in my signature and also at http://kenai.com/projects/ruby-ffi for details of how to wrap these kinds of system calls such that they'll run identically on JRuby, Rubinius and MRI.

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

···

On 9 Nov 2009, at 13:54, apeiros@gmx.net wrote:

Hi Joel

What would the advantage over mmap[1] be? FileString is pure ruby
(right?) and hence more portable, but probably mmap is much more
efficient? Any other tradeoffs?

Interesting, I was looking if a solution existed already and didn't find mmap. Yes, FileString is pure ruby and should therefore run on all ruby implementations. And yes, I'd expect mmap to be more efficient on the other hand. It'd be interesting to combine the two (if that's at all possible).
In a quick test it seems FileString is more complete too, e.g. Mmap doesn't have #replace (should be trivial to add). But Mmap has the feature to only tie a part of the file.

----
raise ArgumentError unless @reality.responds_to? :reason

I'd have looked for mmap first, knowing the concept from Linux. I'd also expect
that with mmap, you should be able to implement an efficient regex, though I'm
not sure how well gsub! would work, unless you can guarantee the match is
always exactly the length of the target string.

(And for gsub to be efficient, you'd need some fancy copy-on-write stuff, which
would make it that much more difficult to chain them.)
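To illustrate the length constraint, here is a minimal sketch of a substitution that never moves any bytes in the file. `overwrite_in_place` is a hypothetical helper, not part of FileString or Mmap. For simplicity it reads the file once to locate the matches, so only the writes happen in place, and it refuses any replacement whose byte length differs from the match.

```ruby
# Hypothetical helper, not part of FileString or Mmap.
# Overwrites every match of `pattern` directly in the file, but only if
# each match has exactly the same byte length as `replacement`, so that
# no byte after a match has to be shifted.
def overwrite_in_place(path, pattern, replacement)
  data    = File.binread(path)   # simple, but loads the whole file
  offsets = []
  pos     = 0

  # First pass: find all matches and validate their lengths.
  while (m = pattern.match(data, pos))
    unless m[0].bytesize == replacement.bytesize
      raise ArgumentError, "match and replacement differ in length"
    end
    offsets << m.begin(0)
    pos = [m.end(0), m.begin(0) + 1].max   # always advance, even on empty matches
  end

  # Second pass: overwrite the matched bytes where they are.
  File.open(path, "r+b") do |fh|
    offsets.each do |offset|
      fh.seek(offset)
      fh.write(replacement)
    end
  end
  offsets.size
end
```

Because the lengths are validated before anything is written, a mismatching replacement leaves the file untouched. The sketch assumes an ASCII-only pattern, since the data is matched as a binary string.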

But if you were looking for comments, it looks awesome. Thanks!

···

On Monday 09 November 2009 06:54:15 am apeiros@gmx.net wrote:

Hi Joel

> What would the advantage over mmap[1] be? FileString is pure ruby
> (right?) and hence more portable, but probably mmap is much more
> efficient? Any other tradeoffs?

Interesting, I was looking if a solution existed already and didn't find
mmap. Yes, FileString is pure ruby and should therefore run on all ruby
implementations. And yes, I'd expect mmap to be more efficient on the other
hand.

I am still trying to wrap my head around the question whether hiding
file IO behind a String API is a good idea. Basically the reason to
create something like this is to be able to use a file in places which
expect to be given a String instance. However, code that uses String
assumes fast access to arbitrary portions of the string. When those
accesses are translated into random accesses to a file performance
_might_ suffer dramatically. Put differently: hiding the fact that we
are dealing with a file is convenient but may actually break your
neck. And although at a certain level of abstraction a file and a
String are pretty much the same (sequence of chars / bytes) it may
actually be a good thing to keep the API separate in order to treat
both appropriately. Stefan, what's your experience?

Kind regards

robert

···

2009/11/9 Eleanor McHugh <eleanor@games-with-brains.com>:

On 9 Nov 2009, at 13:54, apeiros@gmx.net wrote:

It would probably be fairly trivial for you to directly support mmap at the
OS level using Ruby/DL, Ruby-FFI or even syscall (although that's ugly and
fragile).

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Robert,

I am still trying to wrap my head around the question whether hiding
file IO behind a String API is a good idea.

As the PickAxe book points out, by having file i/o represented by a
String ... that is, making it irrelevant whether one is talking to a
String or a File ... makes for some nice unit testing.

-------- Original Message --------

Date: Tue, 10 Nov 2009 00:28:56 +0900
From: Robert Klemme <shortcutter@googlemail.com>
To: ruby-talk@ruby-lang.org
Subject: Re: FileString - request for comments

I am still trying to wrap my head around the question whether hiding
file IO behind a String API is a good idea. Basically the reason to
create something like this is to be able to use a file in places which
expect to be given a String instance.

No. At least that was not the idea (though, you could).
The reason is that e.g. replacing a part of a file is cumbersome.
Compare:

# IO API:
File.open(path, "r+b") do |fh|
  fh.seek(offset + length)   # jump past the part to be replaced
  rest = fh.read             # keep everything after it
  fh.seek(offset)
  fh.write(replacement)
  fh.write(rest)
  fh.truncate(fh.pos)        # drop leftover bytes if the file got shorter
end

# String API:
fs = FileString.new(path)
fs[offset, length] = replacement # done!

Imagine how much more inconvenient it becomes when it's not offset & length but a Range, or when you have to accommodate negative offsets etc.
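As a rough illustration of that extra bookkeeping, here is a sketch with two hypothetical helpers, `normalize_span` and `splice_file`. They are not FileString's implementation; they work on bytes and assume Integer offsets or Ranges with Integer ends.

```ruby
# Hypothetical helpers, not FileString's actual implementation.

# Turn an Integer offset (possibly negative) plus a length, or a Range,
# into a plain [offset, length] pair for a file of `size` bytes.
def normalize_span(size, index, length = nil)
  if index.is_a?(Range)
    first = index.begin
    last  = index.end
    first += size if first < 0
    last  += size if last < 0
    last  -= 1 if index.exclude_end?
    length = [last - first + 1, 0].max
  else
    first  = index < 0 ? index + size : index
    length = 1 if length.nil?
  end
  raise IndexError, "offset #{index.inspect} out of bounds" unless first.between?(0, size)
  raise IndexError, "negative length" if length < 0

  [first, [length, size - first].min]   # never reach past the end of the file
end

# Replace the given span of the file with `replacement`.
def splice_file(path, replacement, index, length = nil)
  File.open(path, "r+b") do |fh|
    offset, length = normalize_span(fh.size, index, length)
    fh.seek(offset + length)
    rest = fh.read
    fh.seek(offset)
    fh.write(replacement)
    fh.write(rest)
    fh.truncate(fh.pos)   # needed when the replacement is shorter
  end
end
```

For example, `splice_file(path, "Ruby", -5, 5)` replaces the last five bytes of the file, and `splice_file(path, "HELLO!", 0..4)` replaces the first five.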

And there are other examples, just dive a bit in FileString's source :slight_smile:

The String API is *far* more convenient.

However, code that uses String
assumes fast access to arbitrary portions of the string. When those
accesses are translated into random accesses to a file performance
_might_ suffer dramatically.

Yes. If you get that kind of problem - you can always use File.read instead of FileString#to_s (or to_str).

Put differently: hiding the fact that we
are dealing with a file is convenient but may actually break your
neck.

As with all high-level things. If you don't know the things you're dealing with, you can easily kill performance. Consider e.g. ary.any? { |obj| other.include?(obj) } - there, you've just accidentally created an O(n^2) algorithm. It can happen everywhere and it can look totally innocent.
That's not a problem that's specific to FileString but to everything that's abstract.
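A small sketch of that trap and one common way around it. The method names are made up for illustration. Array#include? scans the array linearly, so the first version does up to ary.size * other.size comparisons, while the second builds a Set once and then uses hashed lookups.

```ruby
require "set"

# Innocent-looking but quadratic: every include? call scans `other`.
def overlap_quadratic?(ary, other)
  ary.any? { |obj| other.include?(obj) }
end

# Same result, roughly linear: build a Set once, then do hash lookups.
def overlap_with_set?(ary, other)
  lookup = other.to_set
  ary.any? { |obj| lookup.include?(obj) }
end
```

Both methods return the same answer; only the cost differs as the arrays grow.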

And although at a certain level of abstraction a file and a
String are pretty much the same (sequence of chars / bytes) it may
actually be a good thing to keep the API separate in order to treat
both appropriately. Stefan, what's your experience?

As you see, I disagree :slight_smile:
However, what you say is of course correct. Using FileString means you have to keep in mind that you're dealing with a file.
But: if you know you're dealing with a file, it can even help you make things faster. For example, if you indeed want to compare two files for equality, FileString#== will be faster and less memory-intensive than doing File.read(a) == File.read(b) if the two files are big.
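One way such a comparison can avoid loading both files is sketched below. `same_content?` is a hypothetical helper; how FileString#== actually does it may well differ. It compares sizes first and then reads both files in fixed-size chunks, stopping at the first difference.

```ruby
# Hypothetical sketch; FileString#== may be implemented differently.
# Compares two files without holding either of them fully in memory.
def same_content?(path_a, path_b, chunk_size = 64 * 1024)
  # Files of different size can never be equal; no reading needed.
  return false unless File.size(path_a) == File.size(path_b)

  File.open(path_a, "rb") do |a|
    File.open(path_b, "rb") do |b|
      # IO#read(n) returns nil at end of file, which ends the loop.
      while (chunk = a.read(chunk_size))
        return false unless chunk == b.read(chunk_size)
      end
    end
  end
  true
end
```

At most two chunks are in memory at any time, and files that differ early are rejected without reading the rest.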

Kind regards

robert

Thanks for your thoughts robert, much appreciated

regards
Stefan

···


Using a given representation just because it's unit testing friendly isn't necessarily a good idea...

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

···

On 9 Nov 2009, at 16:49, Ralph Shnelvar wrote:

Robert,

> I am still trying to wrap my head around the question whether hiding
> file IO behind a String API is a good idea.

As the PickAxe book points out, by having file i/o represented by a
String ... that is, making it irrelevant whether one is talking to a
String or a File ... makes for some nice unit testing.

----
raise ArgumentError unless @reality.responds_to? :reason

-------- Original Message --------

From: Robert Klemme <shortcutter@googlemail.com>

As all highlevel things. If you don't know the things you're dealing with you can easily kill performance. Consider e.g. ary.any? { |obj| other.include?(obj) } - there, just accidentally created an O(n^2) algorithm. It can happen everywhere and it can look totally innocent.
That's not a problem that's specific to FileString but to everything that's abstract.

True.

And although at a certain level of abstraction a file and a
String are pretty much the same (sequence of chars / bytes) it may
actually be a good thing to keep the API separate in order to treat
both appropriately. Stefan, what's your experience?

As you see, I disagree :slight_smile:

> However, what you say is of course correct. Using FileString means you
> have to keep in mind that you're dealing with a file.
> But: if you know you're dealing with a file, it can even help you
> making things faster. For example, if you indeed want to compare two
> files for equality, FileString#== will be faster and less memory
> intensive than you doing File.read(a) == File.read(b) if the two files
> are big.

A good point! You're probably right and I was too pessimistic. I'd love to see

fs[/foo(\w+)/, 1] = "bar"
fs.gsub! /foo/, "bar"

etc. because those would be the ones that would make FileString convenient for me. :slight_smile:

Thanks for your thoughts robert, much appreciated

Thanks for listening and sharing!

Kind regards

  robert

···

On 09.11.2009 17:29, apeiros@gmx.net wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Eleanor McHugh wrote:
[...]

Using a given representation just because it's unit testing friendly
isn't necessarily a good idea...

...or necessarily a bad idea. There's something to be said for letting
architecture emerge from testability.

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net
----
raise ArgumentError unless @reality.responds_to? :reason

Best,

···

--
Marnen Laibow-Koser
http://www.marnen.org
marnen@marnen.org
--
Posted via http://www.ruby-forum.com/

-------- Original Message --------

Date: Tue, 10 Nov 2009 06:25:08 +0900
From: Robert Klemme <shortcutter@googlemail.com>
Subject: Re: FileString - request for comments

A good point! You're probably right and I was too pessimistic. I'd
love to see

fs[/foo(\w+)/, 1] = "bar"
fs.gsub! /foo/, "bar"

etc. because those would be the ones that would make FileString
convenient for me. :slight_smile:

Those already exist. Unfortunately, optimizing regex matching is too involved for me to have done in 24h :slight_smile:
That means fs[/foo(\w+)/, 1] = "bar" is just more convenient than writing:
data = File.read(path)
data[/foo(\w+)/, 1] = "bar"
File.open(path, "w") { |fh| fh.write(data) }
But I think that's already quite worth it :slight_smile:
I mean - that's just lots of boilerplate.
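For comparison, the same read, modify, write-back pattern can be wrapped once in a small block helper. `edit_file` is a hypothetical name, not part of FileString, and it always loads the whole file into memory.

```ruby
# Hypothetical helper, not part of FileString.
# Reads the file, lets the block mutate the String in place,
# then writes the result back in one go.
def edit_file(path)
  data = File.read(path)
  yield data
  File.write(path, data)
  data
end

# Usage (path is whatever file you want to change):
#   edit_file(path) { |s| s[/foo(\w+)/, 1] = "bar" }
#   edit_file(path) { |s| s.gsub!(/foo/, "bar") }
```

Several edits can share one block, so the file is read and written only once however many changes are made.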

Thanks for listening and sharing!

Always :smiley:
The listening part has made me change the docs, by the way: I now hint at thinking about performance, and suggest probably just using a String and writing it back when all is done.

Regards
Stefan

···


-------- Original Message --------

Date: Tue, 10 Nov 2009 06:25:08 +0900
From: Robert Klemme <shortcutter@googlemail.com>
Subject: Re: FileString - request for comments

A good point! You're probably right and I was too pessimistic. I'd
love to see

fs[/foo(\w+)/, 1] = "bar"

I just noticed that I actually didn't have that functionality in. I added it now in the way described in the earlier reply.

Also a small correction of one of my earlier statements (typo):
You can use File.read or FileString#to_s (or to_str) instead of the FileString instance. FileString#to_s returns the contents of the file.

Regards
Stefan

···
