Is there a better string.each?

George_Ogata1 · 8 July 2002 11:43

Hi,

Yukihiro Matsumoto wrote:

Hi,

the only point i am trying to make is that in so far as both account for
order and list, i want their exposed methodologies to be the same.
that’s it. (which is why .each should work differently)

a string is not an array.

but still, a string can be seen as an array of characters
sometimes, hence str[0] returns the first character (a byte in the
current implementation) in the string.

The text processing model of Ruby is based on lines, not characters.
That’s why I made “each” line oriented, not character
oriented. I took usefulness over consistency here.

So, if you want String consistent with Array, you need to express
either:

why it is important over usefulness

or line-oriented “each” is not useful at all

good enough to bring incompatibility.

matz.

Well, “usefulness” depends on the application. And it seems that
sometimes, strings are better seen as arrays (not Arrays!) of characters,
and at other times, as arrays of lines (and at yet other times, as arrays
of words or paragraphs).

Since there doesn’t seem to be a universally happy medium, perhaps the
problem lies in assuming there is one. I think the method “each” is the
problem: it’s ambiguous (“each what?”).

Couldn’t we have separate methods (as suggested earlier) #bytes, #chars (or
#characters), #words (maybe), #lines, #pars (or #paragraphs) for String?
We could have identical methods for the IO class, so IOs and Strings can be
used interchangeably with these methods. These methods could be both used
to iterate over the object (when called with a block), or used to retrieve
an Array (not array!) of the elements we’re interested in (when called
without a block). E.g.:

s = "abc\ndef\nghi"
s.lines >> ["abc", "def", "ghi"]
a = []
s.lines {|s| a << (s + 'xyz')} >> nil (or maybe something else)
a >> ["abcxyz", "defxyz", "ghixyz"]

Thus, io.lines &proc has the same effect as io.lines.each &proc (though
would hopefully require less memory when say, reading in a large file,
since the former doesn’t need to read the whole thing into a gigantic Array
first).
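A minimal sketch of the dual-mode idea above, written against current Ruby. `lines_of` is a hypothetical helper name (it is not part of Ruby, and is used here only to avoid clashing with any built-in method). It assumes only that the source responds to `each_line`, which both String and IO-like objects such as StringIO do.

```ruby
require "stringio"

# Hypothetical helper, not part of Ruby: with a block it yields each line
# (record separator removed) and returns nil; without a block it returns
# an Array of the lines.
def lines_of(source)
  return source.each_line.map(&:chomp) unless block_given?

  source.each_line { |line| yield line.chomp }
  nil
end

s = "abc\ndef\nghi"
p lines_of(s)                                            # Array form

a = []
lines_of(StringIO.new(s)) { |line| a << line + "xyz" }   # block form, IO-like source
p a
```

The block form never builds the intermediate Array, which is the memory saving described above.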

Now, what about #each? Well, we’ve made it essentially useless for calling
directly (since the others are more readable and unambiguous (agree?)).
But there’s still the interaction of String and IO with the Enumerable
mixin. What about this?:

We have methods corresponding to the iterator/collector methods to set the
one that #each points to: #useBytes, #useChars (or #useCharacters),
#useWords, #useLines, #usePars (or #useParagraphs).

These could be defined in their own mixin module:

module StringProcessing
  def useBytes
    class << self
      alias each bytes
    end
  end

  def useChars
    class << self
      alias each chars
    end
  end

  def useWords
    class << self
      alias each words
    end
  end

  def useLines
    class << self
      alias each lines
    end
  end

  def usePars
    class << self
      alias each pars
    end
  end
end

These would change the behaviour of #each for the String or IO instance
they’re called on. E.g.:

s = "abc\ndef\nghi"
s.useChars
s.collect {|char| frob char} # iterates over chars
s.useLines
s.collect {|line| frob line} # now it iterates over lines
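
A runnable sketch of the per-object switch, assuming a current Ruby where String has `each_char` and `each_line` but no `each`. The snake_case names `use_chars` and `use_lines` and the explicit `extend` are assumptions made for this illustration, not the proposed API.

```ruby
# Sketch only: re-points #each on a single object by aliasing in its
# singleton class, then lets Enumerable build on whatever #each is.
module StringProcessing
  include Enumerable

  def use_chars
    singleton_class.send(:alias_method, :each, :each_char)
    self
  end

  def use_lines
    singleton_class.send(:alias_method, :each, :each_line)
    self
  end
end

s = "ab\ncd".dup                 # dup so the literal is safely mutable
s.extend(StringProcessing)

s.use_chars
p s.map { |char| char.upcase }   # iterates over characters

s.use_lines
p s.map { |line| line.chomp }    # same call, now iterates over lines
```

Because the choice is stored on the object, any code that later receives `s` sees whichever mode was set last.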

The default behaviour would be to iterate over lines, so as to be backward
compatible. Well, except for my little, implicit wish that the record
separators would be removed automatically. E.g., I’d rather
“abc\ndef”.lines would return [“abc”, “def”] than [“abc\n”, “def”]. But I
guess that’s another war… Still, even if the record separators stay,
this’d be a pretty flexible String/IO model, wouldn’t it?

Thoughts?

···

In message “Re: is there a better string.each?” > on 02/07/08, Tom Sawyer transami@transami.net writes:

David_Alan_Black1 · 8 July 2002 12:06

Hello –

Couldn’t we have separate methods (as suggested earlier) #bytes, #chars (or
#characters), #words (maybe), #lines, #pars (or #paragraphs) for String?
We could have identical methods for the IO class, so IOs and Strings can be
used interchangeably with these methods. These methods could be both used
to iterate over the object (when called with a block), or used to retrieve
an Array (not array!) of the elements we’re interested in (when called
without a block). E.g.:

s = "abc\ndef\nghi"
s.lines >> ["abc", "def", "ghi"]
a = []
s.lines {|s| a << (s + 'xyz')} >> nil (or maybe something else)
a >> ["abcxyz", "defxyz", "ghixyz"]

Thus, io.lines &proc has the same effect as io.lines.each &proc (though
would hopefully require less memory when say, reading in a large file,
since the former doesn’t need to read the whole thing into a gigantic Array
first).

Now, what about #each? Well, we’ve made it essentially useless for calling
directly (since the others are more readable and unambiguous (agree?)).
But there’s still the interaction of String and IO with the Enumerable
mixin. What about this?:

We have methods corresponding to the iterator/collector methods to set the
one that #each points to: #useBytes, #useChars (or #useCharacters),
#useWords, #useLines, #usePars (or #useParagraphs).

These could be defined in their own mixin module:

module StringProcessing
  def useBytes
    class << self
      alias each bytes
    end
  end

[same for Chars, Lines, etc]

These would change the behaviour of #each for the String or IO instance
they’re called on. E.g.:

s = "abc\ndef\nghi"
s.useChars
s.collect {|char| frob char} # iterates over chars
s.useLines
s.collect {|line| frob line} # now it iterates over lines

Personally I’m much happier doing:

s.split("").collect {|char| … }

Maybe it’s the camelCase that’s putting me off but I think it’s
more just that this seems like a new, parallel-universe way of doing
things Ruby already does with record separators and #split parameters
and so forth. (I’m not saying there isn’t room to debate how those
things work… just that the above examples seem stylistically and
design-ly very different.)

The default behaviour would be to iterate over lines, so as to be backward
compatible. Well, except for my little, implicit wish that the record
separators would be removed automatically. E.g., I’d rather
“abc\ndef”.lines would return [“abc”, “def”] than [“abc\n”, “def”]. But I
guess that’s another war… Still, even if the record separators stay,
this’d be a pretty flexible String/IO model, wouldn’t it?

Thoughts?

My main thought is that I wish I could get Ruby Behaviors to work
fully (http://www.superlink.net/~dblack/ruby/behaviors). I remain
convinced that Ruby could/can/should lend itself to tremendous
flexibility and even-handedness in many matters, without changes to
the core language. The Behaviors package as it stands doesn’t fully
let the genie out of the bottle… but maybe that will come. (Anyone
seen David Simmons lately?)

David

···

On Mon, 8 Jul 2002, George Ogata wrote:

–
David Alan Black
home: dblack@candle.superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav

Jim_Weirich2 · 8 July 2002 12:24

We have methods corresponding to the iterator/collector methods to set the
one that #each points to: #useBytes, #useChars (or #useCharacters),
#useWords, #useLines, #usePars (or #useParagraphs).
[…]
s = "abc\ndef\nghi"
s.useChars
s.collect {|char| frob char} # iterates over chars
s.useLines
s.collect {|line| frob line} # now it iterates over lines

The only problem I have with this suggestion is that
useChars/useLines/etc are state-changing methods that affect the
behavior of “each” later in the program. Using “useChars” at one
point may break a portion of code that assumes a line-by-line version
of each.

How about something like this?

s = "abc\ndef\nghi"
s.by_chars.collect { |char| frob char }
s.by_lines.collect { |line| frob line }

The methods by_chars/by_lines return an appropriate adapter object
that does the indicated type of iteration over the target string.
This means that there is no state information in a string that needs
to change. The original s.each can continue to work as currently
defined to avoid breakage.
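
A minimal sketch of such adapter objects, assuming a current Ruby where `each_char` and `each_line` exist on String. The class name `StringView` and its internals are assumptions for illustration; only `by_chars` and `by_lines` come from the post above.

```ruby
# Sketch only: a read-only view that iterates a string in one fixed way.
class StringView
  include Enumerable

  def initialize(string, iterator)
    @string = string
    @iterator = iterator   # name of a String iteration method
  end

  def each(&block)
    return enum_for(:each) unless block

    @string.public_send(@iterator, &block)
    self
  end
end

class String
  def by_chars
    StringView.new(self, :each_char)
  end

  def by_lines
    StringView.new(self, :each_line)
  end
end

s = "abc\ndef\nghi"
p s.by_chars.collect { |char| char.upcase }
p s.by_lines.collect { |line| line.chomp }
```

The string itself carries no iteration state, so two callers can use different views of the same object without affecting each other. In current Ruby, calling `each_char` or `each_line` without a block returns an Enumerator, which plays much the same role as these adapters.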

···

–
– Jim Weirich jweirich@one.net http://w3.one.net/~jweirich

“Beware of bugs in the above code; I have only proved it correct,
not tried it.” – Donald Knuth (in a memo to Peter van Emde Boas)

Tom_Sawyer · 8 July 2002 12:37

in response to George Ogata.

great idea, IMHO. i was starting to have similar notions, to a lesser
extent. i especially like how you bring IO into the fold. i think one of
the things that is becoming clear in this discussion is the desire to
have as consistent an interface as possible across all classes
in so far as they are alike. currently there are a number of
inconsistencies that are begging to be improved: similarities between
String and Array, similarities between String and IO, etc. so it would
be nice to see those changes in the future. at first this seemed like a
very difficult thing to do without breaking lots of code. but
massilliano had a great notion for making it easy to preserve backward
compatibility (assuming that is reasonably doable). so there is hope
that these improvements can be made --and matz’s adored child can become
an ever more beautiful creation.

question: how do you determine paragraph separation?

i think the #useX is a great notion. but we may be able to make it more
general. we already have the $/ record separator. but i do not care for
it myself --too perlish. and even so it doesn’t support character
separation straightforwardly. so what if we had a more general
#seperator=(x)? i would very much prefer something like this over $/=x.
but also x="" would mean each character, not “\n” to which it currently
defaults.

~transami

–
~transami

“They that can give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety.”
– Benjamin Franklin

Austin_Ziegler2 · 8 July 2002 13:13

Well, “usefulness” depends on the application. And it seems that
sometimes, strings are better seen as arrays (not Arrays!) of
characters, and at other times, as arrays of lines (and at yet
other times, as arrays of words or paragraphs).

Since there doesn’t seem to be a universally happy medium, perhaps
the problem lies in assuming there is one. I think the method
“each” is the problem: it’s ambiguous (“each what?”).

I don’t actually find it ambiguous, although I think that it’s
poorly documented. String#each is said to be the same as
String#each_line, but I would argue that it isn’t. String#each_line
implies \n in all cases (or /\n+/m), but the way that it’s
implemented is actually the same as the (not-implemented)
String#each_record. The default for String#each is to treat \n as
the record separator.

If $/ (the argument to String#each) could be made to accept regex,
then the record separator could be String#each(//) if necessary,
allowing character-by-character parsing. This is not currently
possible.

Excepting String#each_byte, all of the other conditions are simply a
matter of choosing a different record separator. (Words could be
/\s/m; paragraphs could be /\n\n/m.)
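
To illustrate the record-separator view with calls that exist in current Ruby: `String#each_line` accepts a separator argument, an empty-string separator selects paragraph mode, and `String#split` accepts a regex. The sample text and the regex chosen for "word" are assumptions for this sketch, not definitions anyone in the thread agreed on.

```ruby
text = "one two\nthree\n\nfour five\n"

lines      = text.each_line.to_a       # default separator: newline
paragraphs = text.each_line("").to_a   # "" selects paragraph mode
words      = text.split(/\s+/)         # regex separator via split
chars      = text.split(//)            # empty regex: one element per character

p lines.size, paragraphs.size, words, chars.size
```

Note that `each_line` keeps the separator on each record while `split` discards it, so the two are not interchangeable without a `chomp`.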

Strings, by and large, aren’t better seen as arrays or Arrays of
anything – but there are times when it is useful to see them so.
What’s obviously differing is the view on when those times are. As
an application developer, I can see few reasons for dealing with
strings as arrays of characters or bytes – I am more interested in
dealing with whole strings or substrings. Library developers are
more likely to be interested in dealing with strings as arrays.

I don’t think that your suggestion (String#bytes, String#chars,
String#words, etc.) is applicable to most cases – and for string
entities larger than #chars, it is likely to do it The Wrong Way.

Now, what about #each? Well, we’ve made it essentially useless for
calling directly (since the others are more readable and
unambiguous (agree?)).

Disagree. They are neither more readable nor unambiguous. The
definitions of a Word, Paragraph, etc. are too varied to make a
single decision on.

I think that the mistake here is that String#each is equated with
String#each_line, but this isn’t the case – and when you look at
the documentation, it is clear that it really isn’t the case.
String#each, as I said, is REALLY String#each_record – and IMO
that’s a good thing for it to be.

I don’t much care for the mixins you’ve suggested, either.

-austin
– Austin Ziegler, austin@halostatue.ca on 2002.07.08 at 08.59.35

···

On Mon, 8 Jul 2002 20:43:57 +0900, George Ogata wrote:

Albert2 · 8 July 2002 13:53

I think that this thread reveals a serious confusion about the difference
between data and containers for data. A string is data, plain and simple.
Data can be viewed/processed in a variety of ways. There is no canonical
model for a string. Why not just subclass it to do what you like? Why
change the language? Use the language, Luke.

class Chapters < String
:
class Paragraphs < String
:
class Lines < String
:
class Words < String
:
class Fields < String
:
class Chars < String
:
class Stream < String
:
etc. ad nauseam, ad infinitum
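
One way the subclassing idea could look in current Ruby, as a sketch only. The body of `Chars` is an assumption; the post gives only the class names.

```ruby
# Sketch only: a String subclass whose #each walks characters.
class Chars < String
  include Enumerable
  alias_method :each, :each_char
end

# The view is picked when wrapping existing data, not when the data is created.
data = "abc"
p Chars.new(data).map { |c| c.upcase }
p Chars.new(data).select { |c| c > "a" }
```

Including Enumerable this way places it ahead of String in the lookup order, so Enumerable methods that share a name with String ones (for example `count` and `include?`) take over in the subclass.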

Ned_Konz · 8 July 2002 14:57

I think some of the discussion about String is confusing Strings with
higher-level concepts.

To me, Strings just have characters. They are akin to (say) ByteArrays
in Smalltalk, which just contain bytes.

Imposing additional structure on top of this (lines, words,
paragraphs, etc.) should be the job of more special-purpose
constructs.

I would argue that structure in text should be handled by (say) Text
objects of some sort, rather than Strings. Objects of the Text type
would also know about context.

···

On Monday 08 July 2002 04:43 am, George Ogata wrote:

Well, “usefulness” depends on the application. And it seems that
sometimes, strings are better seen as arrays (not Arrays!) of
characters, and at other times, as arrays of lines (and at yet
other times, as arrays of words or paragraphs).

–
Ned Konz
http://bike-nomad.com
GPG key ID: BEEA7EFE

Yukihiro_Matsumoto2 · 8 July 2002 14:57

Hi,

Well, “usefulness” depends on the application. And it seems that
sometimes, strings are better seen as arrays (not Arrays!) of characters,
and at other times, as arrays of lines (and at yet other times, as arrays
of words or paragraphs).

Agreed.

Since there doesn’t seem to be a universally happy medium, perhaps the
problem lies in assuming there is one. I think the method “each” is the
problem: it’s ambiguous (“each what?”).

If the name “each” is the problem, it is the problem that costs just a
few bits in your brain, so it’s virtually nothing at all.

Couldn’t we have separate methods (as suggested earlier) #bytes, #chars (or
#characters), #words (maybe), #lines, #pars (or #paragraphs) for String?

Possible.

We have methods corresponding to the iterator/collector methods to set the
one that #each points to: #useBytes, #useChars (or #useCharacters),
#useWords, #useLines, #usePars (or #useParagraphs).

I believe introducing another state to strings is a bad idea.
Instead, parameterized mix-in (e.g. include Enumerable using each_byte
as each) is an interesting idea, but it should be pretty hard to
implement.

						matz.

···

In message “Re: is there a better string.each?” on 02/07/08, George Ogata g_ogata@optushome.com.au writes:

Ned_Konz · 8 July 2002 14:58

There is no easy definition for “words” or “paragraphs” that works
across languages and contexts.

···

On Monday 08 July 2002 04:43 am, George Ogata wrote:

Couldn’t we have separate methods (as suggested earlier) #bytes,
#chars (or #characters), #words (maybe), #lines, #pars (or
#paragraphs) for String?

–
Ned Konz
http://bike-nomad.com
GPG key ID: BEEA7EFE

Marcin_Qrczak_Kowalc · 8 July 2002 16:50

Mon, 8 Jul 2002 22:53:16 +0900, Albert Wagner alwagner@tcac.net writes:

Why not just subclass it to do what you like? Why change the
language? Use the language, Luke.

class Chapters < String
:
class Paragraphs < String
:

Because they aren’t differences between the data, but differences
between ways of processing a string. It doesn’t make much sense
for me to choose the iteration method when creating the string.

···

–
__("< Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
__/
^^
QRCZAK

Olonichev_Sergei · 8 July 2002 15:14

Hello,

I have compiled Berkeley DB 3.3.11 and bdb-0.3.1 under cygwin.
But “make test” does not work well. Has anybody tested bdb under cygwin?

The output is cut off:

$ tests/recno.rb

VERSION of BDB is Sleepycat Software: Berkeley DB 3.3.11: (July 12, 2001)

TestRecno#test_00_error .
TestRecno#test_01_init .
…

Time: 0.361
FAILURES!!!
Test Results:
Run: 24/24(141 asserts) Failures: 4 Errors: 5
Failures: 4
tests/recno.rb:211:in `test_13_txn_commit'(TestRecno): expected:<3> but was:<6> (RUNIT::AssertionFailedError)
        from tests/recno.rb:408
tests/recno.rb:221:in `test_14_txn_abort'(TestRecno): <size in txn> expected:<6> but was:<8> (RUNIT::AssertionFailedError)
        from tests/recno.rb:408
tests/recno.rb:233:in `test_15_txn_abort2'(TestRecno): expected:<5> but was:<8> (RUNIT::AssertionFailedError)
        from tests/recno.rb:230:in `begin'
        from tests/recno.rb:230:in `catch'
        from tests/recno.rb:230:in `begin'
        from tests/recno.rb:230:in `test_15_txn_abort2'
        from tests/recno.rb:408
tests/recno.rb:252:in `test_16_txn_commit2'(TestRecno): <size in txn> expected:<5> but was:<8> (RUNIT::AssertionFailedError)
        from tests/recno.rb:249:in `begin'
        from tests/recno.rb:249:in `catch'
        from tests/recno.rb:249:in `begin'
        from tests/recno.rb:249:in `test_16_txn_commit2'
        from tests/recno.rb:408
Errors: 5
tests/recno.rb:203:in `open'(TestRecno): Permission denied (BDB::Fatal)
        from tests/recno.rb:203:in `test_12_env'
        from tests/recno.rb:408
tests/recno.rb:271:in `open'(TestRecno): Permission denied (BDB::Fatal)
        from tests/recno.rb:271:in `test_17_file'
        from tests/recno.rb:408

Sergei

Mark_Probert2 · 8 July 2002 16:10

Hi.

This is a Ruby-DBI question. I am using 0.0.15 under ruby 1.6.6 (cygwin).
I am using ODBC to connect to a SQL 7 database. The connection and
test programs work fine.

However, when I do:

require 'dbi'

dbh = DBI.connect('DBI:ODBC:pubs', 'bob', 'bob')
state = "CA"
sth = dbh.prepare("select au_lname, au_fname from authors where state = #{state}")
sth.execute

I get:

/usr/local/lib/ruby/site_ruby/1.6/DBD/ODBC/ODBC.rb:129:in `prepare': S0022 (207) [Microsoft][ODBC SQL Server Driver][SQL Server]Invalid column name 'CA'. (DBI::DatabaseError)
        from /usr/local/lib/ruby/site_ruby/1.6/dbi/dbi.rb:536:in `prepare'
        from select_1.rb:8

Is there a problem in using where clauses in DBI with ODBC? Or am I
doing something wrong?
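
The error text ("Invalid column name 'CA'") is what one would expect when the interpolated value reaches the server without quotes. The sketch below only builds the two SQL strings and needs no database; the table and column names are taken from the post. Whether the installed DBI/ODBC driver supports bound parameters is not confirmed here, though they are generally the safer route where available.

```ruby
state = "CA"

# As posted: the value is spliced in bare, so the server reads CA as a column name.
unquoted = "select au_lname, au_fname from authors where state = #{state}"

# With single quotes the value becomes a string literal.
quoted = "select au_lname, au_fname from authors where state = '#{state}'"

puts unquoted
puts quoted
```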

TIA,

-mark.

···

Mark Probert (probertm@nortelnnetworks.com)
Nortel Networks – Optera Metro 3000 GNPS
Phone: (613) 768-1082 [ESN: 398-1082]

Tobi_Reif · 8 July 2002 18:49

Tom Sawyer wrote:

question: how do you determine paragraph separation?

To delimit paragraphs, sections, headers, footers, etc, one uses
specific formats and tools.

Yes you can have some tool guess what your idea of a paragraph in a
plain text file is, but this will not get you very far; it will be very
simple (text2html style). If you provide lots of features, then parsing
gets difficult.
There is no way to correctly guess the structure of arbitrary text
files; you’d have to develop a format that authors use; but you’re much
better off using popular notational systems such as XML, and you might
want to evaluate document markup languages such as XHTML and DocBook.
Then you can choose from a myriad of available tools, and there will be
no guesswork.

Tobi

···

–
http://www.pinkjuice.com/

Albert2 · 8 July 2002 19:02

I don’t understand. Why would you ever have to choose the iteration method
when creating a string?

···

On Monday 08 July 2002 11:50 am, Marcin ‘Qrczak’ Kowalczyk wrote:

Mon, 8 Jul 2002 22:53:16 +0900, Albert Wagner alwagner@tcac.net writes:

Why not just subclass it to do what you like? Why change the
language? Use the language, Luke.

class Chapters < String

class Paragraphs < String

Because they aren’t differences between the data, but differences
between ways of processing a string. It doesn’t make much sense
for me to choose the iteration method when creating the string.

Tobi_Reif · 8 July 2002 19:09

Marcin ‘Qrczak’ Kowalczyk wrote:

Mon, 8 Jul 2002 22:53:16 +0900, Albert Wagner alwagner@tcac.net writes:

Why not just subclass it to do what you like? Why change the
language? Use the language, Luke.

class Chapters < String
:
class Paragraphs < String
:

Because they aren’t differences between the data, but differences
between ways of processing a string. It doesn’t make much sense
for me to choose the iteration method when creating the string.

absolutely.

Perhaps

class Document < String
class Chapter < String
class Section < String
class Para < String

p doc

# => [document,[[section,[para,para]], [section,[para,para,para]]]]

Here’s a prototype for something similar I did a while ago:

···

# structured abstract model representation for the lib

# utils: # # # # # #

class String
  def lines
    split "\n"
  end
end
class Array
  def from_2
    self[1..-1]
  end
end

class Lib < Array
  def format
    map do |list|
      list.from_2.map do |item|
        item.join("\n")
      end
    end.join("\n\n")
  end
end

lib = Lib.new
readme = '#!usr/local/bin/ruby

# this file was automatically generated; and changes can get overwritten'

lib.push ['readme', readme.lines]

lib.push ['modules'], ['classes']

class_ellipse = 'class Ellipse
def cx
5
end
def cx=
7
end
end'

class_rect = 'class Rect
def x
5
end
def x=
7
end
end'

module_id = 'module Id
def id
9
end
def id=
3
end
end'

[class_rect, class_ellipse].each do |class_code|
  lib.assoc('classes').push class_code.lines
end

# print all classes
# print lib.assoc('classes').from_2.join "\n"

# print class Ellipse
# print lib.assoc('classes').assoc('class Ellipse').join("\n")

lib.assoc('modules').push module_id.lines

# prints out nicely
open('generated_lib.tst', 'w') do |file|
  file.write lib.format
end

Tobi

–
http://www.pinkjuice.com/

Tom_Sawyer · 8 July 2002 19:19

austin, i think you are getting to the core of what’s really wanted from
String#each that it now lacks, and why this has come up at all.

since $/ can’t be a regexp, a shorthand was desired to run multiple
successive \n’s together in the #each method and unfortunately '' got the
job. but that left us with no equivalent to the fictional #each_chr,
thus we have typically resorted to using #each_byte and converting the
byte code back to a chr. the end result is that this implementation of #each
simply lacks polish (POLiSh?). $/ needs to accept regexps, as you
say, and '', as a record separator, needs to indicate each character.
these two changes would clear this all up.
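
The byte-and-back workaround can be sketched as follows. The thread treats a character iterator as fictional; current Ruby does provide `String#each_char`, which the sketch uses for comparison. The sample strings are assumptions.

```ruby
s = "abc"

via_bytes = []
s.each_byte { |byte| via_bytes << byte.chr }   # the workaround described above

via_chars = s.each_char.to_a                   # direct character iteration

p via_bytes, via_chars

# The two differ once a character needs more than one byte.
e_acute = "\u00e9"
p e_acute.each_byte.count, e_acute.each_char.count
```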

~transami

–
~transami

“They that can give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety.”
– Benjamin Franklin

David_Alan_Black1 · 8 July 2002 20:55

Hello –

If $/ (the argument to String#each) could be made to accept regex,
then the record separator could be String#each(//) if necessary,
allowing character-by-character parsing. This is not currently
possible.

I’ll put in (yet another) plug for String#split:

str.split(//).each …

I think the idea of having String#each take a regex is, essentially, a
kind of compression of this – in other words, instead of #split’ing
on a regex and then iterating through the resulting array, moving the
regex intelligence into the argument to #each. I’m of two minds about
the merits of this, but in any case I don’t think there would be
anything possible in the language that isn’t possible now.
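
The "compression" described above can be sketched as a small wrapper. `each_record` is a hypothetical helper name, not a Ruby method; it simply splits first and then iterates.

```ruby
# Hypothetical helper, not a Ruby method: iterate over the pieces of +str+
# separated by +separator+ (a String or Regexp) by splitting first.
def each_record(str, separator, &block)
  str.split(separator).each(&block)
end

each_record("abc", //)       { |char| p char }   # character by character
each_record("a b\nc", /\s+/) { |word| p word }   # whitespace-separated pieces
```

One difference from a separator-aware iterator is that `split` drops the separators, whereas line iteration keeps the trailing newline on each record.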

David

···

On Mon, 8 Jul 2002, Austin Ziegler wrote:

–
David Alan Black
home: dblack@candle.superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav

Pit · 8 July 2002 21:49

Instead, parameterized mix-in (e.g. include Enumerable using each_byte
as each) is an interesting idea, but it should be pretty hard to
implement.

I also found it interesting. Here’s a quick hack that allows the
following:

s = "Hello\nMatz"
s.parameterized_extend Enumerable, { :each => :each_byte }

print 'Bytes:'
s.map { | i | print " #{i}" }
puts

# called from Enumerable, thus uses each_byte
# => Bytes: 72 101 108 108 111 10 77 97 116 122

print 'Lines:'
s.each { | i | print " #{i.chomp}" }
puts

# called directly, thus uses the original each
# => Lines: Hello Matz

Here’s the code (not much tested, works on my 1.6.6 Windows NT).

Regards,
Pit

class Module

  def parameterized_include( mod, map )
    include mod
    _pi_rename_methods map
    _pi_create_delegate mod, map
  end

  private

  def _pi_rename_methods( map )
    map.each do | method, mapped |
      alias_method _pi_original_method( method ), method
      undef_method method
    end
  end

  def _pi_create_delegate( mod, map )
    module_eval <<-eval_end
      def method_missing( method, *args, &block )
        send( _pi_method_to_send( method, #{mod}, #{map.inspect}
          ), *args, &block )
      end
    eval_end
  end

end

class Object

  def parameterized_extend( mod, map )
    instance_eval <<-eval_end
      class << self
        parameterized_include( #{mod}, #{map.inspect} )
      end
    eval_end
  end

  private

  def _pi_method_to_send( method, mod, map )
    if map.keys.include?( method )
      if mod.method_defined?( _pi_calling_method )
        map[ method ]
      else
        _pi_original_method( method )
      end
    else
      method
    end
  end

  def _pi_original_method( method )
    "_pi#{method}"
  end

  def _pi_calling_method
    if caller[ 2 ] =~ /.*:in `(.*?)'/
      $1
    else
      ''
    end
  end

end
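
For comparison, a different way to get a similar effect in current Ruby without `method_missing` or `caller`, offered as a sketch under assumptions rather than a replacement for the code above. `enumerable_using` is a hypothetical name; it builds an anonymous module that forwards each Enumerable method to an Enumerator created from the chosen iterator.

```ruby
# Sketch only: build a module whose Enumerable methods iterate with the
# given method, leaving the object's direct iteration methods alone.
def enumerable_using(iterator)
  Module.new do
    Enumerable.instance_methods(false).each do |name|
      define_method(name) do |*args, &block|
        enum_for(iterator).public_send(name, *args, &block)
      end
    end
  end
end

s = "Hello\nMatz".dup
s.extend(enumerable_using(:each_byte))

p s.map { |byte| byte }            # Enumerable methods see bytes
p s.each_line.map { |l| l.chomp }  # direct line iteration is unchanged
```

Extending a string this way also shadows String methods that share a name with Enumerable ones, such as `include?` and `count`, so it trades one surprise for another.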

···

On 8 Jul 2002, at 23:57, Yukihiro Matsumoto wrote:

SER1 · 9 July 2002 04:05

Yukihiro Matsumoto wrote:

Since there doesn’t seem to be a universally happy medium, perhaps the
problem lies in assuming there is one. I think the method “each” is the
problem: it’s ambiguous (“each what?”).

If the name “each” is the problem, it is the problem that costs just a
few bits in your brain, so it’s virtually nothing at all.

I’m sorry. I’ve forgotten who originally suggested this, but the best
solution I’ve seen so far is allowing $/ = '', whereby String::each would
use the invisible space between characters as the boundary for iteration.
This would require no change in current behavior, and while new Ruby users
will continue to be bitten by the default String::each behavior, at least
there’ll be a sensible solution to the obviously significant need to
iterate over characters. I say “obviously” because there seem to be a
number of people in this discussion who want that functionality.

In the end, I don’t know if this would be faster than using
String::split(''), but it would probably be more intuitive.

— SER
