thanks Sean and Park. it's actually been interesting comparing these two
very different pieces of code that do the same thing. here are my
results thus far with my modifications to both:

park, note the removal of the for-j loop. it really helped the speed!
def parks_tokenizer(string)
  s = ['<','[','{','"','*']
  e = ['>',']','}','"','*']
  items = []
  i = 0
  while i < string.length
    if not s.include?(string[i,1])
      # plain text: scan ahead to the next opening delimiter
      j = i+1
      j += 1 while j < string.length && !s.include?(string[j,1])
      items.concat string[i..j-1].strip.split(' ')
      i = j
    else
      j = s.index(string[i,1])
      if s[j] == '"' || s[j] == '*'
        # symmetric delimiter: just find its partner
        k = string.index(e[j],i+1)
      else
        # repeatable delimiter: skip extra openers, find the closer,
        # then absorb extra closers (handles {{g}} and [[h]])
        k = i
        k += 1 while string[k,1] == s[j]
        k = string.index(e[j],k)
        k += 1 while k+1 < string.length && string[k+1,1] == e[j]
      end
      items << string[i..k]
      i = k+1
    end
  end
  return items
end
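
as a quick sanity check (using the test string park posted below),
delimited groups should come back whole and bare words get split on
whitespace:

  p parks_tokenizer('a[c]{d}"e"f {{g}} [[h]]i**j"k"l')
  # expecting something like:
  # ["a", "[c]", "{d}", "\"e\"", "f", "{{g}}", "[[h]]", "i", "**", "j", "\"k\"", "l"]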
sean, i got rid of the perlish notation
and made the second part more like the first:
def seans_tokenizer(string)
  tokens = {'['=>']', '<'=>'>', '"'=>'"', '{'=>'}', '*'=>'*', "'"=>"'"}
  items = []
  while string.size > 0
    if tokens.keys.include?(string[0,1])
      # delimited token: take everything up to the matching closer
      end_index = string.index(tokens[string[0,1]], 1)
      item = string[0..end_index]
      items << item
      string = string[end_index+1..-1]
      # keep extending while openers outnumber closers (handles {{g}} and [[h]])
      while item.count(item[0,1]) > item.count(tokens[item[0,1]])
        end_index = string.index(tokens[item[0,1]])
        item << string[0..end_index]
        string = string[end_index+1..-1]
      end
    else
      # plain text: cut at the next delimiter, whitespace, or end of string
      end_index = string.index(/[[{<*"'\s]|\z/, 1)
      item = string[0..end_index-1].strip
      items << item if not item.empty?
      string = string[end_index..-1]
    end
  end
  items
end
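
and the same sanity check against this one; as far as i can tell it
should hand back the same list:

  p seans_tokenizer('a[c]{d}"e"f {{g}} [[h]]i**j"k"l')
  # should match parks_tokenizer's output for this input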
looping over 100 iterations of each results in park's version taking
~2.8 seconds and sean's ~2.3, but i think park's might have a little
more room for improvement. oddly, the more i work with them, the more i
am beginning to see that they are, in effect, the same. i'll let you
know how that progresses.
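
in case anyone wants to reproduce the timing, this is roughly the shape
of the loop i mean (just a sketch using benchmark from the stdlib; swap
in whatever input string and iteration count you like, so your numbers
will differ from mine):

  require 'benchmark'

  str = 'a[c]{d}"e"f {{g}} [[h]]i**j"k"l'   # any test input will do
  Benchmark.bm(6) do |x|
    x.report('park:') { 100.times { parks_tokenizer(str) } }
    x.report('sean:') { 100.times { seans_tokenizer(str) } }
  end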
by the way, one of the reasons i brought this up (and thank god i did, as
these pieces of code are so much better than mine) was to perhaps talk
about Regular Expressions and string parsing in general. seems to me
that parsing text is like THE fundamental programming task. why haven't
any really awesome technologies come about to deal with it? in my
personal opinion Regexps are powerful but limited, as indicated by my
parsing problem. i remember hearing that a language called SNOBOL had
great string processing capabilities. does anyone know about that?
finally, a Steven J. Hunter sent me this Icon version:
l_ans := []
str_in ? until pos(0) do   # Written by Steven J. Hunter
  if close_delim_cs := \open2close_t[open_delim := move(1)]
  then put(l_ans, open_delim || tab(1+bal(close_delim_cs, '<[{', '}]>')))
  else tab(many(' ')) | put(l_ans, tab(upto(start_of_nxt_token_cs) | 0))
a real mouthful, but quite compact. i haven't fully digested it yet.

thanks for participating! this has turned out to be much more
interesting and fruitful than i expected.
~transami (tom)
On Thu, 2002-07-04 at 09:15, Sean Russell wrote:
> Park Heesob wrote:
> > did you take a look at sean's version, by the way?
> > a tad more elegant although he does use regexps.
>
> Sean's version fails at
>
>   str = 'a[c]{d}"e"f {{g}} [[h]]i**j"k"l'
Adding two characters to the regexp fixes that. The regexp should be

  string =~ /(.*?)(?=[<[{*"']|$)/