Weird problem reading fields in a tab-delimited row

Everyone–

A weird problem occurs in Ruby 1.6.7, 1.6.8 and 1.8 (and probably other versions?): If I step through rows in a tab-delimited text file using code like this:

IO.foreach("wierd.txt") { |x|
	field = x.chop.split("\t")
	if field[4] != ""
		puts "some_string"
		puts field[4]
	else
		puts "no_string"
		puts field[4]
	end
	}

to read each line of a text file (wierd.txt) as in this example:

string1<tab>string2<tab>string3<tab><tab><tab><tab>string4<tab><tab>
string4<tab>string5<tab>string6<tab><tab><tab><tab>string8<tab>
string9<tab>string10<tab>string11<tab><tab><tab><tab>
string12<tab>string13<tab>string14<tab><tab><tab><tab>string15

I get this output:

===================
no_string

no_string

some_string
nil
no_string
===================

The question is, why does field[4] only get assigned nil in the 3rd line but some phantom non-“nil” value in the 1st, 2nd, and 4th lines? The general case I’ve found is that a nil value is read from a ‘column’ if there are no strings in any ‘columns’ to the right.

Thanks if you can help!

-Kurt

Hi –

Technically, I think, it’s that split stops adding values to the
array when it stops encountering anything other than delimiters on
the right:

irb(main):002:0> "aaaba".split(/a/)
=> ["", "", "", "b"]

That last “a” in the string does not result in a final “” in the return
array. And if the string has no non-delimiters:

irb(main):003:0> "aaa".split(/a/)
=> []

Unless you give a negative second argument to split, which indicates
you want the empty strings returned:

irb(main):001:0> "aaaba".split(/a/,-1)
=> ["", "", "", "b", ""]
irb(main):002:0> "aaa".split(/a/,-1)
=> ["", "", "", ""]

So, in your third line, which ends with a bunch of tabs, you don’t get
the empty strings in the return array.
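
Applied to the third row of the sample file, that looks roughly like the sketch below. The variable names are just for illustration, and the row is typed in by hand (newline already removed) rather than read from wierd.txt:

```ruby
# Third sample row from the original post: three strings, then four tabs.
row = "string9\tstring10\tstring11\t\t\t\t"

default_fields = row.split("\t")      # trailing empty fields are dropped
all_fields     = row.split("\t", -1)  # a negative limit keeps them

p default_fields      # ["string9", "string10", "string11"]
p default_fields[4]   # nil, because the array has only three elements
p all_fields.length   # 7
p all_fields[4]       # ""
```

With the negative limit, field[4] is an empty string on every row of the sample file, so the comparison against "" behaves the same way for all four lines.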

(I think your test output might be clouding the results a little,
since "" gets called "no_string" while nil gets called "some_string"
:-) But anyway – it’s split’s handling of delimiters on the right of
the string that’s behind what’s happening.)
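
One way to make the test less confusing might be to check for nil and for "" separately. This is only a sketch: describe_field is a made-up helper name, not anything built in, and the rows are hand-typed stand-ins for lines of the file.

```ruby
# Hypothetical helper: report what is in one column of a tab-delimited row.
def describe_field(row, index)
  # Keep trailing empty fields so a blank column is "" rather than nil.
  value = row.split("\t", -1)[index]
  if value.nil?
    "missing"   # the row has fewer columns than index + 1
  elsif value.empty?
    "empty"     # the column exists but holds no text
  else
    value
  end
end

puts describe_field("a\tb\t\t\t", 4)      # empty
puts describe_field("a\tb", 4)            # missing
puts describe_field("a\tb\tc\td\te", 4)   # e
```

When reading real lines with IO.foreach you would still strip the line ending first; chomp only removes a trailing newline, whereas chop removes the last character whatever it is.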

David

On Sun, 24 Aug 2003, Kurt Euler wrote:


David Alan Black
home: dblack@superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav