Inconsistency with IO.readlines

I've noticed a slight inconsistency with IO.readlines depending on the
line ending. If the line ending is PC (\r\n) or Unix (\n) then it
works fine no matter the platform. But if the line ending is Mac
(\r), IO.readlines returns one line no matter how many lines it is
supposed to be.

Maybe this isn't such a big deal anymore since Mac OS X uses Unix line
endings, but believe it or not I've got files coming in from Mac OS 9
users.

Anyone else seen this? I'm using a build built from the last stable
snapshot under Cygwin.

--
Justin Rudd
http://seagecko.org/thoughts/

Hmm, I don't know the OS 9 file format, but couldn't you set $/
(== $INPUT_RECORD_SEPARATOR) to the last character in the file, or
something like that? I don't think there's anything like Python's
universal newlines (opening a file in mode "U").
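
A minimal sketch of that idea, assuming the files really do use a bare \r as the line ending. Instead of changing the global $/, it passes the separator straight to IO.readlines. The helper name read_mac_lines and the sample data are made up for illustration:

```ruby
require 'tempfile'

# Hypothetical helper (not from the thread): read a file whose lines end
# in a bare "\r" by passing the separator explicitly instead of changing
# the global $/.
def read_mac_lines(path)
  IO.readlines(path, "\r").map { |line| line.chomp("\r") }
end

# Small demonstration with made-up sample data.
Tempfile.create('mac_eol') do |f|
  f.binmode
  f.write("one\rtwo\rthree\r")
  f.flush

  p IO.readlines(f.path).size   # default separator: everything is one "line"
  p read_mac_lines(f.path)      # one entry per \r-terminated line
end
```

This only helps when you already know the file uses \r; it does not detect the line ending for you.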

Justin Rudd wrote:

> I've noticed a slight inconsistency with IO.readlines depending on the
> line ending. If the line ending is PC (\r\n) or Unix (\n) then it
> works fine no matter the platform. But if the line ending is Mac
> (\r), IO.readlines returns one line no matter how many lines it is
> supposed to be.
>
> Maybe this isn't such a big deal anymore since Mac OS X uses Unix line
> endings, but believe it or not I've got files coming in from Mac OS 9
> users.
>
> Anyone else seen this? I'm using a build built from the last stable
> snapshot under Cygwin.
>
> --
> Justin Rudd
> http://seagecko.org/thoughts/

# ========== ext/input_reader.rb ==========
# ext/input_reader.rb
# accept some name argument. if name is nil || '-',
# then use $stdin and send that to the block
# otherwise, use File.open(name, *restargs)

require 'delegate'

def input_reader(fname, *fargs)
  block = Proc.new {|source|
    getter = SimpleDelegator.new(source)
    yield getter
    getter.__setobj__(nil)
  }

  if fname.nil? || fname == '-' then
    block[$stdin]
  else
    File.open(fname, *fargs) {|f| block[f]}
  end
end
# ========== ext/input_reader.rb ==========

# ========== ext/text2lines.rb ==========
# ext/text2lines.rb
# Read input from a source one character at a time and,
# whenever we've struck a sequence of \n|\r, print an
# argument-defined eoln marker instead.
#
# endls is the array of eoln marker characters.
# integers (string or actual) are valid, along
# with the string `newline' - which reduces to the
# system newline $/
#
# source is anything that responds to getc()
#
# sink is anything that responds to print(string)
#

require 'ext/input_reader'

def text2lines(endls, source, sink=nil)
  separator, marker = *endls.inject([[], []]) {|(sep, mark), new|
    case new.downcase
      when /^(-)?newline$/ then ($1.nil? ? mark : sep) << $/
      when /^(-)?\d+$/ then ($1.nil? ? mark : sep) << new.to_i.abs.chr
    end
    
    [sep, mark]
  }
  marker = marker.empty? ? $/ : marker.join('')
  separator = [10.chr, 13.chr] if separator.empty?

  char, prev, lastp = nil, nil, true
  lline, splitter = nil, ''
  counts = Hash.new(0)

  pchar = Proc.new {
    if block_given? then
      lline ||= ''
      lline << char.chr
    else
      sink.print char.chr
    end
  }
  pmark = Proc.new {
    if block_given? then
      yield(lline || '', marker)
      lline = nil
    else
      sink.print marker
    end
  }
  domark = Proc.new {
    pmark[]
    counts = Hash.new(0)
  }
  pnull = Proc.new {}

  if separator.include?(char.chr) then
    domark[] if counts[char] > 0
    counts[char] += 1
  else
    domark[] if counts.values.include?(1)
    pchar[]
  end while (char = source.getc)

  domark[]
end

def read_text2lines(file, *args)
  lines, chomper = [], args.delete('-c') {false}
  input_reader(file) {|source|
    text2lines(args.map {|i| i.to_s}, source) {|l,t|
      lines << l
      lines[-1] << t unless chomper || t.nil?
    }
  }
  lines
end
# ========== ext/text2lines.rb ==========

# ========== ~/local/bin/text2lines ==========
#!/usr/bin/env ruby

require 'ext/text2lines'

def usage(out=$stdout)
  out.puts <<-END_USAGE
Usage: #{$0} [newline | ascii-code]+
  Replaces all instances of \\r and \\n with new end of line
  markers. \\r\\n and \\n\\r are treated as one unit. \\r\\r and
  \\n\\n are treated as two.

  The new markers are formed from command-line arguments. If
  no arguments are given, then the system's end of line marker
  is used. Otherwise, the sequence of ascii-codes / newlines
  are used, with newline representing the system's end of line
  marker. Characters are read from stdin.

  EXAMPLES:
     #{$0} 13 10
  replaces all `standard' end of line markers with \\r\\n.
  
     #{$0} newline
  replaces all `standard' end of line markers with the system
  end of line marker.
END_USAGE
  exit(-1)
end

args = ARGV.map {|arg|
  case arg
    when /^--+h(e(l(p)?)?)?$/i then usage
    when /^newline$/ then arg
    when /^\d+$/ then arg
    else
      $stderr.puts "Error - bad argument #{arg}"
      usage($stderr)
  end
}

text2lines(args, $stdin, $stdout)
# ========== ~/local/bin/text2lines ==========

[ummaycoc@localhost ummaycoc]$ echo 'hello
my
ruby
loving
friends' | text2lines 65
helloAmyArubyAlovingAfriendsA[ummaycoc@localhost ummaycoc]$

[ummaycoc@localhost ummaycoc]$ echo 'hello
my
ruby
loving
friends' | text2lines 13 > rubytmp

[ummaycoc@localhost ummaycoc]$ more rubytmp
friends
[ummaycoc@localhost ummaycoc]$

so, obviously, if this doesn't work for you - getc will :-)
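
For input that fits in memory, the same pairing rule from the usage text (\r\n and \n\r count as one break, \r\r and \n\n as two) can be sketched with a regular expression instead of a getc loop. This is a simplified stand-in, not the code above; EOL_RUN, split_any_eol and normalize_eol are made-up names:

```ruby
# Assumption: the whole input is already in a String.
# The two-character pairs are listed first so each pair is consumed as a
# single line break; a lone \r or \n is matched otherwise.
EOL_RUN = /\r\n|\n\r|\r|\n/

# Split text into lines regardless of which line-ending style it uses.
def split_any_eol(text)
  text.split(EOL_RUN)
end

# Replace every line break with the given marker ("\n" by default).
def normalize_eol(text, marker = "\n")
  text.gsub(EOL_RUN, marker)
end

p split_any_eol("hello\rmy\r\nruby\nloving\n\rfriends")
p normalize_eol("hello\rmy\r\nruby", "A")
```

Unlike text2lines, String#split drops trailing empty strings, so blank lines at the very end of the input are not reported.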

--
There's no word in the English language for what you do to a dead
thing to make it stop chasing you.

Let me apologize in advance, because I've not been following this thread. I have been working with Gavin Sinclair to document delegate.rb though, and together we've been hard pressed to find a single good use for SimpleDelegator...

On Dec 23, 2004, at 1:03 PM, Matt Maycock wrote:

> # ========== ext/input_reader.rb ==========
> # ext/input_reader.rb
> # accept some name argument. if name is nil || '-',
> # then use $stdin and send that to the block
> # otherwise, use File.open(name, *restargs)
>
> require 'delegate'
>
> def input_reader(fname, *fargs)
>   block = Proc.new {|source|
>     getter = SimpleDelegator.new(source)
>     yield getter
>     getter.__setobj__(nil)
>   }
>
>   if fname.nil? || fname == '-' then
>     block[$stdin]
>   else
>     File.open(fname, *fargs) {|f| block[f]}
>   end
> end

Would you mind explaining to me why you use SimpleDelegator above? I would really appreciate it.

James Edward Gray II

So I glanced at delegate in the past but never really used it. Today,
I saw this, and rewrote some old code thinking that using that would
be easier (this was after I decided to post but before clicking submit
[obviously :-)] - so you guys got my cutting edge changes! really
just cleaned out 6 or so lines...)

My use of SimpleDelegator is that I don't want a case like this:
handle = nil
input_reader("myfile") {|handle| ...}
some_func(handle)

granted, the file ensures that things are closed off so I really don't
have to, but to go along with the idea of how things `should be' - I
used SimpleDelegator for __setobj__. Just design philosophy.

The only case I can think of for this mattering in the file I gave is this:

handle = nil
input_reader(some_arg) {|handle| ...}
handle.puts "Meow Mix Please Deliver"

depending on the value of some_arg (nil or '-' vs otherwise), the
above code may or may not work as expected without the delegator.
This is especially important if you
factored your code such that inside {|handle| ...} all you did was
invoke a method and pass handle to it. Now, you have a `bug' and it
doesn't even really look like anything remotely inputy-outputy except
for my function name. So the guarantee of failure under any arguments
(ie purity in a sort of functional sense) is the benefit wrt the
handle.puts line, above.

I may have babbled there a bit - it's late.

Matthew Maycock

--
There's no word in the English language for what you do to the thing a
dead thing delegated to chase after you to make it stop chasing you.

> My use of SimpleDelegator is that I don't want a case like this:
> handle = nil
> input_reader("myfile") {|handle| ...}
> some_func(handle)

Thank you for walking through this with me.

> granted, the file ensures that things are closed off so I really don't
> have to, but to go along with the idea of how things `should be' - I
> used simple delegator for __setobj__. Just design philosophy.
>
> The only case I can think of for this mattering in the file I gave is this:
>
> handle = nil
> input_reader(some_arg) {|handle| ...}
> handle.puts "Meow Mix Please Deliver"

Hmm, but the code I saw was:

  block = Proc.new {|source|
    getter = SimpleDelegator.new(source)
    yield getter
    getter.__setobj__(nil)
  }

You're worried that "getter" may have existed outside input_reader(), in the calling code, and you want it to be immediately obvious if you trample that value?

James Edward Gray II

On Dec 23, 2004, at 11:57 PM, Matt Maycock wrote:

> Thank you for walking through this with me.

No prob :-)

>> handle = nil
>> input_reader(some_arg) {|handle| ...}
>> handle.puts "Meow Mix Please Deliver"
>
> Hmm, but the code I saw was:
>
>> block = Proc.new {|source|
>> getter = SimpleDelegator.new(source)
>> yield getter
>> getter.__setobj__(nil)
>> }
>
> You're worried that "getter" may have existed outside input_reader(),
> in the calling code, and you want it to be immediately obvious if you
> trample that value?

So I think there's a chance what you're saying is what I meant - but
maybe not (due to natural language ambiguity -- for me arising from
the `but' in your `Hmm, but the code I saw was:')

So the variable block is there to make the delegator `getter', send it
to the block that was passed to input_reader, and then have
`getter' delegate to nil. The reason for this is that if you passed
nil or '-' to input_reader, then (had __setobj__ not been invoked
after yield) the `handle' variable above would have been valid
to use as an IO object. However, this would not be so if a filename
was given (as that IO object would have been closed by File.open
after the block had run). This way, both forms of
invocation ($stdin and file IO) behave the same way wrt the block
variable getter that is passed to yield, before and after return from
input_reader.
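
A small sketch of that behaviour, using a StringIO in place of $stdin or a real file. The method name with_guarded_source is made up, and because current Ruby keeps block parameters local to the block, the handle is copied into an outer variable explicitly to simulate the leak:

```ruby
require 'delegate'
require 'stringio'

# Hypothetical stand-in for input_reader: hand the block a delegator,
# then point the delegator at nil so a leaked reference stops working.
def with_guarded_source(source)
  getter = SimpleDelegator.new(source)
  yield getter
  getter.__setobj__(nil)
end

leaked = nil
with_guarded_source(StringIO.new("abc")) do |handle|
  leaked = handle      # simulate the handle escaping the block
  puts handle.getc     # fine while the block is running
end

begin
  leaked.getc          # the delegator now wraps nil
rescue NoMethodError => e
  puts "rejected after the block: #{e.class}"
end
```

The sketch calls getc rather than puts on the stale handle because a delegator falls back to Kernel's own puts and gets when its target does not respond to them, so those calls would not fail the same way.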

So I think that's exactly what you mean - but I just
reworded/reiterated it to make sure.

--
There's no word in the English language for what you do to a dead
thing to make it stop chasing you.

Got it. Thanks again.

James Edward Gray II

On Dec 24, 2004, at 1:28 PM, Matt Maycock wrote:

> So I think that's exactly what you mean - but I just
> reworded/reiterated it to make sure.