Idiomatic conversion of yielding block to array

So I have a function that generates like 300 lines of text and I want to unit test it. If it fails, it barfs a dozen pages of text then says there's a difference in there somewhere and we wish you the best of luck trying to find it.

So, yeah.

I'd like to break this out into a line-by-line test. The following code doesn't work because each_line is a generator function, not an array:

  def test.to_html
    # test the Calendar.to_html method
    cal = Calendar.new(@date_test)
    line_no = 0
    @html_reference.each_line.zip(cal.to_html.each_line).each do |test_line|
      line_no += 1
      assert_equal( test_line[0], cal_line[1], "Calendar#to_html output incorrect at line #{line_no}"
      # Test::Unit will then emit the reference line and the failed test line.
    end

Okay, so I want these each_lines done up as arrays.

      ref_lines = []; @html_reference.each_line {|line| ref_lines << line }
      cal_lines = []; cal.to_html.each_line {|line| cal_lines << line }
      line_no = 0
      ref_lines.zip(cal_lines).each do |test_line|
        line_no += 1
        assert_equal( test_line[0], test_line[1], "Calendar.to_html incorrect on line #{line_no}")
      end

This smells very kludgy and procedural. I'm using an index/counter variable, explicit Array declarations, and zipping two arrays together only to split them apart again. Ugh!

Can someone help me make this more elegant? C++ has pooped Ruby's pants.

-dB

···

--
David Brady
ruby_talk@shinybit.com
C++ Guru. Ruby nuby. Apply salt as needed.

require 'enumerator'
  cal_lines = cal.to_html.to_enum(:each_line).to_a

-Levin

···

David Brady <ruby_talk@shinybit.com> wrote:

      cal_lines = ; cal.to_html.each_line {|line| cal_lines << line }

Use unit_diff in ZenTest.

http://rubyforge.org/projects/zentest

···

On 02 Sep 2005, at 13:15, David Brady wrote:

So I have a function that generates like 300 lines of text and I want to unit test it. If it fails, it barfs a dozen pages of text then says there's a difference in there somewhere and we wish you the best of luck trying to find it.

--
Eric Hodel - drbrain@segment7.net - http://segment7.net
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

Levin Alexander wrote:

···

David Brady <ruby_talk@shinybit.com> wrote:

     cal_lines = ; cal.to_html.each_line {|line| cal_lines << line }

  require 'enumerator'
  cal_lines = cal.to_html.to_enum(:each_line).to_a

-Levin

You don't need the to_a but still zip will create 3 arrays.

interesting...

Simon

Levin Alexander wrote:

require 'enumerator'
cal_lines = cal.to_html.to_enum(:each_line).to_a

Innnteresting. This condenses the loop initialization into a single line... but it's pretty hairy:

      @html_reference.to_enum(:each_line).zip(cal.to_html.to_enum(:each_line)).each_with_index do |test_line, line_no|
        assert_equal( test_line[0], test_line[1], "Calendar#to_html incorrect on line #{line_no+1}")
      end

I don't think we've really cleaned much up, though. We've moved from C++ pooping Ruby's pants to Ruby pooping its own pants. It's idiomatic, but we haven't reached elegance yet.

-dB

···

--
David Brady
ruby_talk@shinybit.com
C++ Guru. Ruby nuby. Apply salt as needed.

Simon Kröger wrote:

You don't need the to_a but still zip will create 3 arrays.

this does not, but its still not very ruby like:

···

-------------------------------------------------------------
s1 = ['this', 'is', 'a', 'test'].join("\n")
s2 = ['this', 'is', 'the', 'test'].join("\n")

q1, q2 = SizedQueue.new(1), SizedQueue.new(1)
Thread.new {s1.each_line {|l| q1 << l}; q1 << nil}
Thread.new {s2.each_line {|l| q2 << l}; q2 << nil}

line_no = 0
while true
   line_no += 1
   t1, t2 = q1.pop, q2.pop
   break if !t1 || !t2
   puts "difference on line #{line_no}" if t1 != t2
end
-------------------------------------------------------------

perhaps better, but still kind of hardcore:

-------------------------------------------------------------
require 'enumerator'
require 'thread'

module Enumerable
   def each_zip(other)
     q = SizedQueue.new(1)
     Thread.new {other.each{|l| q << l}}
     self.each{|a| yield a, q.pop}
   end
end

s1 = ['this', 'is', 'a', 'test'].join("\n")
s2 = ['this', 'is', 'the', 'test'].join("\n")

s1.to_enum(:each_line).enum_for(:each_zip,
   s2.to_enum(:each_line)).each_with_index do |t, l|
   puts "difference on line: #{l+1}" if t[0] != t[1]
end
-------------------------------------------------------------

food for thoughts...

Simon

David Brady wrote:

Levin Alexander wrote:

> require 'enumerator'
> cal_lines = cal.to_html.to_enum(:each_line).to_a
>
>
Innnteresting. This condenses the loop initialization into a single
line... but it's pretty hairy:

@html_reference.to_enum(:each_line).zip(cal.to_html.to_enum(:each_line)).each_with_index
do |test_line, line_no|
        assert_equal( test_line[0], test_line[1], "Calendar#to_html
incorrect on line #{line_no+1}")
      end

I don't think we've really cleaned much up, though. We've moved from
C++ pooping Ruby's pants to Ruby pooping its own pants. It's idiomatic,
but we haven't reached elegance yet.

class Array
  # The differences between 2 arrays of equal size.
  def dif( other )
    ( (0...self.size).to_a.zip( self ) -
      (0...self.size).to_a.zip( other ) ).map{|x| x<<other[x[0]] }
  end
end

@html_reference.to_enum(:each_line).dif(
  cal.to_html.to_enum(:each_line) ).each{ |x|
    # Line numbering starts with 1.
    x[0] += 1
    print "Incorrect at line "; puts x; puts
}

require 'generator'

s1 = %w{this is the first line}.join("\n")
s2 = %w{this is the second line}.join("\n")

compare = SyncEnumerator.new(s1.to_enum(:each_line), s2.to_enum(:each_line))
compare.each { |a,b|
  # ...
}

-Levin

···

On 9/3/05, Simon Kröger <SimonKroeger@gmx.de> wrote:

food for thoughts...

Good heavens, no! Neither of those are thread safe. Criminy!

Why not something simple like so:

require 'enumerator'

def compare_by_line( expected, actual, message )
  expected_lines = expected.to_enum(:each_line)
  actual_lines = actual.to_enum(:each_line)
  expected_lines.zip(actual_lines).each_with_index do |pair, index|
    assert_equal( *pair, "#{message} on line #{index + 1}")
  end
end

compare_by_line( @html_reference, cal.to_html, "Calendar.to_html incorrect" )

There's no need for threads...

Jacob Fugal

Levin Alexander wrote:

···

On 9/3/05, Simon Kröger <SimonKroeger@gmx.de> wrote:

food for thoughts...

require 'generator'

s1 = %w{this is the first line}.join("\n")
s2 = %w{this is the second line}.join("\n")

compare = SyncEnumerator.new(s1.to_enum(:each_line), s2.to_enum(:each_line))
compare.each { |a,b|
  # ...
}

-Levin

Yes, perfekt, that's what i was looking for (and implemented :slight_smile: )

the trivial solution is of course:
----------------------------------------------------------
s1 = ['this', 'is', 'a', 'test'].join("\n")
s2 = ['this', 'is', 'the', 'test'].join("\n")

(0...s1.size).inject(1) do |l, i|
   puts "difference on line: #{l}" or break if s1[i] != s2[i]
   l = (s1[i] == 10) ? l+1 : l
end
----------------------------------------------------------

buts that no fun..

Simon

(Sorry, for those in non-tree-threading clients, that was in reply to
ruby-talk:154785)

Jacob Fugal

···

On 9/2/05, Jacob Fugal <lukfugl@gmail.com> wrote:

Good heavens, no! Neither of those are thread safe. Criminy!

Jacob Fugal wrote:

Good heavens, no! Neither of those are thread safe. Criminy!

Why not something simple like so:

require 'enumerator'

def compare_by_line( expected, actual, message )
  expected_lines = expected.to_enum(:each_line)
  actual_lines = actual.to_enum(:each_line)
  expected_lines.zip(actual_lines).each_with_index do |pair, index|
    assert_equal( *pair, "#{message} on line #{index + 1}")
  end
end

compare_by_line( @html_reference, cal.to_html, "Calendar.to_html incorrect" )

because zip converts each argument to an array, which isn't what you want if there are many lines.

There's no need for threads...

True SyncEnumerator does it with continuations but i wasn't ready for this mind bending stuff (yet). (and now i don't need to make knots in my brain, because Levin was so nice to tell us about SyncEnumerator)

But what exactly isn't thread safe? SizedQueue is.

Jacob Fugal

Threads aren't evil.

cheers

Simon

What's the second to last line for?

  l = (s1[i] == 10) ? l+1 : l

That will always return false (since s1[i] is always a string, and
string/integer comparison is always false), so your line count will
always be 1.

Jacob Fugal

···

On 9/2/05, Simon Kröger <SimonKroeger@gmx.de> wrote:

s1 = ['this', 'is', 'a', 'test'].join("\n")
s2 = ['this', 'is', 'the', 'test'].join("\n")

(0...s1.size).inject(1) do |l, i|
   puts "difference on line: #{l}" or break if s1[i] != s2[i]
   l = (s1[i] == 10) ? l+1 : l
end

Jacob Fugal wrote:
> Why not something simple like so:
> ... <snip> ...

because zip converts each argument to an array, which isn't what you
want if there are many lines.

Ah, yes. I'd probably do it with SyncEnumerator rather than zip as
well, then[1]. I just used zip since its what you started with and you
didn't specify performance as an issue.

True SyncEnumerator does it with continuations but i wasn't ready for
this mind bending stuff (yet). (and now i don't need to make knots in my
brain, because Levin was so nice to tell us about SyncEnumerator)

Very true. Thanks, Levin.

But what exactly isn't thread safe? SizedQueue is.

Unless SizedQueue is built with threads in mind and does alternating
blocking on pushes (<<) and pops. If it does, I apologize.

Assuming I'm not wasting time by not looking at the SizedQueue
documentation, the problem is that there's no guarantee that the main
thread won't run several pops before returning control to the thread
that's pushing into the queue.

Threads aren't evil.

No, but naive non-thread-safe code is. Well, it's broken, at least...
not necessarily *evil*. :slight_smile:

Jacob Fugal

···

On 9/2/05, Simon Kröger <SimonKroeger@gmx.de> wrote:

Jacob Fugal wrote:

[...]
What's the second to last line for?

  l = (s1[i] == 10) ? l+1 : l

That will always return false (since s1[i] is always a string, and
string/integer comparison is always false), so your line count will
always be 1.

s1[i] is in fact never a string:

str[fixnum] => fixnum or nil
http://www.ruby-doc.org/core/classes/String.html#M001328

so it is for counting newline chars. It could be written shorter

(s1[i] == 10) ? l+1 : l

without the assignment.

Jacob Fugal

cheers

Simon

Again, I need to proofread my posts better. That should read "Only if"
instead of "Unless".

Jacob Fugal

···

On 9/2/05, Jacob Fugal <lukfugl@gmail.com> wrote:

On 9/2/05, Simon Kröger <SimonKroeger@gmx.de> wrote:
> But what exactly isn't thread safe? SizedQueue is.

Unless SizedQueue is built with threads in mind and does alternating
blocking on pushes (<<) and pops. If it does, I apologize.

Performance would probably be the wrong motivation to switch to
SyncEnumerator. On my machine it is about 100 times slower then
Enumerable#zip.

  sync-enum: 1.676
  zip-enum: 0.018
  sizedqueue: 0.226

-Levin

compare.bm.rb (866 Bytes)

···

On 9/2/05, Jacob Fugal <lukfugl@gmail.com> wrote:

Ah, yes. I'd probably do it with SyncEnumerator rather than zip as
well, then[1]. I just used zip since its what you started with and you
didn't specify performance as an issue.

Ah, sorry, I missed the fact that you were iterating over characters
instead of lines. 10 == ?\n. Gotcha.

But isn't that different than the original functionality? He
originally wanted a warning (specifically assert_equal as part of
Test::Unit) for each line that differed. Your code will only match the
first such case then break out of the loop.

Jacob Fugal

···

On 9/2/05, Simon Kröger <SimonKroeger@gmx.de> wrote:

Jacob Fugal wrote:

> [...]
> What's the second to last line for?
>
> l = (s1[i] == 10) ? l+1 : l
>
> That will always return false (since s1[i] is always a string, and
> string/integer comparison is always false), so your line count will
> always be 1.

s1[i] is in fact never a string:

Jacob Fugal wrote:

But what exactly isn't thread safe? SizedQueue is.

Unless SizedQueue is built with threads in mind and does alternating
blocking on pushes (<<) and pops. If it does, I apologize.

Again, I need to proofread my posts better. That should read "Only if"
instead of "Unless".

Jacob Fugal

It is a _sized_ queue wich i initailized to be of size 1, so there is only 1 item max in the queue at any given time. This comes from a lib called 'thread' so i would assume that it is in fact "built with threads in mind". It blocks if there is an item in it and you want to write and it block if there is no such item and you want to read.

cheers

Simon

···

On 9/2/05, Jacob Fugal <lukfugl@gmail.com> wrote:

On 9/2/05, Simon Kröger <SimonKroeger@gmx.de> wrote:

Yeah, but that performance is probably linear (again, I'm shooting
outperform #zip. And even before it takes the lead in execution speed,
it definitely has an advantage in memory, which I was more concerned
about. SizedQueue however seems to be a much better choice than
SyncEnumerable for those situations.

Jacob Fugal.

···

On 9/2/05, Levin Alexander <levin.alexander@gmail.com> wrote:

Performance would probably be the wrong motivation to switch to
SyncEnumerator. On my machine it is about 100 times slower then
Enumerable#zip.

  sync-enum: 1.676
  zip-enum: 0.018
  sizedqueue: 0.226

from my hip here). So in the extreme cases, SyncEnumerable would