I have a Ruby speed problem, maybe you have some suggestions:
given: text files with values, one per line, sorted, e.g.
10100
10234
10292
…
so:
arr1 = IO.readlines(file1)
arr2 = IO.readlines(file2)
arr1 consists of ~40000 lines/elements
arr2 size is ~10000
when I want to take the “set difference”, arr3 = arr1-arr2, meaning “take
all elements from arr1 which dont appear in arr2” this takes forever - I
don’t even know how long because I stopped early
So the built-in operator is too slow - taking advantage of my knowledge
of the sorting I handcoded a "loop { compare arr1[0] with arr2[0], either
puts or remove }"
This gets it down to 5-10s, much better, but still too slow - 1s would be
a nice target
As I’m still quite novice in Ruby (always having Hal’s book on my lap …)
how would you code a speedier solution?
(as for shell tools, I first tried the grep way, but it chokes on the size
with a “reg.exp. too large”)
given: text files with values, one per line, sorted, e.g.
10100
10234
10292
…
so:
arr1 = IO.readlines(file1)
arr2 = IO.readlines(file2)
arr1 consists of ~40000 lines/elements
arr2 size is ~10000
when I want to take the “set difference”, arr3 = arr1-arr2, meaning “take
all elements from arr1 which dont appear in arr2” this takes forever - I
don’t even know how long because I stopped early
The following runs in about .5s on my pokey old box:
You could also do it without hashes (though that solution is clearly the
simplest). This algorithm feels to me like the merge stage of a merge
sort, sorta (if you’re not familiar with merge sort, google for it):
i = 0
a3 =
a1.each { |a|
i = i + 1 while a2[i] < a
a3 << a if a2[i] > a
}
a3.each {|a| puts a}
···
On Sat, Jul 06, 2002 at 11:18:47PM +0900, Martin Pirker wrote:
I have a Ruby speed problem, maybe you have some suggestions:
given: text files with values, one per line, sorted, e.g.
10100
10234
10292
…
so:
arr1 = IO.readlines(file1)
arr2 = IO.readlines(file2)
arr1 consists of ~40000 lines/elements
arr2 size is ~10000
when I want to take the “set difference”, arr3 = arr1-arr2, meaning “take
all elements from arr1 which dont appear in arr2” this takes forever - I
don’t even know how long because I stopped early
So the built-in operator is too slow - taking advantage of my knowledge
of the sorting I handcoded a “loop { compare arr1[0] with arr2[0], either
puts or remove }”
This gets it down to 5-10s, much better, but still too slow - 1s would be
a nice target
As I’m still quite novice in Ruby (always having Hal’s book on my lap …)
how would you code a speedier solution?
On Saturday 06 July 2002 07:18 am, Martin Pirker wrote:
Hi…
I have a Ruby speed problem, maybe you have some suggestions:
given: text files with values, one per line, sorted, e.g.
10100
10234
10292
…
so:
arr1 = IO.readlines(file1)
arr2 = IO.readlines(file2)
arr1 consists of ~40000 lines/elements
arr2 size is ~10000
when I want to take the “set difference”, arr3 = arr1-arr2, meaning
“take all elements from arr1 which dont appear in arr2” this takes
forever - I don’t even know how long because I stopped early
when I want to take the “set difference”, arr3 = arr1-arr2, meaning “take
all elements from arr1 which dont appear in arr2” this takes forever - I
don’t even know how long because I stopped early
The following runs in about .5s on my pokey old box: