Benchmark for Scala, Ruby and Perl

Hello Rubyists,

May I share the results of my benchmark of Perl 5, Ruby, and Scala?
https://blog.cloudcache.net/benchmark-for-scala-ruby-and-perl/

Any suggestions for improving it are welcome.

Thank you.
Jon

Not perfect, but a quick edit of your ruby script:

  stopwords = {}
  File.open('stopwords.txt').each_line do |s|
    s.chomp!
    stopwords[s] = 1
  end

  count = Hash.new(0)
  File.open('words.txt').each_line do |s|
    s.chomp!
    count[s] += 1 unless stopwords[s]
  end

  count.sort_by{|_,c| -c}.take(20).each do |s|
    puts "#{s[0]} -> #{s[1]}"
  end

It may even be a little bit faster:

Calculating:
  org 0.170 (± 0.0%) i/s - 6.000 in 35.518822s
  new 0.214 (± 0.0%) i/s - 7.000 in 33.074879s
  new 0.215 (± 0.0%) i/s - 7.000 in 32.760206s
  org 0.175 (± 0.0%) i/s - 6.000 in 34.557145s

Comparison:
  new: 0.2 i/s
  new: 0.2 i/s - 1.01x (± 0.00) slower
  org: 0.2 i/s - 1.23x (± 0.00) slower
  org: 0.2 i/s - 1.26x (± 0.00) slower

https://dpaste.org/Sz1Z
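
For reference, those figures look like benchmark-ips output; the harness was
presumably something along these lines (run_original and run_edited are
hypothetical wrappers around the two scripts):

  require 'benchmark/ips'   # gem install benchmark-ips

  Benchmark.ips do |x|
    x.report('org') { run_original }   # Jon's script as posted
    x.report('new') { run_edited }     # the edited version above
    x.compare!                         # prints the "Comparison:" block
  end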

···


On my machine your ruby script ran in

real 0m2.824s (an average-ish figure from ten runs)
user 0m2.757s
sys 0m0.057s

However a slightly tweaked version ran in

real 0m2.597s (also an average figure from ten runs)
user 0m2.430s
sys 0m0.146s

stopwords = File.open('stopwords.txt').read.split("\n")

count = Hash.new(0)
File.open('words.txt').read.split("\n") do |s|
  count[s] += 1
end

stopwords.each { |s| count.delete(s) }

z = count.sort_by {|k, v| -v}
z.take(20).each do |s| puts "#{s[0]} -> #{s[1]}" end

A thing of note: simply reading the files is around 46% of the total time

stopwords = File.open('stopwords.txt').read.split("\n")
words = File.open('words.txt').read.split("\n")

real 0m1.220s
user 0m1.063s
sys 0m0.140s
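
You can get that number from inside Ruby as well; a minimal sketch using the
stdlib Benchmark module:

require 'benchmark'

# Time just the reading and splitting, nothing else
elapsed = Benchmark.realtime do
  File.open('stopwords.txt').read.split("\n")
  File.open('words.txt').read.split("\n")
end
puts "reading took #{elapsed}s"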

For giggles I hacked this up in Lua (v5.4.3). Being Lua, there is less by
way of "sugar" / "convenience", so the code is a lot less concise. But for
that you get improved performance:

real 0m1.541s
user 0m1.515s
sys 0m0.022s

local stopwords = {}
local file1 = io.open("stopwords.txt", "r")
for line in file1:lines() do
  stopwords[line] = 1
end

local count = {}
local file2 = io.open("words.txt", "r")
for line in file2:lines() do
  if stopwords[line] == nil then
    if count[line] == nil then
      count[line] = 1
    else
      count[line] = count[line] + 1
    end
  end
end

local keys = {}

for key, _ in pairs(count) do
  table.insert(keys, key)
end

table.sort(keys, function(lhs, rhs) return count[lhs] > count[rhs] end)
for i=1,20 do
  print(keys[i], count[keys[i]])
end

As a further note, just reading the files in Lua took

real 0m1.312s
user 0m1.291s
sys 0m0.017s

So again it suggests that your benchmarks are really testing the performance
of your storage medium / the underlying C interface used to read the files.

Frank, your code looks graceful, thanks.
Someone changed the read to mmap and made it quite a bit faster, because it
cuts the number of system calls from hundreds down to just two.
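
I haven't seen the patch, but on Ruby 3.1 a sketch of the idea might use the
experimental IO::Buffer, which maps the file instead of read(2)-ing it in
chunks (purely illustrative, not the actual code):

# IO::Buffer is experimental in Ruby 3.1
file = File.open('words.txt')
buffer = IO::Buffer.map(file, nil, 0, IO::Buffer::READONLY)
data = buffer.get_string   # the whole file, backed by the mapping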

···


Peter, do read_line and read both use the same buffered IO under the hood?
I ran this benchmark on a subset of my actual data; the full input has xxx
millions of words. Reading the whole file into RAM in one go would run out
of memory.

Thank you.

···


Peter, do read_line and read both use the same buffered IO under the hood?

Not sure, I just did this for the data as it was

As a calibration I ran your perl version on my machine and got

real 0m1.445s
user 0m1.417s
sys 0m0.022s

If we are looking at really big data then you would need to see how much
memory is being consumed by the process. Something that eats a lot of
memory will behave differently when it has enough RAM vs. when the system
is hitting swap
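
If memory is the constraint, the slurp-and-split approach is probably out
anyway; a minimal streaming sketch with File.foreach keeps memory flat no
matter how big words.txt gets:

count = Hash.new(0)
# Reads one line at a time instead of loading the whole file
File.foreach('words.txt', chomp: true) do |word|
  count[word] += 1
end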

Here are some figures. I slept 60 seconds at the end so I could run
"ps aux | grep ruby" (or perl or lua) in another window once it had done
its thing:

USER           PID %CPU %MEM      VSZ    RSS   TT STAT STARTED    TIME COMMAND
peterhickman 89328  0.0  0.2 34845928  31456 s002 S+   8:39pm  0:02.91 ruby script
peterhickman 89355  0.0  2.7 35439260 449224 s002 S+   8:40pm  0:02.60 ruby script2
peterhickman 89398  0.0  0.1 34675524  23720 s002 S+   8:43pm  0:01.42 perl perl
peterhickman 89541 100.0  0.1 34663240  16488 s002 R+   8:56pm  0:03.42 lua fred.lua

"script" is your original ruby version, "script2" is mine (yup we are
eating the memory) and "perl" is your original perl version. "fred.lua" is
mine too

So my ruby script is faster(ish) but eats 10x more memory. Perl uses less
memory than your script and Lua does it best
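
If you would rather get the number from inside the script than eyeball ps,
a quick hack that should work on any Unix-like system:

# Ask ps for this process's resident set size (in KB)
rss_kb = `ps -o rss= -p #{Process.pid}`.to_i
puts "RSS: #{rss_kb} KB"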

What this would translate to on your real data set is anyone's guess :)
Except that I'm putting money on my Ruby version being the worst

···

On Sat, 15 Jan 2022 at 19:54, Jon Smart <jon@smartown.nl> wrote:

For yet more fun, I took a somewhat optimized version of the original, one
that still uses each_line, and compared it to a similar implementation in
Crystal.

I didn't use your version, Peter, because the use of File.read means that
you are pulling the whole file into RAM first, and, as the OP suggests,
that could be problematic on some systems with some data sets.

On my system, running Ruby 3.1.0, I got it down to about 1.8 seconds per
iteration with the following code:

  stopwords = []
  File.open('stopwords.txt').each_line do |s|
    stopwords << s
  end

  count = Hash.new(0)
  File.open('words.txt').each_line do |s|
    count[s] += 1
  end

  stopwords.each {|s| count.delete(s)}

  count.sort_by{|_,c| -c}.take(20).each do |s|
    puts "#{s[0].chomp} -> #{s[1]}"
  end

Using YJIT shaves a very tiny amount off of that. It tended to be about 5
hundredths of a second, even when I built it out in a benchmarkable format,
and repeated the count and sort process a couple dozen times. MJIT was
slower for a single iteration by about 50%, but was comparable for 20
iterations.

I then ran almost the same code under Crystal:

stopwords = [] of String
File.open("stopwords.txt").each_line do |s|
  stopwords << s
end

count = Hash(String, Int32).new(0)
File.open("words.txt").each_line do |w|
  count[w] += 1
end

stopwords.each {|s| count.delete(s)}

count.to_a.sort_by {|_,c| -c}[0..19].each do |s|
  puts "#{s[0].chomp} -> #{s[1]}"
end

It consistently runs in about 0.76 to 0.77 seconds.

RAM usage was interesting.

Name                     VSZ     RSS     Notes
ruby count.rb            97616   39536   Ruby 3.1.0 without a JIT
ruby --yjit count.rb     360236  302028  Ruby 3.1.0 with YJIT
ruby --mjit count.rb     171464  39780   Ruby 3.1.0 with MJIT
count                    162276  19884   Crystal 1.3.1

This task is heavily influenced by the speed of the underlying IO
subsystem, so any gains from YJIT were quite minimal in the face of that,
at the expense of a lot more RAM use.

Kirk Haines

···


I tried the Crystal version on my system, Crystal 1.3.1, and it came in with

real 0m4.474s
user 0m4.209s
sys 0m0.059s

So pretty much the worst of the bunch time wise. Memory wise it was

peterhickman 13455 0.0 0.1 34447596 21212 s002 S+ 12:10am 0:04.26 ./ccc

which puts it between perl and lua for memory usage

Could you post the times for Jon's initial ruby script? It will help me
calibrate your machine against mine

···


Compile it with --release.

Kirk

···


Now that I have a little more time to write, and can make complete
sentences, given the timings that you reported for Ruby, I am pretty sure
that you are compiling the crystal code in development mode vs release
mode.

crystal build --release count.cr

You will probably find that it runs in around a second on your machine.

Kirk Haines

···


Oh yeah, that was it. Much faster now

real 0m1.003s
user 0m0.748s
sys 0m0.062s

Same memory usage. I had just downloaded Crystal and run it; I should have
thought about it more

Thanks

···


I have to agree with Peter here: it is not very clear what, *precisely*,
you are trying to benchmark.

For example, in your benchmark, you include the time taken to print
the result to the console. It is well known that the console is slow,
and this is a property of the *console*, not the program you are
benchmarking or the language implementation you are using. You also
include the *startup time* of the implementation in your benchmark,
which e.g. for the JVM can be significant. You include in your
benchmark the time spent reading the files from the hard disk and
parsing them – this is at least partially dependent on the performance
of your hard disk, your filesystem, your operating system, and whether
or not the files are in the cache, none of which has anything
*specifically* to do with the language or the program.
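
A minimal way around most of that is to time only the region you care
about, with the IO done before the clock starts and the printing done
after it stops – a sketch:

words = File.readlines('words.txt', chomp: true)   # IO before timing

t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
count = Hash.new(0)
words.each { |w| count[w] += 1 }
t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC)

puts "counting took #{t1 - t0}s"   # console IO after the clock stops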

However, in your follow-up blog post, you mention that your *actual*
application is about *streaming data*.

This means you are actually including lots of irrelevant operations
in your benchmark:

* The speed of the terminal is irrelevant, since in your real
application you are not printing to the console.
* The startup time of the VM is irrelevant, because it will only be
started once and then constantly process streaming data.
* The time for reading and parsing the stopword list is irrelevant,
since it will only be done once at application startup.
* The filesystem performance is irrelevant, because the data will not
come from the filesystem but some form of message queue.

The *real* takeaway here is that *benchmarking is hard*. It is no
coincidence that the benchmarks which are used in the industry are
written by benchmark engineers who *only* write benchmarks 24/7 and do
nothing else. And even *they* sometimes get it wrong: I can't remember
the details, but there was a famous example of a SPEC benchmark that
was *supposed* to test database performance but *actually* tested
memory allocator performance of the benchmark runner!

If I remember correctly, the problem was that the actual database
operations the benchmark performed were so trivial that the most
expensive operation was not the database query but allocating the
result set object in the test harness. So, the score in this
particular "database benchmark" had absolutely nothing to do with the
database and was purely a measure of how fast the computer that the
test harness was running on could allocate memory.

Another example is an old benchmark from the Computer Language
Benchmark Game, where Haskell beat C, C++, and even hand-optimized
assembly by a *massive* margin. The problem was that the benchmark was
about sorting a gigantic array, but the benchmark never did anything
with the sorted array. It just sorted the array, and then ignored the
result. The Haskell compiler was clever enough to recognize that the
sorted array was never used, so it optimized away the sorting, and
since the unsorted input array was now unused as well, it also
optimized away the array itself, and lastly it optimized away the
code that reads the array from the input. The result was almost like
`void main() { exit 0; }`. The C and C++ language specifications, by
contrast, do not allow for such optimizations, and the hand-optimized
assembly did the sort and then ignored the result.

All this is to say that creating *representative* benchmarks with
*statistically significant and robust results* is very hard. I
certainly would never dare to try it myself.

If I were you, as a first step, I would carefully decide what,
*exactly*, I want to measure. In your case, I think it does not make
sense to include VM startup, printing, or building the stopword set.
It might make sense to include reading the wordlist from disk but it
might also make sense to exclude it – that really depends on your use
case.

It might make sense to choose different data structures or different
algorithms altogether. For example, you are only looking at the top 20
elements. Currently, you are sorting the entire thing in order to find
the top 20 elements, which is O(n log n). But you don't actually
need to sort at all to find the top 20 elements and can do it in O(n).
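
In Ruby, for example, Enumerable#max_by takes a count argument, which
avoids sorting the whole hash (a true selection algorithm would get you
to O(n), but this is already a clear win for large n):

count = Hash.new(0)
File.foreach('words.txt', chomp: true) { |w| count[w] += 1 }

# Top 20 by count, without sorting all of count
count.max_by(20) { |_, c| c }.each do |word, c|
  puts "#{word} -> #{c}"
end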

Cheers.
