Great. Run it for us and let us know how we do.
James Edward Gray II
···
On Sep 18, 2007, at 4:00 AM, Matthias Wächter wrote:
Anyway -- i'd like to see a 100000 lookups comparison *hehe*
Ok, here's the result of mine:
mark@server1:~/rubyquiz/139$ time ./ip2country.rb 195.135.211.255
UK

real 0m0.004s
user 0m0.004s
sys 0m0.000s

Here's my code:

#!/usr/bin/ruby
ARGV.each { |ip|
  f = ip.split(/\./).join "/"
  puts File.open(f).readlines[0] rescue puts "Unknown"
}

I think it's pretty obvious what the preparation step was. Of course,
the tradeoff for this speed is a MASSIVE waste of disk resources, but
that was unlimited in this contest, was it not?
LOL! Nice...
Pretty clever.
I bet with the right prep, this could even be a pretty viable approach. Instead of building a file for each address you could create a directory structure for the hexadecimal representations of each piece of the address. The final layer could be handled as you have here or with a search through a much smaller file.
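That directory-per-hex-component idea could be sketched roughly like this (purely illustrative: store, lookup, and DB_ROOT are names I made up, and a real build step would expand the CSV ranges instead of storing single addresses):

```ruby
require "fileutils"

DB_ROOT = "ipdb"  # hypothetical root of the on-disk index

# Write one leaf file per address: three hex directory levels, one leaf.
def store(ip, country)
  a, b, c, d = ip.split(".").map { |o| "%02x" % o.to_i }
  dir = File.join(DB_ROOT, a, b, c)
  FileUtils.mkdir_p(dir)
  File.write(File.join(dir, d), country)
end

# Look up an address by walking the same path; a missing file means unknown.
def lookup(ip)
  a, b, c, d = ip.split(".").map { |o| "%02x" % o.to_i }
  path = File.join(DB_ROOT, a, b, c, d)
  File.exist?(path) ? File.read(path) : "Unknown"
end

store("195.135.211.255", "UK")
puts lookup("195.135.211.255")  # UK
puts lookup("5.1.1.1")          # Unknown
```

The final directory level could instead hold one small per-/24 file to be searched linearly, as suggested above.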
Indeed, I was thinking last night about preprocessing the data
into an on-disk hash table. I was thinking about a flat file
at the time, but one could use a subdirectory technique like the
above... using hex components of the hash value to index through
a couple subdirs to reach a leaf file containing one or more
records.
Another thing I'd like to try but won't have time for, is to
use ruby-mmap somehow. (Preprocess the data into a flat binary
file, then use ruby-mmap when performing the binary search.)
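A rough sketch of that flat-binary-file approach, with a plain IO#seek binary search standing in for ruby-mmap (the 10-byte record layout and the build/lookup names are my own assumptions, not Bill's actual code):

```ruby
# Convert dotted-quad notation to a 32-bit integer.
def ip_to_i(s)
  s.split(".").inject(0) { |acc, octet| acc * 256 + octet.to_i }
end

RECORD = 10  # 4 bytes "from" + 4 bytes "to" + 2 bytes country code

# Preprocess: write sorted fixed-width records to a flat binary file.
def build(ranges, path)
  File.open(path, "wb") do |f|
    ranges.sort_by { |from, _, _| from }.each do |from, to, cc|
      f.write([from, to].pack("NN") + cc)
    end
  end
end

# Lookup: binary search by seeking directly to record boundaries.
def lookup(path, ip)
  n = File.size(path) / RECORD
  File.open(path, "rb") do |f|
    lo, hi = 0, n - 1
    while lo <= hi
      mid = (lo + hi) / 2
      f.seek(mid * RECORD)
      from, to = f.read(8).unpack("NN")
      return f.read(2) if ip >= from && ip <= to
      if ip < from then hi = mid - 1 else lo = mid + 1 end
    end
  end
  "Unknown"
end

build([[ip_to_i("68.97.89.0"), ip_to_i("68.97.89.255"), "US"],
       [ip_to_i("84.191.4.0"), ip_to_i("84.191.4.255"), "DE"]], "ip.bin")
puts lookup("ip.bin", ip_to_i("68.97.89.187"))  # US
```

With mmap the seek/read pair would become plain indexing into the mapped string, but the search logic stays the same.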
Anyway thanks for the fun quiz.
Regards,
Bill
From: "James Edward Gray II" <james@grayproductions.net>
On Sep 19, 2007, at 10:15 AM, Mark Thomas wrote:
Wow, I'm impressed. Can't wait to see that code!
James Edward Gray II
On Sep 14, 2007, at 2:35 PM, Simon Kröger wrote:
James Edward Gray II wrote:
On Sep 14, 2007, at 2:20 PM, Simon Kröger wrote:
Ruby Quiz wrote:
[...]
$ time ruby ip_to_country.rb 68.97.89.187
US

real 0m0.314s
user 0m0.259s
sys 0m0.053s

Is an 'initialisation run' allowed to massage the data?
(we should at least split the benchmarks to keep it fair)

My script does need an initialization run, yes. I don't see any harm
in paying a one-time penalty to set things up right.

Is it motivating or a spoiler to post timings?
Motivating, definitely.
James Edward Gray II
Ok, my script does not need any initialization; it uses the file
IpToCountry.csv exactly as downloaded.

----------------------------------------------------------------
$ ruby -v
ruby 1.8.4 (2005-12-24) [i386-cygwin]

$ time ruby quiz139.rb 68.97.89.187
US

real 0m0.047s
user 0m0.030s
sys 0m0.030s

$ time ruby quiz139.rb 84.191.4.10
DE

real 0m0.046s
user 0m0.046s
sys 0m0.015s
----------------------------------------------------------------
I think the timings of the scripts are not a good index; it all depends on what hardware/OS you are running them on.
If we want to use speed as an index we should probably have J.E. compare them all on the same machine.
Maybe we could also write a ruby script that runs all the entry scripts and time them, and that could be another ruby quiz which will also be voted on speed and then we could write a ruby script to time those entries an then we could .... Just ignore this paragraph
Diego Scataglini
On Sep 14, 2007, at 3:35 PM, Simon Kröger <SimonKroeger@gmx.de> wrote:
----------------------------------------------------------------

This is on a Pentium M 2.13GHz laptop with 2GB RAM and a rather slow HD.
cheers
Simon
Ok, my script does not need any initialization, it uses the file
IpToCountry.csv exactly as downloaded.
We probably did something similar. Mine also works on the
unmodified IpToCountry.csv file.
$ time ruby 139_ip_to_country.rb 67.19.248.74 70.87.101.66 205.234.109.18 217.146.186.221 62.75.166.87
US
GB
DE
real 0m0.122s
user 0m0.015s
sys 0m0.000s
(ruby 1.8.4 (2005-12-24) [i386-mswin32], timed from cygwin bash shell,
2GHz athlon64, winxp.)
I don't think the timings are very accurate on this system. It
didn't change much whether I looked up one IP or five.
. . . Looking up 80 IPs on one command line resulted in:
real 0m0.242s
user 0m0.015s
sys 0m0.016s
Regards,
Bill
From: "Simon Kröger" <SimonKroeger@gmx.de>
"Bill Kelly" <billk@cts.com> wrote in message news:00b301c7f897$a8c9dc20$6442a8c0@musicbox...
From: "Eugene Kalenkovich" <rubify@softover.com>
BTW, all solutions already submitted will lie for subnets 1, 2 and 5.
Most (but not all) will break on out-of-bounds submissions (256.256.256.256 or 0.0.0.-1, the latter if comments are stripped out).

Hi, could you clarify what is meant by lying about subnets
1, 2, and 5?

Check what country 5.1.1.1 is. If you get any valid answer, this answer is a lie.
Ah, OK. I get:
ruby 139_ip_to_country.rb 0.1.1.1 1.1.1.1 2.1.1.1 3.1.1.1 4.1.1.1 5.1.1.1
ZZ
(1.1.1.1 not found)
(2.1.1.1 not found)
US
(5.1.1.1 not found)
Regards,
Bill
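The out-of-range inputs Eugene mentions (256.256.256.256, 0.0.0.-1) can be screened out with a small guard before doing any lookup; valid_ip? is just an illustrative name, not part of any posted solution:

```ruby
# Accept only four dot-separated decimal octets in the range 0..255.
def valid_ip?(s)
  parts = s.split(".", -1)
  parts.size == 4 && parts.all? { |p| p =~ /\A\d{1,3}\z/ && p.to_i <= 255 }
end

puts valid_ip?("68.97.89.187")     # true
puts valid_ip?("256.256.256.256")  # false
puts valid_ip?("0.0.0.-1")         # false
```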
From: "Eugene Kalenkovich" <rubify@softover.com>
$ ruby quiz139.rb 0.1.1.1 1.1.1.1 2.1.1.1 3.1.1.1 4.1.1.1 5.1.1.1
0.1.1.1 ZZ
1.1.1.1 ??
2.1.1.1 ??
3.1.1.1 US
4.1.1.1 US
5.1.1.1 ??
:}
gegroet,
Erik V.
James Edward Gray II schrieb:
Anyway -- i'd like to see a 100000 lookups comparison *hehe*
Great. Run it for us and let us know how we do.
Here are the results of the supplied solutions so far, and it looks like my solution can take the 100k-performance victory.
First Table: Compilation (Table Packing)
real user sys
Adam[*] 0.005 0.002 0.003
Luis 0.655 0.648 0.007
James[**] 21.089 18.142 0.051
Jesse 1.314 1.295 0.020
Matthias 0.718 0.711 0.008
[*]: Adam does not perform real compression; he only computes two boundary offsets to search within the original .csv, which he subsequently uses.
[**]: Upon rebuild, James fetches the .csv source from the web, which makes his solution look slow. This timing depends heavily on your--actually my--ISP speed.
Second Table: Run (100_000 Addresses)
real user sys
Adam 24.943 22.993 1.951
Bill 35.080 33.029 2.051
Luis 16.149 13.706 2.444
Eugene[*] 52.307 48.689 3.620
Eugene 65.790 61.984 3.805
James 14.803 12.449 2.356
Jesse 14.016 12.343 1.673
Jesus_a[**]
Jesus_b[**]
Kevin[***]
Matt_file 6.192 5.332 0.859
Matt_str 3.704 3.699 0.005
Simon 69.417 64.679 4.706
Justin 56.639 53.292 3.345
steve 63.659 54.355 9.294
[*]: Eugene already implements a random generator, but to make things fair, I changed his implementation to read the same values from $stdin as all the other implementations. The starred version uses his own random generator and runs outside the competition; the starless version is my modified one.
[**]: O Jesus :), I can't make your FasterCSV version (a) run, and the direct parsing in the later version you sent breaks when it comes to detecting the commented lines in the first part of the file. I couldn't manage to make it run, sorry.
[***]: Although I managed to write the missing SQL insertion script and even added separate indexes for the address limits, Kevin's SQLite3 version simply took too long; I estimated a run time of over an hour. I am willing to replay the test if someone tells me how to speed things up with SQLite3 to make it competitive.
Note that I slightly changed all implementations to loop over $stdin.each instead of reading ARGV or just ARGV[0]. Each script was run only once and was supplied all addresses in that single run. The test set consisted of 100_000 freshly generated random IP addresses written to a file and supplied using the following syntax:
$ (time ruby IpToCountry.rb <IP100k > /dev/null) 2>100k.time
I didn't check the output of the scripts beyond verifying one address up front, mainly because the scripts all use different output formats; my tests were just for measuring performance.
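For anyone who wants to reproduce the run, an input file of that shape can be generated in a few lines (the IP100k name matches the command above; the generator itself is my own sketch, not the one used for these tests):

```ruby
# Write 100,000 random dotted-quad addresses, one per line.
File.open("IP100k", "w") do |f|
  100_000.times { f.puts Array.new(4) { rand(256) }.join(".") }
end
```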
Just for Info:
$ uname -a
Linux sabayon2me 2.6.22-sabayon #1 SMP Mon Sep 3 00:33:06 UTC 2007 x86_64 Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz GenuineIntel GNU/Linux
$ ruby --version
ruby 1.8.6 (2007-03-13 patchlevel 0) [x86_64-linux]
$ cat /etc/sabayon-release
Sabayon Linux x86-64 3.4
- Matthias
On Sep 18, 2007, at 4:00 AM, Matthias Wächter wrote:
Wow, I'm impressed. Can't wait to see that code!
Thanks.
Because startup time started to dominate the benchmark:
----------------------------------------------------------------
$ time ruby quiz139.rb 68.97.89.187 84.191.4.10 80.79.64.128 210.185.128.123
202.10.4.222 192.189.119.1
US
DE
RU
JP
AU
EU
real 0m0.078s
user 0m0.046s
sys 0m0.031s
----------------------------------------------------------------
and by the way: thanks for telling me such a database exists!
cheers
Simon
I think the timings of the scripts are not a good index; it all depends on what hardware/OS you are running them on.
If we want to use speed as an index we should probably have J.E. compare them all on the same machine.
The way I see it, we are getting a rough idea of speeds, not exact numbers. Both scripts timed so far seem able to answer the question in under a second on semi-current hardware. Good enough for me.
Maybe we could also write a ruby script that runs all the entry scripts and time them, and that could be another ruby quiz which will also be voted on speed and then we could write a ruby script to time those entries an then we could .... Just ignore this paragraph
Thank you for volunteering...
James Edward Gray II
On Sep 14, 2007, at 4:58 PM, diego scataglini wrote:
Well, we weren't all broken:
$ ruby ip_to_country.rb 5.1.1.1
Unknown
James Edward Gray II
On Sep 16, 2007, at 7:21 PM, Bill Kelly wrote:
James Edward Gray II schrieb:
Anyway -- i'd like to see a 100000 lookups comparison *hehe*
Great. Run it for us and let us know how we do.
Here are the results of the supplied solutions so far, and it looks like my solution can take the 100k-performance victory
Thanks for putting that together! Fun to see the different
times.
If I've understood correctly, it looks like my solution seems
to be the fastest (so far) of those that operate on the
unmodified .csv file?
I wasn't expecting that, at all...
I would have bet Simon's would be faster. Strange!
Regards,
Bill
From: "Matthias Wächter" <matthias@waechter.wiz.at>
On Sep 18, 2007, at 4:00 AM, Matthias Wächter wrote:
Thank you very much for putting this together, and wow, your code is lightning quick.
James Edward Gray II
On Sep 18, 2007, at 6:23 PM, Matthias Wächter wrote:
James Edward Gray II schrieb:
On Sep 18, 2007, at 4:00 AM, Matthias Wächter wrote:
Anyway -- i'd like to see a 100000 lookups comparison *hehe*
Great. Run it for us and let us know how we do.
Here are the results of the supplied solutions so far, and it looks like my solution can take the 100k-performance victory
Hi,
Yes, I only tested with my version of the file, from which I manually
removed the comments. I don't think I'll have time to fix that, at
least this week. Anyway, I find it strange that the FasterCSV version
doesn't work, because it delegates the parsing of the file to that gem,
and the rest is pretty simple. On the other hand, I don't expect the first
version to perform anywhere near the other solutions, so it's not so
important :-).
Jesus.
On 9/19/07, Matthias Wächter <matthias@waechter.wiz.at> wrote:
James Edward Gray II schrieb:
> On Sep 18, 2007, at 4:00 AM, Matthias Wächter wrote:
>> Anyway -- i'd like to see a 100000 lookups comparison *hehe*
> Great. Run it for us and let us know how we do.

[**]: O Jesus :), I can't make your FasterCSV version (a) run, and in
the later version you sent your direct parsing breaks when it comes to
detecting the commented lines in the first part of the file. I couldn't
manage to make it run, sorry.
"James Edward Gray II" <james@grayproductions.net> wrote in message news:4A414BD9-8B56-4897-A13E-
Well, we weren't all broken:
I've sent my comment and refreshed new headers - and yes, your solution came in.
--EK
>
> Here are the results of the supplied solutions so far, and it looks like my solution can take the 100k-performance victory.

If I've understood correctly, it looks like my solution seems
to be the fastest (so far) of those that operate on the
unmodified .csv file?
to be the fastest (so far) of those that operate on the
unmodified .csv file?
It depends what you mean by unmodified - my algorithm runs off the
original file, the only "modification" I am doing in the setup stage
is searching for and saving the byte offset of the first and last
records. It looks like I could have done that every time my script
was run and only added 5 ms.
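That boundary computation might look something like the following (first_record_offset is an illustrative name; the sketch finds only the leading boundary, since the trailing one can simply be taken as File.size):

```ruby
# Return the byte offset of the first line that is neither a comment
# nor blank, so later runs can seek straight past the header block.
def first_record_offset(path)
  offset = 0
  File.foreach(path) do |line|
    return offset unless line.start_with?("#") || line.strip.empty?
    offset += line.bytesize
  end
  offset
end

# Tiny demonstration with a made-up file:
File.write("sample.csv", "# comment lines\n\"16777216\",\"16777471\",\"US\"\n")
puts first_record_offset("sample.csv")  # 16
```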
I would have bet Simon's would be faster. Strange!
I thought block file reads would be faster too, that was the next
thing I was planning to try. Maybe it's the regexp that slowed it
down.
-Adam
On 9/18/07, Bill Kelly <billk@cts.com> wrote:
>> On Sep 18, 2007, at 4:00 AM, Matthias Wächter wrote:
Comments are not a part of the CSV specification, so FasterCSV doesn't address them. I would like to add ignore patterns at some point though.
James Edward Gray II
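In the meantime, a simple workaround is to strip comment lines before handing the data to the parser. A sketch using the csv standard library (which FasterCSV became in Ruby 1.9); the sample rows here are made up:

```ruby
require "csv"

raw = <<~DATA
  # IP-TO-COUNTRY database (comment line, not valid CSV)
  "1141517312","1141520383","US"
  "1141520384","1141522431","DE"
DATA

# Drop comment lines, then parse the rest normally.
rows = CSV.parse(raw.lines.reject { |l| l.start_with?("#") }.join)
p rows.first  # ["1141517312", "1141520383", "US"]
```

Newer versions of the csv gem also accept a skip_lines: pattern that does this filtering during parsing.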
On Sep 19, 2007, at 2:27 AM, Jesús Gabriel y Galán wrote:
On 9/19/07, Matthias Wächter <matthias@waechter.wiz.at> wrote:
James Edward Gray II schrieb:
On Sep 18, 2007, at 4:00 AM, Matthias Wächter wrote:
Anyway -- i'd like to see a 100000 lookups comparison *hehe*
Great. Run it for us and let us know how we do.
[**]: O Jesus :), I can't make your FasterCSV version (a) run, and in
the later version you sent your direct parsing breaks when it comes to
detecting the commented lines in the first part of the file. I couldn't
manage to make it run, sorry.

Hi,
Yes, I only tested with my version of the file, from which I manually
removed the comments. I don't think I'll have time to fix that, at
least this week. Anyway, I find strange the FasterCSV version doesn't
work, because it delegates the parsing of the file to that gem, and
the rest is pretty simple.
>
> If I've understood correctly, it looks like my solution seems
> to be the fastest (so far) of those that operate on the
> unmodified .csv file?

It depends what you mean by unmodified - my algorithm runs off the
original file, the only "modification" I am doing in the setup stage
is searching for and saving the byte offset of the first and last
records. It looks like I could have done that every time my script
was run and only added 5 ms.
Ah, I see. Cool.
Incidentally, since the file format description indicates comment
lines may appear anywhere in the file, I allowed for that. However,
I doubt adding a loop to your gets/split logic to keep going until a
valid record was found would affect your time much at all.
Nice job
Regards,
Bill
From: "Adam Shelly" <adam.shelly@gmail.com>
On 9/18/07, Bill Kelly <billk@cts.com> wrote:
Adam Shelly wrote:
I would have bet Simon's would be faster. Strange!
I thought block file reads would be faster too, that was the next
thing I was planning to try. Maybe it's the regexp that slowed it
down.
Without looking at the other solutions in detail, I think one of the problems
may be that my solution opens the file for each lookup - that's of course easy
to fix. I don't know if that's the problem or the overhead of creating ten
thousand IPAddr objects - I refuse to analyse this in depth because I don't
have a use case for looking up that many locations in a single run.
(On the other hand, I do understand how much fun it can be to optimize such a
problem to death - so go on if you like, I don't have the motivation - this time.)
cheers
Simon