Yarv and dbi

jm11 · 2 February 2005 03:12

Has anyone out there tried dbi with yarv?

dbi 0.0.23 seems to work OK with ruby 1.9; however, when the test.rb script is modified as follows, by adding the require line,

···

###########################################################
$prog =<<'__EOP__'
require 'dbi'

__END__
1
__EOP__
###########################################################

this gives,

$ ../bin/ruby test-dbi.rb
YARVCore 0.1.0 rev: 120 (2005-01-09)
[direct threaded code] [optimize basic operation] [optimize regexp match] [stack caching] [inline method cache]
== disasm: <ISeq:main@test-dbi.rb>======================================
local scope table (size: 1, argc: 0)

0000 putself_SC_xx_ax ( 11)
0001 putstring_SC_ax_ab"dbi"
0003 send_opopt__WC___WC__Qfalse_0__WC__SC_ab_ax:require, 1, <ic>
0007 end_SC_ax_ax 2
-----------------------------------------------------------------------------
in `initialize': /home/jeffm/yarv//lib/ruby/site_ruby/1.9/dbi/dbi.rb:1180: BUG: unknown node: NODE_BLOCK_PASS (SyntaxError)
         from in `initialize'
         from in `require'
         from /home/jeffm/yarv//lib/ruby/site_ruby/1.9/dbi.rb:1
         from in `require'
         from test-dbi.rb:11

any ideas?

Jeff.

Charles_Mills1 · 2 February 2005 05:55

jm wrote:

Anyone out there tried dbi with yarv

dbi 0.0.23 seems to work ok ruby 1.9, however when the test.rb script
is modified as follows, by adding the required line,

###########################################################
$prog =<<'__EOP__'
require 'dbi'

(...)
I don't think YARV supports Kernel.require yet.

-Charlie

SASADA_Koichi · 2 February 2005 07:02

jm <jeffm@ghostgun.com> wrote :
[ yarv and dbi ]
at Wed, 2 Feb 2005 12:12:08 +0900

Hi,

Anyone out there tried dbi with yarv

dbi 0.0.23 seems to work ok ruby 1.9, however when the test.rb script
is modified as follows, by adding the required line,

...

any ideas?

YARV supports "require", but it does not yet implement the complete Ruby spec.

(e.g. method(&Proc.new{ ... }) doesn't work on yarv)

I'll implement as soon as possible.

Thank you.
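
For context, NODE_BLOCK_PASS in the error above is the parser node for handing an object to a method as its block with `&`, which is the construct mentioned here. A minimal sketch in plain Ruby of that kind of call follows; the method name `each_twice` is made up for illustration, and this is not the code at dbi.rb:1180.

```ruby
# Hypothetical method, used only to illustrate block passing.
def each_twice
  yield 1
  yield 2
end

# Proc objects that will be handed over as blocks via `&`.
doubler   = Proc.new { |x| x * 2 }
results   = []
collector = Proc.new { |x| results << doubler.call(x) }

# `&collector` is the block-pass form: the Proc becomes the method's block.
each_twice(&collector)

p results
```

An interpreter that lacks support for this node would be expected to fail when compiling the `each_twice(&collector)` line, before any of the code runs.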

···

--
// SASADA Koichi at atdot dot net
//

jm11 · 2 February 2005 06:53

Thanks. I've now verified that it doesn't work with require 'abbrev' or require 'ipaddr', but each time it gives a slightly different error.

jeff.

···

On 02/02/2005, at 4:55 PM, Charles Mills wrote:

jm wrote:

Anyone out there tried dbi with yarv

dbi 0.0.23 seems to work ok ruby 1.9, however when the test.rb script
is modified as follows, by adding the required line,

###########################################################
$prog =<<'__EOP__'
require 'dbi'

(...)
I don't think YARV supports Kernel.require yet.

-Charlie

Charles_Mills1 · 2 February 2005 21:35

SASADA Koichi wrote:

jm <jeffm@ghostgun.com> wrote :
[ yarv and dbi ]
at Wed, 2 Feb 2005 12:12:08 +0900

Hi,

> Anyone out there tried dbi with yarv
>
> dbi 0.0.23 seems to work ok ruby 1.9, however when the test.rb script
> is modified as follows, by adding the required line,
> ..
> any ideas?

YARV supports "require", but no complete ruby specs.

(e.g. method(&Proc.new{ ... }) doesn't work on yarv)

I'll implement as soon as possible.

Sounds like YARV is progressing quickly. Very impressive work.
-Charlie

jm11 · 2 February 2005 22:08

YARV supports "require", but no complete ruby specs.

(e.g. method(&Proc.new{ ... }) doesn't work on yarv)

I'll implement as soon as possible.

I'll be a willing test subject.

Sounds like YARV is progressing quickly. Very impressive work.

Agreed. By the looks of it, it will put an end to those "Ruby is slow" comments (mine anyway).

Jeff.

···

On 03/02/2005, at 8:35 AM, Charles Mills wrote:

Navindra_Umanee · 3 February 2005 00:32

In what context did you find Ruby to be slow? A website of yours? Be
interested to hear your experiences.

Thanks,
Navin.

···

jm <jeffm@ghostgun.com> wrote:

Agreed. By the looks of it it will put an end to those ruby are slow
comments (mine anyway).

jm11 · 3 February 2005 02:18

Processing flow data from a router. The script caches user information from a database, then processes a 25-30MB flow file captured from a router using flow-tools. This takes about 5 minutes on a 2.4GHz Pentium 4 with no other load, running at 98% utilisation continuously. To put this into context, each flow file is only 15 minutes' worth of data, and the current perl version does it in a bit over 2 minutes. The perl version is showing its lack of design in a variety of ways, including the nightmare of trying to add features it was never designed to support. So this was a good opportunity to rewrite it in ruby to make it more maintainable, etc.

While 5 minutes is within the time constraint, that is on an unloaded machine, and the machine it's destined for has other processes sharing the CPU.

Jeff.

···

On 03/02/2005, at 11:32 AM, Navindra Umanee wrote:

jm <jeffm@ghostgun.com> wrote:

Agreed. By the looks of it it will put an end to those ruby are slow
comments (mine anyway).

In what context did you find Ruby to be slow? A website of yours? Be
interested to hear your experiences.

Thanks,
Navin.

Austin_Ziegler5 · 3 February 2005 15:41

Could your script be doing things that could be improved in performance?

-austin

···

On Thu, 3 Feb 2005 11:18:18 +0900, jm <jeffm@ghostgun.com> wrote:

Processing flow data from a router. The script caches user information
from a database then processes a 25-30MB flow file captured from a
router using flow-tools. This takes about 5 minutes on a 2.4GHz pentium
4 unloaded by any other process running at 98% utilisation
continuously. To put this into context each flow file is only 15
minutes worth of data and the current perl version does it in a bit
over 2 minutes. This perl version is showing it's lack of design in a
variety of ways including the nightmare of trying to add features it
was never designed to support. So this was a good opportunity to
rewrite it in ruby to make it more maintainable, etc.

While 5 minutes in within the time constraint that is on an unloaded
machine and the machine it's destined for has other processes sharing
the CPU.

--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca

Ryan_Davis1 · 3 February 2005 20:13

I have an open bug for the postgres dbi handler. It causes about a 7x slowdown in performance for even basic queries. I now use the raw postgres library to do all my work and the speed is actually fairly good.

···

On Feb 2, 2005, at 6:18 PM, jm wrote:

Processing flow data from a router. The script caches user information from a database then processes a 25-30MB flow file captured from a router using flow-tools. This takes about 5 minutes on a 2.4GHz pentium 4 unloaded by any other process running at 98% utilisation continuously. To put this into context each flow file is only 15 minutes worth of data and the current perl version does it in a bit over 2 minutes. This perl version is showing it's lack of design in a variety of ways including the nightmare of trying to add features it was never designed to support. So this was a good opportunity to rewrite it in ruby to make it more maintainable, etc.

While 5 minutes in within the time constraint that is on an unloaded machine and the machine it's destined for has other processes sharing the CPU.

--
ryand-ruby@zenspider.com - Seattle.rb - http://www.zenspider.com/seattle.rb
http://blog.zenspider.com/ - http://rubyforge.org/projects/ruby2c

George5 · 3 February 2005 16:08

Austin Ziegler wrote:

···

On Thu, 3 Feb 2005 11:18:18 +0900, jm <jeffm@ghostgun.com> wrote:

Processing flow data from a router. The script caches user information
from a database then processes a 25-30MB flow file captured from a
router using flow-tools. This takes about 5 minutes on a 2.4GHz pentium
4 unloaded by any other process running at 98% utilisation
continuously. To put this into context each flow file is only 15
minutes worth of data and the current perl version does it in a bit
over 2 minutes. This perl version is showing it's lack of design in a
variety of ways including the nightmare of trying to add features it
was never designed to support. So this was a good opportunity to
rewrite it in ruby to make it more maintainable, etc.

While 5 minutes in within the time constraint that is on an unloaded
machine and the machine it's destined for has other processes sharing
the CPU.

Could your script be doing things that could be improved in performance?

Well, I think Ruby/DBI is not the fastest. It parses (and splits) each SQL statement, then joins it back into a string, even if you don't use '?' parameter markers. This should be delayed and omitted if no parameters were given. Not sure whether this is the reason for the slowness.

Regards,

Michael

Austin_Ziegler5 · 3 February 2005 17:47

Austin Ziegler wrote:
>
>>Processing flow data from a router. The script caches user information
>>from a database then processes a 25-30MB flow file captured from a
>>router using flow-tools. This takes about 5 minutes on a 2.4GHz pentium
>>4 unloaded by any other process running at 98% utilisation
>>continuously. To put this into context each flow file is only 15
>>minutes worth of data and the current perl version does it in a bit
>>over 2 minutes. This perl version is showing it's lack of design in a
>>variety of ways including the nightmare of trying to add features it
>>was never designed to support. So this was a good opportunity to
>>rewrite it in ruby to make it more maintainable, etc.
>>
>>While 5 minutes in within the time constraint that is on an unloaded
>>machine and the machine it's destined for has other processes sharing
>>the CPU.
>
>
> Could your script be doing things that could be improved in performance?

Well, I think Ruby/DBI is not the fastest. It parses (and splits) each
SQL statement, then joins it back into a string, even if you don't use
'?' parameter markers. This should be delayed and omitted if no
parameters were given. Not sure whether this is the reason for the slowness.

Regards,

Michael

···

On Fri, 4 Feb 2005 01:08:09 +0900, Michael Neumann <mneumann@ntecs.de> wrote:

> On Thu, 3 Feb 2005 11:18:18 +0900, jm <jeffm@ghostgun.com > wrote:

--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca

Charles_Mills1 · 3 February 2005 18:45

Michael Neumann wrote:
(...)

Well, I think Ruby/DBI is not the fastest. It parses (and splits) each
SQL statement, then joins it back into a string, even if you don't use
'?' parameter markers. This should be delayed and omitted if no
parameters were given. Not sure whether this is the reason for the slowness.

This may be a dumb question, but why does ruby/dbi do that? For
databases that don't support precompiled statements and binding
parameters?

-Charlie

George5 · 3 February 2005 19:11

Charles Mills wrote:

Michael Neumann wrote:
(...)

Well, I think Ruby/DBI is not the fastest. It parses (and splits) each
SQL statement, then joins it back into a string, even if you don't use
'?' parameter markers. This should be delayed and omitted if no
parameters were given. Not sure whether this is the reason for the slowness.

This may be a dumb question, but why does ruby/dbi do that? For
databases that don't support precompiled statements and binding
parameters?

Yeah, good question. It tries to abstract over the database. Well, sometimes it's nicer to write:

dbh.execute("INSERT INTO tab values (?, ?, ?)", a, b, c)

instead of:

dbh.execute("INSERT INTO tab values (#{ quote(a) }, #{ quote(b) }, #{ quote(c) })")
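
As a rough sketch of what emulating '?' markers by string handling involves (split the statement, substitute quoted values, join it back), something like the following could be written. Both `quote` and `bind_params` are hypothetical helpers written for this illustration; they are not the Ruby/DBI implementation. The early return shows the "omit if no parameters were given" idea raised earlier in the thread.

```ruby
# Hypothetical helper: render a Ruby value as an SQL literal.
def quote(value)
  case value
  when nil     then "NULL"
  when Numeric then value.to_s
  else "'" + value.to_s.gsub("'", "''") + "'"
  end
end

# Hypothetical emulation of '?' binding by string substitution.
def bind_params(sql, *args)
  return sql if args.empty?   # fast path: nothing to bind, skip the parsing

  params = args.dup
  # Split into quoted literals, '?' markers, other text, and stray quotes,
  # so that a '?' inside a string literal is left alone.
  tokens = sql.scan(/'(?:[^']|'')*'|\?|[^'?]+|'/)
  bound = tokens.map do |tok|
    if tok == "?"
      raise ArgumentError, "too few parameters" if params.empty?
      quote(params.shift)
    else
      tok
    end
  end
  raise ArgumentError, "too many parameters" unless params.empty?
  bound.join
end

puts bind_params("INSERT INTO tab values (?, ?, ?)", 1, "O'Brien", nil)
```

The split-and-join work only happens when parameters are supplied; a plain "select * from table" goes through untouched.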

Regards,

Michael

Jim_Weirich1 · 4 February 2005 14:50

Charles Mills said:

Michael Neumann wrote:
(...)

Well, I think Ruby/DBI is not the fastest. It parses (and splits) each
SQL statement, then joins it back into a string, even if you don't use
'?' parameter markers. This should be delayed and omitted if no
parameters were given. Not sure whether this is the reason for the slowness.

This may be a dumb question, but why does ruby/dbi do that? For
databases that don't support precompiled statements and binding
parameters?

It's been a _long_ time since I have looked at DBI/DBD code, but if I
recall correctly, although the bind operation is provided by the DBI
library, the actual binding of values is performed in the
database-specific DBD portion of the library.

If the database handles prepared statements and binding of "?" parameters,
then the DBD does not need to parse the SQL statements.

However, if the database does not offer that service, then the DBD can use
bind (in the BasicBind module) to emulate it.
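
A small sketch of that division of labour, under the assumption that each driver decides for itself whether to pass '?' markers through or to emulate them. All module and class names below are invented for illustration and are not the Ruby/DBI classes; the emulation is deliberately naive and ignores '?' inside string literals.

```ruby
# Hypothetical fallback mixin: substitute each '?' with a quoted literal.
module EmulatedBind
  def bind(sql, params)
    values = params.dup
    sql.gsub("?") { "'" + values.shift.to_s.gsub("'", "''") + "'" }
  end
end

# Stands in for a driver whose database accepts '?' markers natively.
class NativeDriver
  def execute(sql, *params)
    [sql, params]            # SQL is passed through unparsed
  end
end

# Stands in for a driver whose database has no parameter binding.
class EmulatingDriver
  include EmulatedBind

  def execute(sql, *params)
    [bind(sql, params), []]  # SQL is rewritten before it is sent
  end
end

p NativeDriver.new.execute("select * from t where id = ?", 7)
p EmulatingDriver.new.execute("select * from t where id = ?", 7)
```

With this split, only the emulating driver pays the string-handling cost.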

Like I mentioned, it's been a long time. Things may have changed since I
last looked at it (or my memory is faulty).

···

--
-- Jim Weirich

Michael_Walter1 · 4 February 2005 02:15

As Charles pointed out, the string-fumbling should merely be a fallback path.

Michael

···

On Fri, 4 Feb 2005 04:11:02 +0900, Michael Neumann <mneumann@ntecs.de> wrote:

Charles Mills wrote:
> Michael Neumann wrote:
> (...)
>
>> Well, I think Ruby/DBI is not the fastest. It parses (and splits) each
>> SQL statement, then joins it back into a string, even if you don't use
>> '?' parameter markers. This should be delayed and omitted if no
>> parameters were given. Not sure whether this is the reason for the slowness.
>
> This may be a dumb question, but why does ruby/dbi do that? For
> databases that don't support precompiled statements and binding
> parameters?

Yeah, good question. It tries to abstract over the database. Well,
sometimes it's nicer to write:

   dbh.execute("INSERT INTO tab values (?, ?, ?)", a, b, c)

instead of:

   dbh.execute("INSERT INTO tab values (#{ quote(a) }, #{ quote(b) }, #{ quote(c) })")

Regards,

   Michael

George5 · 13 February 2005 19:01

Jim Weirich wrote:

Charles Mills said:

Michael Neumann wrote:
(...)

Well, I think Ruby/DBI is not the fastest. It parses (and splits) each
SQL statement, then joins it back into a string, even if you don't use
'?' parameter markers. This should be delayed and omitted if no
parameters were given. Not sure whether this is the reason for the slowness.

This may be a dumb question, but why does ruby/dbi do that? For
databases that don't support precompiled statements and binding
parameters?

Its been a _long_ time since I have looked at DBI/DBD code, but if I
recall correctly, although the bind operation is provided by the DBI
library, the actually binding of values is performed in the database
specific DBD portion of the library.

If the database handles prepared statements and binding of "?" parameters,
then the DBD does not need to parse the SQL statements.

However, if the database does not offer that service, then the DBD can use
bind (in the BasicBind module) to emulate it.

Like I mentioned, its been a long time. Things may have changed since I
last looked at it (or my memory is faulty).

No, you're correct. It still works the way you describe.

Regards,

Michael

jm11 · 4 February 2005 03:06

The script only uses dbi to extract user details from an existing database for caching; the queries are of the form "select * from table". This stage takes only about 5 seconds of the 5-7 minute run. By far the majority of the time is spent on the actual processing.

The file reads as shown below. I've changed some of the variables that contained usernames, passwords, database names, and table names. The only outside libraries are dbi and socket. The main loop at the bottom iterates over each filename supplied on the command line, looping over each flow record and tying it to a user based on certain rules; it then prints the result in a radius detail file format.

Feel free to tear it apart. I intend to make more of this publicly available in some form when it's finished.
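
One way to check a split like this, as a minimal sketch, is to time each stage separately with a monotonic clock. The `timed` helper and the two stand-in stage methods below are hypothetical; they are not part of the script itself.

```ruby
# Run a block and return the elapsed wall-clock seconds (monotonic clock).
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Hypothetical stand-in for the database caching stage.
def load_caches
  (1..1_000).map { |i| [i, "user#{i}"] }
end

# Hypothetical stand-in for the per-record processing stage.
def process_records(cache)
  cache.inject(0) { |sum, (id, _name)| sum + id }
end

cache = nil
timings = {}
timings[:load_caches] = timed { cache = load_caches }
timings[:process]     = timed { process_records(cache) }

# Print one line per stage so the dominant stage is easy to spot.
timings.each do |stage, seconds|
  puts format("%-12s %.4fs", stage, seconds)
end
```

Wrapping the real cache loads and the main loop the same way would show which stage is worth optimising first.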

#!/usr/bin/ruby

···

#

$LOAD_PATH << File.join(File.dirname(__FILE__),"lib")

OUTFILE = './details'

require 'Vflow'
require 'socket'
require 'dbi'
require 'velinfo'
require 'zones'
require 'radacct'
require 'traffic_record'
require 'iphash'

# both velinfo and zones use the same database.
dbh = if `hostname`.chomp == 'testmachine'
         DBI.connect('dbi:Mysql:db','root','')
       else
         DBI.connect('dbi:Mysql:db:127.0.0.1','user_test','testpasswd')
       end

# build database of user information
puts "vinfo"
$vinfo = Velinfo.new(dbh)
$vinfo.loadcache

# build database of zone information
puts "zones"
$zones = Zones.new(dbh)
$zones.loadcache

# build database of radonline
puts "radacct"
$ra = Radacct.new(dbh)
$ra.online_table = 'db2.online_table' unless `hostname`.chomp == 'testmachine'
$ra.loadcache

# make a trie to hold records
$traf_record = Hash.new

def find_user_info(only_when_on,user_addr)
   # 1. entry in radonline -> record_usage
   # 2. entry in velradiatorauth
   # a. address is primary address
   # i. if not only when online record_usage
   # b. address is network address
   # i. if not only when online record_usage
   # ii. use primary address to find radonline entry
   # iii. if found record_usage

   rauser = $ra.find_by_ip(user_addr)
   #puts "#{__LINE__} #{rauser and rauser.framed_ip_address.class}"
   return rauser unless rauser.nil?

   if (viuser = $vinfo.find_by_ip(user_addr))
     #puts "#{__LINE__} #{viuser.framed_ip_address.class}"
     return viuser unless only_when_on

     if viuser.replyattr.has_key?(:Framed_IP_Address) and !only_when_on
       if viuser.replyattr[:Framed_IP_Address] == user_addr
         return viuser
       else
         return nil
       end
     end

     if viuser.replyattr.has_key?(:Subnet)
       subnet = viuser.replyattr[:Subnet]
       subnet_gateway = viuser.replyattr[:Subnet_Gateway]
       if subnet.include?(user_addr)
         return viuser unless only_when_on
         return viuser if $ra.find_by_ip(subnet_gateway)
       end
     end
   end
   return nil
end

def record_usage(vrec)
   srcaddr = IPSocket.getaddress(vrec.srcaddr)
   dstaddr = IPSocket.getaddress(vrec.dstaddr)
   zdstinfo = $zones.find_info_by_ip(dstaddr)
   zsrcinfo = $zones.find_info_by_ip(srcaddr)
   # forward traffic
   if zsrcinfo and zsrcinfo.recordable
     #puts "#{__LINE__}"
     if (userinfo = find_user_info(zsrcinfo.only_when_on,srcaddr))
       #puts "#{__LINE__} #{userinfo.framed_ip_address.class}"
       zbilling = $zones.find_billing_forward(vrec)
       ipkey = userinfo.framed_ip_address || userinfo.replyattr[:Subnet]
       traf_rec = $traf_record[ipkey.hton]
       traf_rec = TrafficRecord.new(userinfo,ipkey) if traf_rec.nil?
       traf_rec.timestamp = Time.at(vrec.end_time)
       traf_rec.add_zone_traffic($zones.find_zone_by_ip(dstaddr),vrec.doctets,0)
       traf_rec.add_to_total(vrec.doctets,0)
       traf_rec.add_to_acct(vrec.doctets,0) if zbilling.billable
       $traf_record[ipkey.hton] = traf_rec
     end
   end
   # reverse traffic
   if zdstinfo and zdstinfo.recordable
     #puts "#{__LINE__}"
     if (userinfo = find_user_info(zdstinfo.only_when_on,dstaddr))
       #puts "#{__LINE__} #{userinfo.framed_ip_address.class}"
       zbilling = $zones.find_billing_reverse(vrec)
       ipkey = userinfo.framed_ip_address || userinfo.replyattr[:Subnet]
       traf_rec = $traf_record[ipkey.hton]
       traf_rec = TrafficRecord.new(userinfo,ipkey) if traf_rec.nil?
       traf_rec.timestamp = Time.at(vrec.end_time)
       traf_rec.add_zone_traffic($zones.find_zone_by_ip(srcaddr),0,vrec.doctets)
       traf_rec.add_to_total(0,vrec.doctets)
       traf_rec.add_to_acct(0,vrec.doctets) if zbilling.billable
       $traf_record[ipkey.hton] = traf_rec
     end
   end
end

# start reading netflow file
puts "vflow"
vf = Vflow.new

ARGV.each do |vfile|
   vf.open(vfile)
   count = 0
   vf.each do |vfent|
     count+=1
     #break if count > 50
     #puts "#{vfent.srcaddr} -> #{vfent.dstaddr}"
     record_usage(vfent)
   end
   vf.close
   puts "file #{vfile} contained #{count} records"
   print_usage()
end

Jeff.

On 04/02/2005, at 1:15 PM, Michael Walter wrote:

As Charles pointed out, the string-fumbling should merely be a fallback path.

Michael

On Fri, 4 Feb 2005 04:11:02 +0900, Michael Neumann <mneumann@ntecs.de> wrote:

Charles Mills wrote:

Michael Neumann wrote:
(...)

Well, I think Ruby/DBI is not the fastest. It parses (and splits) each
SQL statement, then joins it back into a string, even if you don't use
'?' parameter markers. This should be delayed and omitted if no
parameters were given. Not sure whether this is the reason for the slowness.

This may be a dumb question, but why does ruby/dbi do that? For
databases that don't support precompiled statements and binding
parameters?

Yeah, good question. It tries to abstract over the database. Well,
sometimes it's nicer to write:

   dbh.execute("INSERT INTO tab values (?, ?, ?)", a, b, c)

instead of:

   dbh.execute("INSERT INTO tab values (#{ quote(a) }, #{ quote(b) }, #{ quote(c) })")

Regards,

   Michael
