No clue

(J-Van) #1

I thought for all of five seconds for a good subject line for this
question, but failed. Sorry!

I have a string like:

"some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah"

I want to build up a hash like

{ :some_key => "blah", :some_other_key => "more_blah", :yet_other_key
=> "yet_more_blah" }

And I don't really want to have to know what the possible keys are in advance.

So, the message format looks like:
<key>: <value>, <key>: <value>

How can I properly extract it out?

Here's my initial attempt, which works, but seems hackish:

      attributes = message.split(",")
      attributes.each do |attribute|
        key, value = attribute.scan(/(\w+): (.+)/)[0]
        result_hash[key.to_sym] = value.strip
      end
    
Also, this will get run potentially thousands of times per second, so
execution speed is of some concern.

(Tim Hunter) #2

Joe Van Dyk wrote:

I thought for all of five seconds for a good subject line for this
question, but failed. Sorry!

I have a string like:

"some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah"

I want to build up a hash like

{ :some_key => "blah", :some_other_key => "more_blah", :yet_other_key
=> "yet_more_blah" }

And I don't really want to have to know what the possible keys are in advance.
So, the message format looks like:
<key>: <value>, <key>: <value>

How can I properly extract it out?

Here's my initial attempt, which works, but seems hackish:

      attributes = message.split(",")
      attributes.each do |attribute|
        key, value = attribute.scan(/(\w+): (.+)/)[0]
        result_hash[key.to_sym] = value.strip
      end

Also, this will get run potentially thousands of times per second, so
execution speed is of some concern.

It doesn't look particularly hackish to me, but maybe my sensibilities aren't fine enough. The only thing I'd say is that if performance is important then we ought to ask the regular expression to strip whitespace around the key and value so we can avoid the #strip method.

Here's my version:

require 'pp'

hash = Hash.new
DATA.each do |line|
    attrs = line.split(/,/)
    attrs.each do |attr|
       m = /\s*(\w+)\s*:\s*(\w+)\s*/.match(attr)
       raise "#{attr.chomp} doesn't look like key:value" unless m
       hash[m[1].intern] = m[2]
    end
end

pp hash

__END__
some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah
key1:value1
key2: value2
key3 : value 3 , key4 :value4 , key555555555:value55555

The output is:
{:some_other_key=>"more_blah",
  :key555555555=>"value55555",
  :yet_other_key=>"yet_more_blah",
  :key1=>"value1",
  :key2=>"value2",
  :key3=>"value",
  :some_key=>"blah",
  :key4=>"value4"}

(Devin Mullins) #3

Joe Van Dyk wrote:

Here's my initial attempt, which works, but seems hackish:

     attributes = message.split(",")
     attributes.each do |attribute|
       key, value = attribute.scan(/(\w+): (.+)/)[0]
       result_hash[key.to_sym] = value.strip
     end

Slightly more readable:

result_hash = {}
attributes = message.split(",")
attributes.each do |attribute|
  key, value = *attribute.match(/(\w+): (.+)/).captures
  result_hash[key.to_sym] = value.strip
end

Yet more readable:

result_hash = {}
attributes = message.split ","
attributes.each do |attribute|
  key, value = *attribute.split(": ", 2)
  # strip the key too: every attribute after the first starts with a space
  result_hash[key.strip.to_sym] = value.strip
end

Not sure:

attributes = message.split ","
# inject needs parentheses here, otherwise {} is parsed as a block
result_hash = attributes.inject({}) do |hash, attribute|
  key, value = *attribute.split(": ", 2)
  hash[key.strip.to_sym] = value.strip
  hash
end

Also, this will get run potentially thousands of times per second, so
execution speed is of some concern.

No clue.

Devin

(Ara.T.Howard) #4

you'll have a hard time getting much faster than strscan:

   harp:~ > cat a.rb
   require 'strscan'

   class HashString < ::Hash
     class SyntaxError < StandardError; end
     def initialize s, dup = false
       load_from s, dup
     end
     def load_from s, dup = false
       @ss = StringScanner::new s, dup
       loop do
         key, value = scan_key, scan_value
         self[key] = value
         break if eos?
       end
       @ss = nil
     end
     def scan_key
       @ss.scan(%r/[\n\s]*([^:\n]+)[\n\s]*(?=:)/o) or syntax_error
       key = @ss[1]
       @ss.scan(%r/[\n\s]*:[\n\s]*/o) or syntax_error
       key
     end
     def scan_value
       scan(%r/[\n\s]*([^,\n]+)[\n\s]*/o) or syntax_error
       value = @ss[1]
       scan(%r/[\n\s]*,?[\n\s]*/o)
       value
     end
     def eos?
       @ss.eos?
     end
     def scan pat
       @ss.scan pat
     end
     def syntax_error
       raise SyntaxError, @ss.peek(16) + '...'
     end
     def to_yaml
       {}.merge(self).to_yaml
     end
   end

   s = <<-txt
     some_key: blah,
   some_other_key: more_blah, yet_other_key:
           yet_more_blah
   txt

   hs = HashString::new s

   require 'yaml'
   y hs

   harp:~ > ruby a.rb

   ---
   some_key: blah
   yet_other_key: yet_more_blah
   some_other_key: more_blah

strscan is pure c and extremely fast. it doesn't end up creating any new
strings like splitting or regex based solutions. it keeps a pointer into the
string and moves through it. it takes some getting used to but is really good
and part of the standard dist.
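
For readers who have not used strscan before, here is a much smaller sketch of the "keep a pointer and move through the string" idea. It is not the HashString class above; the method name `parse_with_scanner` is made up for illustration, and it assumes keys match `\w+` and values contain no commas.

```ruby
require 'strscan'

# Hypothetical helper: walks a single StringScanner through
# "key: value, key: value" and collects the pairs into a Hash.
def parse_with_scanner(message)
  result = {}
  ss = StringScanner.new(message)
  until ss.eos?
    ss.skip(/\s*/)                 # leading whitespace before a key
    break if ss.eos?               # tolerate trailing whitespace
    key = ss.scan(/\w+/) or raise ArgumentError, "expected a key at offset #{ss.pos}"
    ss.skip(/\s*:\s*/) or raise ArgumentError, "expected ':' at offset #{ss.pos}"
    value = ss.scan(/[^,]*/).strip # everything up to the next comma
    result[key.to_sym] = value
    ss.skip(/,/)                   # optional separator
  end
  result
end

p parse_with_scanner("some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah")
```

The scanner only advances its position on a successful match, which is what makes the explicit error messages with `ss.pos` possible.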

cheers.

-a
--

email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
My religion is very simple. My religion is kindness.
--Tenzin Gyatso

===============================================================================

(Daniel Brockman) #5

Joe Van Dyk <joevandyk@gmail.com> writes:

    attributes = message.split(",")
    attributes.each do |attribute|
      key, value = attribute.scan(/(\w+): (.+)/)[0]
      result_hash[key.to_sym] = value.strip
    end

How about this?

   message.scan(/(\w+)\s*:\s*([^, ]*)/) do |k, v|
     result_hash[k.to_sym] = v
   end

Also, this will get run potentially thousands of times per second,
so execution speed is of some concern.

I don't know if the above is the best you can do, but I do believe it
is a bit faster than your original version.

···

--
Daniel Brockman <daniel@brockman.se>

(Daniel Brockman) #6

Daniel Brockman <daniel@brockman.se> writes:

Also, this will get run potentially thousands of times per second,
so execution speed is of some concern.

I don't know if the above is the best you can do, but I do believe
it is a bit faster than your original version.

According to my tests, it is also more than twice as fast as that
enormous strscan implementation. (Can anyone confirm?)

···

--
Daniel Brockman <daniel@brockman.se>

(J-Van) #7

Joe Van Dyk <joevandyk@gmail.com> writes:

(original attempt.. was too slow)

> attributes = message.split(",")
> attributes.each do |attribute|
>   key, value = attribute.scan(/(\w+): (.+)/)[0]
>   result_hash[key.to_sym] = value.strip
> end

How about this?

   message.scan(/(\w+)\s*:\s*([^, ]*)/) do |k, v|
     result_hash[k.to_sym] = v
   end

> Also, this will get run potentially thousands of times per second,
> so execution speed is of some concern.

I don't know if the above is the best you can do, but I do believe it
is a bit faster than your original version.

Bringing up an old thread.... I have the following code.

  # Converts an array like
  # [[0, "x_position: 20, y_position: 40, z_position: 30"],
  #  [1, "x_position: 20, y_position: 40, z_position: 30"]
  # ]
  #
  # into a hash like
  # { 0 => { :x_position => "20", :y_position => "40", :z_position => "30" },
  #   1 => { :x_position => "20", :y_position => "40", :z_position => "30" }
  # }
  def self.convert_message_to_hash players_array
    raise "Can't do anything with empty message!" if original_message.nil?
    result_hash = {}
    original_message.each do |id, message|
      message.scan(/(\w+)\s*:\s*([^, ]*)/) do |k, v|
        result_hash[id][k.to_sym] = v
      end
    end
    result_hash
  end
end

That code in my application leads to the following profiling:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 37.84   261.23    261.23     5569    46.91    69.61  String#scan
  8.12   317.28     56.05   201791     0.28     0.28  Hash#[]
  6.42   409.72     44.31   150783     0.29     0.29  Hash#[]=
  6.36   453.61     43.89   144782     0.30     0.30  String#to_sym

I believe this is probably the critical part of my code.

Ideas on how to improve this would be appreciated!
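
Two things stand out in the method as posted: it refers to `original_message` although its parameter is named `players_array`, and it writes to `result_hash[id][...]` before any inner hash exists for that id. A minimal corrected sketch follows. The module name `MessageParser` and the constant `PAIR` are made up for illustration; only the method name and the regex come from the post. Looking up the inner hash once per message, instead of once per pair, may also trim some of the `Hash#[]` calls seen in the profile, though that has not been measured here.

```ruby
# Hypothetical container; the post only shows the method itself.
module MessageParser
  # Same pattern as in the post, hoisted into a constant.
  PAIR = /(\w+)\s*:\s*([^, ]*)/

  def self.convert_message_to_hash(players_array)
    raise ArgumentError, "Can't do anything with empty message!" if players_array.nil?
    result_hash = {}
    players_array.each do |id, message|
      # Create the per-player hash once, then fill it.
      attributes = (result_hash[id] ||= {})
      message.scan(PAIR) { |k, v| attributes[k.to_sym] = v }
    end
    result_hash
  end
end

players = [[0, "x_position: 20, y_position: 40, z_position: 30"],
           [1, "x_position: 20, y_position: 40, z_position: 30"]]
p MessageParser.convert_message_to_hash(players)
```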

(Ara.T.Howard) #8

sure it is. but with no error checking and it accepts invalid strings. it
will also fail for things like

   42.0 : value

since '.' is not a \w (tricky). anyhow i didn't know the standard scan was so
fast! a simple/similar version of the strscan method runs about the same for
small strings, but scales a bit better:

   jib:~ > ruby a.rb
   HashString @ 16.7303600311279
   HashStringSimple @ 21.1355850696564

   jib:~ > cat a.rb
   require 'strscan'

   class HashString < ::Hash
     def initialize s
       ss = StringScanner::new s, false
       loop do
         ss.scan(%r/\s*([^:]*[^\s:])\s*:\s*([^,]*[^,\s])\s*,?\s*/o) or break
         self[ss[1]] = ss[2]
       end
     end
   end

   class HashStringSimple < ::Hash
     def initialize s
       s.scan(%r/\s*([^:]*[^\s:])\s*:\s*([^,]*[^,\s])\s*,?\s*/o){|k,v| self[k] = v}
     end
   end

   def time label
     fork do
       a = Time::now.to_f
       yield
       b = Time::now.to_f
       t = b - a
       puts "#{ label } @ #{ t }"
     end
     Process::wait
   end

   n = 2 ** 20
   huge = ''

   n.times do |i|
     huge << "#{ rand } : #{ rand }"
     huge << ", " if i != n - 1
   end

   time('HashString'){ hs = HashString::new huge }

   time('HashStringSimple'){ hs = HashStringSimple::new huge }

cheers.

-a

···

On Sat, 2 Jul 2005, Daniel Brockman wrote:

Daniel Brockman <daniel@brockman.se> writes:

Also, this will get run potentially thousands of times per second,
so execution speed is of some concern.

I don't know if the above is the best you can do, but I do believe
it is a bit faster than your original version.

According to my tests, it is also more than twice as fast as that
enormous strscan implementation. (Can anyone confirm?)


(J-Van) #9

Bringing up an old thread.... I have the following code.

  # Converts an array like
  # [[0, "x_position: 20, y_position: 40, z_position: 30"],
  # [1, "x_position: 20, y_position: 40, z_position: 30"]
  # ]
  #
  # into a hash like
  # { 0 => { :x_position => "20", :y_position => "40", :z_position => "30" },
  # 1 => { :x_position => "20", :y_position => "40", :z_position => "30" }
  # }

Whoops!
'original_message' should be 'players_array' in the code.

(W. James) #10

Joe Van Dyk wrote:

I have a string like:

"some_key: blah, some_other_key: more_blah,
yet_other_key: yet_more_blah"

I want to build up a hash like

{ :some_key => "blah", :some_other_key => "more_blah",
:yet_other_key => "yet_more_blah" }

h={}
DATA.read.split(/\s*[:,\n]\s*/).inject(false){|a,b|
  if a then h[a.to_sym]=b; a=false else a=b end }

p h
__END__
some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah
key1:value1
key2: value2
key3 : value 3 , key4 :value4 , key555555555:value55555

···

----
Output:

{:key3=>"value 3", :some_key=>"blah", :key4=>"value4",
:some_other_key=>"more_blah", :key555555555=>"value55555",
:yet_other_key=>"yet_more_blah", :key1=>"value1",
:key2=>"value2"}

(Simon Kröger) #11

This may not be exactly an improvement but at least ..hmm..
funny?

···

--------------------------------------------------------------

a = [[0, "x_position: 20, y_position: 40, z_position: 30"],
        [1, "x_position: 20, y_position: 40, z_position: 30"]]

h = eval('{' + ("~" + a.join("~")).
  gsub(/\s*([^:\~|]+):\s*([^,~]+),?/, ':\1=>\'\2\',').
    gsub(/~(\d+)~/, '},\1=>{')[2..-2]+ '}}')

puts h[1][:y_position]

--------------------------------------------------------------

cheers

Simon

(Daniel Brockman) #12

"Ara.T.Howard" <Ara.T.Howard@noaa.gov> writes:

According to my tests, it is also more than twice as fast as that
enormous strscan implementation. (Can anyone confirm?)

sure it is. but with no error checking and it accepts
invalid strings.

Perhaps there are no invalid strings?

it will also fail for things like

   42.0 : value

I didn't see that in the original post. The key should be a symbol,
which I took to mean it had to be a valid Ruby identifier.

since '.' is not a \w (tricky).

But the characters permitted in Ruby identifiers are. (Though I
forgot `!' and `?'.)

anyhow i didn't know the standard scan was so fast!

Regular expressions are pretty fast, because you compile them.
I think of them as OpenGL display lists. :-)

[...] %r/\s*( [...]

I've never seen `%r/.../' used before --- interesting.

···

On Sat, 2 Jul 2005, Daniel Brockman wrote:

--
Daniel Brockman <daniel@brockman.se>

    So really, we all have to ask ourselves:
    Am I waiting for RMS to do this? --TTN.

(J-Van) #13

Why would this approach be faster?

···

On 8/12/05, William James <w_a_x_man@yahoo.com> wrote:

Joe Van Dyk wrote:
>I have a string like:
>
>"some_key: blah, some_other_key: more_blah,
> yet_other_key: yet_more_blah"
>
>I want to build up a hash like
>
>{ :some_key => "blah", :some_other_key => "more_blah",
> :yet_other_key => "yet_more_blah" }

h={}
DATA.read.split(/\s*[:,\n]\s*/).inject(false){|a,b|
  if a then h[a.to_sym]=b; a=false else a=b end }

p h
__END__
some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah
key1:value1
key2: value2
key3 : value 3 , key4 :value4 , key555555555:value55555

----
Output:

{:key3=>"value 3", :some_key=>"blah", :key4=>"value4",
:some_other_key=>"more_blah", :key555555555=>"value55555",
:yet_other_key=>"yet_more_blah", :key1=>"value1",
:key2=>"value2"}

(Simon Kröger) #14

Hi Joe,

as a more serious contribution to your problem:
this is 10 times faster:

data.each{ |j, line|
   k, v = -2, 0
   while (v = line.index(58, k))
     h5[j][line[(k+2)...v].intern] =
       line[(v+2)...(k = line.index(44, v) || line.length)]
   end
}

i attached the whole test script, the output is:

            user     system      total        real
inject  4.640000   0.046000   4.686000 (  4.687000)
scan    5.204000   0.063000   5.267000 (  5.875000)
eval    6.078000   0.016000   6.094000 (  6.094000)
tmp     4.375000   0.000000   4.375000 (  4.391000)
index   0.312000   0.000000   0.312000 (  0.312000)
true

cheers

Simon

hack.rb (1.17 KB)

(W. James) #15

Joe Van Dyk wrote:

> Joe Van Dyk wrote:
> >I have a string like:
> >
> >"some_key: blah, some_other_key: more_blah,
> > yet_other_key: yet_more_blah"
> >
> >I want to build up a hash like
> >
> >{ :some_key => "blah", :some_other_key => "more_blah",
> > :yet_other_key => "yet_more_blah" }
>
> h={}
> DATA.read.split(/\s*[:,\n]\s*/).inject(false){|a,b|
> if a then h[a.to_sym]=b; a=false else a=b end }
>
> p h
> __END__
> some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah
> key1:value1
> key2: value2
> key3 : value 3 , key4 :value4 , key555555555:value55555
>
>
> ----
> Output:
>
> {:key3=>"value 3", :some_key=>"blah", :key4=>"value4",
> :some_other_key=>"more_blah", :key555555555=>"value55555",
> :yet_other_key=>"yet_more_blah", :key1=>"value1",
> :key2=>"value2"}

Why would this approach be faster?

data = []; DATA.each{|x| data << x.chomp}
iter = 100_000

h={}
start = Time.now

iter.times {
data.each{|line| line.split(/\s*[:,]\s*/).inject(false){|a,b|
  if a then h[a.to_sym]=b; a=false else a=b end }}
}
t1 = Time.now - start

result_hash = {}
start = Time.now

iter.times {
data.each{|line| line.scan(/(\w+)\s*:\s*([^, ]*)/) do |k, v|
  result_hash[k.to_sym] = v
end
} }
t2 = Time.now - start

p result_hash == h
p t1,t2

__END__
some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah
key1:value1,key2: value2,key3 :value_3 ,
key4:value4,key555555555:value55555
x_position: 20, y_position: 40, z_position: 30,zz_position:85

Output:
true
22.859
28.094

···

On 8/12/05, William James <w_a_x_man@yahoo.com> wrote:

(J-Van) #16

Thank you! I must say that the logic in the 'index' one confuses me though.

···

On 8/13/05, Simon Kröger <SimonKroeger@gmx.de> wrote:

Hi Joe,

as a more serious contribution to your problem:
this is 10 times faster:

data.each{ |j, line|
   k, v = -2, 0
   while (v = line.index(58, k))
     h5[j][line[(k+2)...v].intern] =
       line[(v+2)...(k = line.index(44, v) || line.length)]
   end
}

i attached the whole test script, the output is:

            user     system      total        real
inject  4.640000   0.046000   4.686000 (  4.687000)
scan    5.204000   0.063000   5.267000 (  5.875000)
eval    6.078000   0.016000   6.094000 (  6.094000)
tmp     4.375000   0.000000   4.375000 (  4.391000)
index   0.312000   0.000000   0.312000 (  0.312000)
true

cheers

Simon

require 'benchmark'

a = []
100.times {|i| a << [i, "x_position: #{200+i}, y_position: #{400+i}, z_position: #{300+i}"]}

data = a
iter = 1000
h1 = h2 = h3 = h4 = h5 = {}

Benchmark.bm 20 do |bm|
        bm.report("inject") do
                iter.times {
                        data.each{|i, line| h1[i]={}; line.split(/\s*[:,]\s*/).inject(false){|a,b|
                          if a then h1[i][a.to_sym]=b; a=false else a=b end }}
                }
        end

        bm.report "scan" do
                iter.times {
                data.each{|i, line|h2[i]={}; line.scan(/(\w+)\s*:\s*([^, ]*)/) do |k, v|
                  h2[i][k.to_sym] = v
                end
                } }
        end

        bm.report "eval" do
                iter.times {
                        h3 = eval('{' << ('~' << data.join('~')).
                                gsub!(/\s*([^:~]+):\s*([^,~]+),?/, ':\1=>\'\2\',').
                                        gsub!(/~(\d+)~/, '},\1=>{')[2..-2] << '}}')
                }
        end

        bm.report "tmp" do
                iter.times {
                        data.each{ |j, line| h4[j]={};tmp = line.split(/\s*[:,]\s*/)
                                (0...tmp.size).step(2){ |i| h4[j][tmp[i].to_sym]=tmp[i+1] }
                        }}
        end

        bm.report "index" do
                iter.times {
                        data.each{ |j, line|
                                k, v = -2, 0
                                while (v = line.index(58, k))
                                        h5[j][line[(k+2)...v].intern] = line[(v+2)...(k = line.index(44, v) || line.length)]
                                end
                        }
                }
        end

end

p((h1 == h2) && (h1 == h3) && (h1 == h4) && (h1 == h5))
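
One thing to keep in mind when reading the final comparison in this script: `h1 = h2 = h3 = h4 = h5 = {}` binds all five names to the same Hash object, so any names that are not reassigned later compare equal simply because they are the same object. The tiny sketch below shows the difference between chained assignment and giving each name its own hash; the variable names are illustrative only.

```ruby
# Chained assignment: all three names refer to the same Hash object.
a = b = c = {}
a[:x] = 1
shared = b.equal?(a) && c.equal?(a)   # identity check, not just equality

# Building the hashes in a block gives each name its own object.
d, e, f = Array.new(3) { {} }
d[:x] = 1
independent = e.empty? && f.empty?

p shared, independent
```

With separate hashes, a final `==` check actually compares what each strategy produced.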

(W. James) #17

Faster yet:

iter = 20_000

data = DATA.inject([]){|a,x| a << x.chomp}
times = []

h1={}
times << Time.now

iter.times {
  data.each{ |line| tmp = line.split(/\s*[:,]\s*/)
    (0...tmp.size).step(2){ |i| h1[tmp[i].to_sym]=tmp[i+1] }
  }
}

h2={}
times << Time.now

iter.times {
  data.each{ |line| line.split(/\s*[:,]\s*/).inject(false){|a,b|
    if a then h2[a.to_sym]=b; a=false else a=b end }
  }
}

result_hash = {}
times << Time.now

iter.times {
  data.each{|line| line.scan(/(\w+)\s*:\s*([^, ]*)/) { |k, v|
    result_hash[k.to_sym] = v }
  }
}

times << Time.now

p result_hash == h2 && h2 == h1
(1...times.size).each{|i| p times[i]-times[i-1]}

__END__
some_key: blah, some_other_key: more_blah, yet_other_key: yet_more_blah
key1:value1,key2: value2,key3 :value_3 ,
key4:value4,key555555555:value55555
x_position: 20, y_position: 40, z_position: 30,zz_position:85

Output (on a slower computer):
true
6.82
7.27
8.413

(Simon Kröger) #18

data.each{ |j, line|
   k, v = -2, 0
   while (v = line.index(58, k))
     h5[j][line[(k+2)...v].intern] =
       line[(v+2)...(k = line.index(44, v) || line.length)]
   end
}

OK, let's walk through this:

j is just the key in the outer hash.

line looks like:
"x_position: 200, y_position: 400, z_position: 300"

k is the index of the ',' that precedes the current key (like 'x_position')
v is the index of the ':' that precedes the current value (like '200')

58 is the ascii number of the char ':'
44 is the ascii number of the char ','

#index returns the index of the char in the string or nil
if no such char exists (after the index given as second
parameter)

while there is another ':' in the string
   add key from last ',' to ':' => value from ':' to next ','
end

the +2 is there to skip the (',' or ':') and the space.

the initial k = -2 is there because no ',' is there to skip
at the beginning.
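
The same walk-through can be written out with named variables. This is only a sketch under the same strict format assumption (exactly ": " after each key and ", " between pairs), and `parse_with_index` is a made-up name. Two details matter on a current Ruby: `String#index` no longer accepts an integer character code such as 58 (it wants a string or a regexp), and a negative start offset counts back from the end of the string, so the sketch starts its search at 0 rather than at a negative position.

```ruby
# Hypothetical helper: parses "key: value, key: value" using only
# String#index and slicing, with no regular expressions.
def parse_with_index(line)
  result = {}
  start = 0                                   # where the next key begins
  while (colon = line.index(":", start))
    comma = line.index(",", colon) || line.length
    key   = line[start...colon]               # text before the ':'
    value = line[(colon + 2)...comma]         # skip ':' and the space
    result[key.to_sym] = value
    start = comma + 2                         # skip ',' and the space
  end
  result
end

p parse_with_index("x_position: 200, y_position: 400, z_position: 300")
```

Once the last pair has been read, `start` points past the end of the line, `index` returns nil, and the loop stops.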

One loses readability of code when optimizing for speed
has top priority, even in Ruby.

data.each{|j, line|
   line.split(',').each{|kv|
     k, v = kv.split(':')
     h6[j][k.strip.intern] = v.strip
   }
}

is much nicer, but look at the numbers:

            user     system      total        real
inject  4.672000   0.015000   4.687000 (  5.281000)
scan    5.250000   0.063000   5.313000 (  5.312000)
eval    6.140000   0.047000   6.187000 (  6.219000)
tmp     4.407000   0.062000   4.469000 (  4.469000)
index   0.375000   0.000000   0.375000 (  0.375000)
split  10.625000   0.141000  10.766000 ( 10.781000)
true

cheers

Simon

(J-Van) #19

Faster yet:

iter = 20_000

data = DATA.inject([]){|a,x| a << x.chomp}

What is DATA? Does it have anything to do with __END__ at the bottom?

Joe
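
For the record: yes. When the main script file contains a line consisting of `__END__`, Ruby defines the constant `DATA` as a File object positioned just after that line, so the script can read its own trailing text like any other IO. Because that only works in the script being run directly, the sketch below uses a StringIO as a stand-in (an assumption made purely so the snippet runs anywhere); in a real script you would drop `fake_data` and read from `DATA`.

```ruby
require 'stringio'

# Stand-in for DATA: behaves like the IO Ruby would hand you for the
# text that follows __END__ in the main script file.
fake_data = StringIO.new(<<~TEXT)
  key1:value1,key2: value2
  x_position: 20, y_position: 40
TEXT

# Same idea as `DATA.inject([]){|a,x| a << x.chomp}` in the quoted code:
# collect the lines without their trailing newlines.
lines = fake_data.each_line.map(&:chomp)
p lines
```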

(J-Van) #20

Ah, ok. In my application, there's a bunch more than 3 possible keys
and they are of differing length. I am in control of the format of
the incoming strings though, and so could modify their format to make
them easier/faster to parse. Any ideas on what would be a more
efficient format for transporting the data?

(for reference, the original string format was "id: 3, x_position: 39,
y_position: 209, z_position: 39" and in my real application, there's
about twenty different attributes that are in the string.)

Perhaps it would be more efficient to not convert the string into a hash?

All I really need to be able to do is access/display a player's data
via some mechanism, and a player's data should be updated once a
second, and there's up to 400 players. The above was the best way I
could come up with transporting and accessing the data, but perhaps
there's a better way of doing it.
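
One possibility, offered only as a sketch and not measured: since the sender controls the format, both sides could agree on a fixed field order and send just the values, so the receiver does a single split with no pattern matching. The field list `FIELDS` and the method name `parse_positional` are assumptions for illustration, using the attribute names from the example string above.

```ruby
# Hypothetical agreed-upon field order, shared by sender and receiver.
FIELDS = [:id, :x_position, :y_position, :z_position].freeze

# Pairs each field name with the value at the same position.
def parse_positional(message)
  FIELDS.zip(message.split(",")).to_h
end

p parse_positional("3,39,209,39")
```

The trade-off is that every message must carry every field in order, and adding an attribute means updating the list on both ends.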
