Improving hexadecimal escaping performance

Hi, I have a module with two methods (thanks Jeff):
- hex_unescape(string)
- hex_escape(string)
as follows:

  def self::hex_unescape(str)
    str.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr }
  end

  def self::hex_escape(str)
    str.gsub(/[^a-zA-Z0-9_\-.]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
  end

The "hex_escape" method is copied from the CGI lib, and honestly I don't
much like its approach of using "sprintf". Is there another, more elegant way?
(Performance is the most important requirement anyway.)

Thanks a lot.

···

--
Iñaki Baz Castillo

Iñaki Baz Castillo wrote:

I don't much like its approach of using "sprintf". Is there another, more
elegant way? (Performance is the most important requirement anyway.)

pickaxe2, p. 23:

···

------
Another output method we use a lot is printf....
------

pickaxe2, p. 526:
--------
printf

Equivalent to io.write sprintf(...)
--------

The Ruby Way (2nd), p. 72:
----------
2.9 Formatting a String

This is done in Ruby as it is in C, with the sprintf method.
---------

Is there another, more elegant way?

def hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) do |match|
    "%%%02X" % match[0]
  end
end

s = "?<>é"
puts hex_escape(s)

--output:--
%3F%3C%3E%C3%A9

--
Posted via http://www.ruby-forum.com/.

Then I am sure you _measured_ it and came to the conclusion that it is
too slow, did you? What are your results and what are your
performance requirements?

Cheers

robert

···

2009/2/23 Iñaki Baz Castillo <ibc@aliax.net>:

The "hex_escape" method is copied from the CGI lib, and honestly I don't
much like its approach of using "sprintf". Is there another, more elegant way?
(Performance is the most important requirement anyway.)

--
remember.guy do |as, often| as.you_can - without end

I ran Benchmark.realtime on the hex_unescape and hex_escape methods:
hex_unescape takes ~2.5*10^(-5) s while hex_escape takes
~4*10^(-5) s.

Anyway, I've just realized that "sprintf" is implemented directly
in C, so it probably can't be made much faster.

Thanks.
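
The timings above came from Benchmark.realtime. A minimal sketch of how such a comparison could be set up is shown here; the sample string, the iteration count and the seconds_per_call helper are made up for illustration, and the two methods are adapted for Ruby >= 2.0 (String#b) rather than being the exact code that was measured.

```ruby
require "benchmark"

# Same logic as the two methods under discussion, adapted for Ruby >= 2.0:
# String#b yields a binary copy so that every match is exactly one byte.
def hex_unescape(str)
  str.b.gsub(/%([0-9a-fA-F]{2})/n) { $1.to_i(16).chr }
end

def hex_escape(str)
  str.b.gsub(/[^a-zA-Z0-9_\-.]/n) { |m| sprintf("%%%02X", m.getbyte(0)) }
end

# Hypothetical sample input and iteration count; adjust to the real workload.
SAMPLE     = "I\u00F1aki Baz <user@example.org>"
ESCAPED    = hex_escape(SAMPLE)
ITERATIONS = 10_000

# Average seconds per call over many iterations, to smooth out timer noise.
def seconds_per_call(iterations)
  Benchmark.realtime { iterations.times { yield } } / iterations
end

escape_time   = seconds_per_call(ITERATIONS) { hex_escape(SAMPLE) }
unescape_time = seconds_per_call(ITERATIONS) { hex_unescape(ESCAPED) }

printf("hex_escape:   %.2e s per call\n", escape_time)
printf("hex_unescape: %.2e s per call\n", unescape_time)
```

Averaging over many calls matters here because a single call in the tens of microseconds is close to the resolution of the timer.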

···

2009/2/23 Robert Klemme <shortcutter@googlemail.com>:

Then I am sure you _measured_ it and came to the conclusion that it is
too slow, did you? What are your results and what are your
performance requirements?

--
Iñaki Baz Castillo
<ibc@aliax.net>

Well, you can at least do this in 1.8

def self::hex_escape(str)
   str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m[0]) }
end

And this in 1.9

def self::hex_escape(str)
   str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m.getbyte(0)) }
end

Cheers

robert
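
If the per-match sprintf call is the part that feels inelegant, one alternative is to build all 256 escape sequences once and index into that table from the gsub block. This is only a sketch: the module name HexCodec and the constant names are hypothetical, String#b requires Ruby >= 2.0, and whether it is actually faster would have to be measured.

```ruby
module HexCodec  # hypothetical module name
  # Escape sequences "%00".."%FF", indexed by byte value; built once at load time.
  ESCAPES = (0..255).map { |byte| format("%%%02X", byte).freeze }.freeze
  UNSAFE  = /[^a-zA-Z0-9_\-.]/n

  def self.hex_escape(str)
    # String#b returns a binary copy, so each match is exactly one byte.
    str.b.gsub(UNSAFE) { |m| ESCAPES[m.getbyte(0)] }
  end
end

puts HexCodec.hex_escape("?<>\u00E9")   # "?<>é" => %3F%3C%3E%C3%A9
```

The formatting work is moved to load time, so each match costs one getbyte call and one array lookup.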

···

2009/2/23 Iñaki Baz Castillo <ibc@aliax.net>:

I did a Benchmark.realtime comparing hex_unescape and hex_escape
methods. hex_unescape takes ~2.5*10^(-5) while hex_escape takes
~4*10^(-5).

Anyway I've realized right now that "sprintf" is directly implemented
as C code so it can't be faster.

--
remember.guy do |as, often| as.you_can - without end

* Iñaki Baz Castillo <ibc@aliax.net> (11:28) wrote:

I did a Benchmark.realtime comparing hex_unescape and hex_escape
methods. hex_unescape takes ~2.5*10^(-5) while hex_escape takes
~4*10^(-5).

For what exactly is 40 microseconds too slow?

Regards, simon

Thanks. Do you mean that "m[0]" behaves differently in Ruby 1.9 than in
1.8? Maybe in 1.9 "m[0]" returns the first character (even if it is a
multi-byte character such as "ñ" or "€") while in 1.8 it returns just
the first two bytes?

PS: I have Ruby 1.9 (2007-12-25 revision 14709) and it has no
"getbyte()" method for String.

Thanks a lot.

···

2009/2/23 Robert Klemme <shortcutter@googlemail.com>:

Well, you can at least do this in 1.8

def self::hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m[0]) }
end

And this in 1.9

def self::hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m.getbyte(0)) }
end

--
Iñaki Baz Castillo
<ibc@aliax.net>

That's not what I mean, but it's strange that the inverse method takes
nearly twice as long, isn't it?

···

2009/2/23 Simon Krahnke <overlord@gmx.li>:

* Iñaki Baz Castillo <ibc@aliax.net> (11:28) wrote:

I did a Benchmark.realtime comparing hex_unescape and hex_escape
methods. hex_unescape takes ~2.5*10^(-5) while hex_escape takes
~4*10^(-5).

For what exactly is 40 microseconds too slow?

--
Iñaki Baz Castillo
<ibc@aliax.net>

15:15:25 ~$ ruby -ve 'p "foo"[0]'
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
102
15:15:31 ~$ ruby19 -ve 'p "foo"[0]'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
"f"
15:15:34 ~$ ruby19 -ve 'p "foo".getbyte(0)'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
102
15:15:57 ~$

robert
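
The same difference shows up with multi-byte characters, which is what matters for escaping. A small sketch for Ruby >= 2.0 follows; the helper name percent_encode_bytes is hypothetical.

```ruby
# Hypothetical helper: percent-encode every byte of a string.
def percent_encode_bytes(str)
  str.bytes.map { |b| format("%%%02X", b) }.join
end

s = "\u00F1\u20AC"            # "ñ€": two characters, five bytes in UTF-8

p s[0] == "\u00F1"            # => true: String#[] returns the whole first character
p s.getbyte(0)                # => 195 (0xC3), the first byte only
p s.bytes                     # => [195, 177, 226, 130, 172]
puts percent_encode_bytes(s)  # => %C3%B1%E2%82%AC
```

So on 1.9 and later, code that needs byte values has to ask for bytes explicitly (getbyte, bytes, each_byte or unpack) instead of indexing with [].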

···

2009/2/23 Iñaki Baz Castillo <ibc@aliax.net>

Thanks. Do you mean that "m[0]" behaves differently in Ruby 1.9 than in
1.8? Maybe in 1.9 "m[0]" returns the first character (even if it is a
multi-byte character such as "ñ" or "€") while in 1.8 it returns just
the first two bytes?

PS: I have Ruby 1.9 (2007-12-25 revision 14709) and it has no
"getbyte()" method for String.

--
remember.guy do |as, often| as.you_can - without end

* Iñaki Baz Castillo <ibc@aliax.net> (17:14) wrote:

That's not what I mean, but it's strange that the inverse method takes
nearly twice as long, isn't it?

How would you implement these at the core level?

Regards, simon
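
One way to think about that question is to write both directions as plain byte loops. The sketch below is not how Ruby's C core does anything; it only illustrates the per-byte work each direction involves, and all method and constant names in it are hypothetical.

```ruby
HEX_DIGITS = "0123456789ABCDEF".freeze

# Escaping: classify each input byte, then emit either the byte itself
# or "%" plus two hex digits.
def hex_escape_bytes(str)
  out = String.new                      # empty binary (ASCII-8BIT) buffer
  str.each_byte do |b|
    case b
    when 0x30..0x39, 0x41..0x5A, 0x61..0x7A, 0x2D, 0x2E, 0x5F  # 0-9 A-Z a-z - . _
      out << b
    else
      out << "%" << HEX_DIGITS[b >> 4] << HEX_DIGITS[b & 0x0F]
    end
  end
  out
end

# Numeric value of an ASCII hex digit byte, or nil if it is not one.
def hex_value(byte)
  case byte
  when 0x30..0x39 then byte - 0x30        # "0".."9"
  when 0x41..0x46 then byte - 0x41 + 10   # "A".."F"
  when 0x61..0x66 then byte - 0x61 + 10   # "a".."f"
  end
end

# Unescaping: copy bytes through, collapsing "%" + two hex digits into one byte.
# Malformed sequences such as "%zz" or a trailing "%" are left untouched.
def hex_unescape_bytes(str)
  bytes = str.bytes
  out = String.new
  i = 0
  while i < bytes.size
    hi = bytes[i] == 0x25 ? hex_value(bytes[i + 1]) : nil   # 0x25 is "%"
    lo = hi ? hex_value(bytes[i + 2]) : nil
    if lo
      out << ((hi << 4) | lo)
      i += 3
    else
      out << bytes[i]
      i += 1
    end
  end
  out
end
```

Both methods return binary strings; call force_encoding("UTF-8") on the unescaped result if the original text was UTF-8.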

Clear now, thanks :)

···

2009/2/23 Robert Klemme <shortcutter@googlemail.com>:

15:15:25 ~$ ruby -ve 'p "foo"[0]'
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
102
15:15:31 ~$ ruby19 -ve 'p "foo"[0]'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
"f"
15:15:34 ~$ ruby19 -ve 'p "foo".getbyte(0)'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
102
15:15:57 ~$

--
Iñaki Baz Castillo
<ibc@aliax.net>