Speed Kata: pure-Ruby powmod

Hi,

I’m finally getting around collecting some of the crypto-related Ruby code
I have lying around into one package. While doing so I’ve given some
thought to the fact that Ruby does not have a native power modulo operator
(which for example Python has). Its a bit of a shame since its heavily
used in crypto algs which has some importance in modern-day computer use.
Maybe we should add it to bignum.c?

Anyway I currently use the simple, recursive and pretty fast

Calculate ((b**p) % m) assuming that b and m are large integers.

def Math.power_modulo(b, p, m)
if p == 1
b % m
elsif (p & 0x1) == 0 # p.even?
t = power_modulo(b, p >> 1, m)
(t * t) % m
else
(b * power_modulo(b, p-1, m)) % m
end
end

but since this is so speed-critical it can (in this abnormal case) be
wise to spend time optimizing it. Since many heads are wiser than one I
thought I’d pose this to you as a challenge:

Given that p, b and m are integers, p is typically less than 218 while
b and m are around 1000 or more bits what is the fastest pure-Ruby
implementation of powmod, ie. calculating ((b
p) % m)?

Maybe its not really a “kata” since the (best) code will not be thrown
away ;).

If you want a test case here is one:

class TestPowMod < Test::Unit::TestCase
def test_01_large_powmod
expected = 24656295158149552129522791057189025935153769756028704310461488183165007781190622
66158248517786824586416789690271181089924166576876358272852153508275457030547406
50790047459930747867644527843993869970230088426805592858713543180070230688245872
57820097544818526648507725023614743971311566767966295145235315729887013427381259
85437819072366890714347945762217274178746870467684532926146890122994632184298568
39868247687863706332185080258360241983212661935661577867566186261592507929101422
20369343724700074136993030921203032483179240961705092122797953462669310767801367
5181607550763200080096436693225368019004915688931329
assert_equal(expected, Math.power_modulo(1+21000, 65537, 22048))
end
end

Oh, and don’t run the testcase above with

Math.power_modulo(b, p, m)
((b**p) % m)
end

unless you have lots of time to wait… :wink:

Regards,

Robert Feldt

What about profiling it?
The ol' 20%-of-the-code-does-80%-of-the-work thesis.

Putting it into bignum.c is arguably the best way. I don't believe
assembler would buy anything because the main time spent would be in the
bignum.c library.

Ted

Wouldn’t it be better to define some classes for calculating modulo some
integer?

like

my_ring = Modulo::new(5)
a = my_ring.convert(2)
b = my_ring.convert(3)
c = Modulo::new(4).convert(2)

puts ab #yields “1 mod 5”
puts a+b #yields “0 mod 5”
puts my_ring.unit / a #yields “3 mod 5”
puts my_ring.unit / b #yields “2 mod 5”
puts Modulo::new(4).unit / c #throws Modulo::InverseDoesNotExist
puts a
c #throws "Modulo::ElementsFromDifferentRings.

The interface can probably be improved… But this seems more like the ruby
way for me.
Additionally, this enabled the Modulo instances to speedup calculation by
caching results.

Finally, there could be class methods in Modulo, like
power_module(a,b,n)
with the semantik you described.

What do you think?

greetings, Florian Pflug

···

On Thu, May 22, 2003 at 08:09:33AM +0900, Robert Feldt wrote:

Anyway I currently use the simple, recursive and pretty fast

Calculate ((b**p) % m) assuming that b and m are large integers.

def Math.power_modulo(b, p, m)
if p == 1
b % m
elsif (p & 0x1) == 0 # p.even?
t = power_modulo(b, p >> 1, m)
(t * t) % m
else
(b * power_modulo(b, p-1, m)) % m
end
end

Is anyone working on a ruby interface to PARI/GP:
http://www.math.u-psud.fr/~belabas/pari/ It’s supposed to be quite
fast.

Here are some notes on using it for crypto type routines:
http://www.math.iastate.edu/cbergman/crypto/pari/parihelp.html#powermod

? powermod(x,k,m)=lift(Mod(x,m)^k)
? powermod(1+2^1000, 65537, 2^2048)

is pretty fast. I haven’t benchmarked it against the ruby version
though.

thanks,
-joe

I’m familiar with that yes, but its a good technique when you don’t know
where time is spent. This is not the case in public-key crypto algs where
99% of the time is spent in converting string’s to bignum’s and
calculating power modulo’s on them :wink:

And yes I know a couple of ways of making this powmod faster already.
However, experience has shown that when people ask the community on this
list to optimize something people come up with new and creative
ideas/views that speed things up more than anyone would have thought. If
I’d give away my ideas that might “prime” your minds so you don’t come up
with the “other”/better ideas.

I’ll grant you that maybe this is not a sufficiently rich problem for
people’s creative juices to start flowing but who knows.

Regards,

Robert

···

On Thu, 22 May 2003, Ted Rolle wrote:

What about profiling it?
The ol’ 20%-of-the-code-does-80%-of-the-work thesis.

Ted Rolle wrote:

Putting it into bignum.c is arguably the best way.

… and then implement the powmod operation using one of the
established (advanced!) methods, like the Chinese remainder
theorem, and then recode the whole thing in assembler (which
does buy a lot of speed).

Or you could be sensible, and just import the appropriate code
from OpenSSL/SSLeay, where Eric Young has already done all
this, spending weeks of effort and beating the best commercial
implementations. IIRC his licensing terms merely require
acknowledgement.

Clifford.

Thanks for your proposal. I think its more OO/ruby, in some sense, but I’m
not sure about the speedups since, for my application, p and m is
typically the same while b changes for each application. Thus its not
clear that caching has any large benefits.

On a second note maybe it can have an effect since the “pattern” of
recursive calls for the version above only depends on p and thus could be
precomputed in some way. Hmm, I’ll investigate, thx.

Regards,

Robert

···

On Thu, 22 May 2003, Florian G. Pflug wrote:

On Thu, May 22, 2003 at 08:09:33AM +0900, Robert Feldt wrote:

Anyway I currently use the simple, recursive and pretty fast

Calculate ((b**p) % m) assuming that b and m are large integers.

def Math.power_modulo(b, p, m)
if p == 1
b % m
elsif (p & 0x1) == 0 # p.even?
t = power_modulo(b, p >> 1, m)
(t * t) % m
else
(b * power_modulo(b, p-1, m)) % m
end
end

Wouldn’t it be better to define some classes for calculating modulo some
integer?

like

my_ring = Modulo::new(5)
a = my_ring.convert(2)
b = my_ring.convert(3)
c = Modulo::new(4).convert(2)

puts ab #yields “1 mod 5”
puts a+b #yields “0 mod 5”
puts my_ring.unit / a #yields “3 mod 5”
puts my_ring.unit / b #yields “2 mod 5”
puts Modulo::new(4).unit / c #throws Modulo::InverseDoesNotExist
puts a
c #throws "Modulo::ElementsFromDifferentRings.

The interface can probably be improved… But this seems more like the ruby
way for me.
Additionally, this enabled the Modulo instances to speedup calculation by
caching results.

Finally, there could be class methods in Modulo, like
power_module(a,b,n)
with the semantik you described.

What do you think?

I think PARI is based on the GNU MP library which is really fast for
multi-precision integer arithmetics. Someone should make an extension for
GNU MP some day… :wink:

BTW my current fastest pure-ruby implementation is a “compiling” version
of the iterative binexp alg that unrolls the loop and check. I tried some
more advanced algorithms like Bartlett’s mod reduction method (avoids
division in favor of shifts and multiplications) but it isn’t faster. I’d
be interested in any speedups people can find on this one:

class CompilingBinExpPowMod
def initialize(p, m)
@p, @m = p, m
create_calc_method
end

private

def create_calc_method
  if @p == 0
    body = "#{(@m == 1) ? 0 : 1}"
  else
    body =<<-EOC
      t = b % @m
      #{unrolled_binary_exps}
      t
    EOC
  end
  self.instance_eval "def calc(b)\n#{body}\nend\n"
end

def unrolled_binary_exps
  s, msb_pos  = "", bits_in_num(@p) - 1
  mask = 1 << (msb_pos-1)
  while mask > 0
    s << "t = (t * t) % @m\n"
    s << "t = (b * t) % @m\n" if (@p & mask) > 0
    mask >>= 1
  end
  s
end

end

using this one together with the chinese remainder theorem makes RSA
decryption doable in pure-Ruby. Since RSA is typically used to exchange
symmetric keys this is useable in practice if you have a modern-day pc.

Anyway thanks for all input,

Robert Feldt

···

On Fri, 23 May 2003, Joseph McDonald wrote:

Is anyone working on a ruby interface to PARI/GP:
http://www.math.u-psud.fr/~belabas/pari/ It’s supposed to be quite
fast.

Here are some notes on using it for crypto type routines:
http://www.math.iastate.edu/cbergman/crypto/pari/parihelp.html#powermod

? powermod(x,k,m)=lift(Mod(x,m)^k)
? powermod(1+2^1000, 65537, 2^2048)

is pretty fast. I haven’t benchmarked it against the ruby version
though.

Hi,

···

In message “Re: Speed Kata: pure-Ruby powmod” on 03/05/22, Clifford Heath cjh_nospam@managesoft.com writes:

… and then implement the powmod operation using one of the
established (advanced!) methods, like the Chinese remainder
theorem, and then recode the whole thing in assembler (which
does buy a lot of speed).

Or you could be sensible, and just import the appropriate code
from OpenSSL/SSLeay, where Eric Young has already done all
this, spending weeks of effort and beating the best commercial
implementations. IIRC his licensing terms merely require
acknowledgement.

Good news. Contribution is welcome as always. If someone is willing
to put it in bignum.c, I’m happy to merge it.

						matz.

Yes, thats probably the way. Do you think OpenSSL’s implementation is
faster/simpler/better than the one in libgcrypt (which in turn is based on
multi-precision part of Gnu Scientific Lib)?

Do you happen to know how similar OpenSSL’s and/or libgcrypt’s
multi-precision libs are to Ruby’s? (Matz, what code was bignum originally
based on?). It might simplify things if the representation of ruby’s
bignums internally is close to the implementation we chose to base it on.

Regards,

Robert

···

On Thu, 22 May 2003, Clifford Heath wrote:

Ted Rolle wrote:

Putting it into bignum.c is arguably the best way.

… and then implement the powmod operation using one of the
established (advanced!) methods, like the Chinese remainder
theorem, and then recode the whole thing in assembler (which
does buy a lot of speed).

Or you could be sensible, and just import the appropriate code
from OpenSSL/SSLeay, where Eric Young has already done all
this, spending weeks of effort and beating the best commercial
implementations. IIRC his licensing terms merely require
acknowledgement.

I tried with an iterative version:

class IterativeBinExpPowMod
def initialize(p, m)
@p, @m = p, m
@msb_pos = (Math.log(p)/Log2).floor
@mask = 1 << (@msb_pos-1)
end

def calc(b)
  return ((m == 1) ? 0 : 1) if p == 0
  mask = @mask
  t = b % m
  while mask > 0
t = (t * t) % m
t = ((b * t) % m) if (p & mask) > 0
mask >>= 1
  end
  t
end

end

and it seems to gain a couple of percent in speed vs. the recursive one I
used previously on relevant test data. This one is better though since the
recursive one can cause too deep stack nesting errors with large exponents.

Regards,

Robert

···

On Thu, 22 May 2003, Robert Feldt wrote:

Additionally, this enabled the Modulo instances to speedup calculation by
caching results.

On a second note maybe it can have an effect since the “pattern” of
recursive calls for the version above only depends on p and thus could be
precomputed in some way. Hmm, I’ll investigate, thx.

Hi,

···

In message “Re: Speed Kata: pure-Ruby powmod” on 03/05/22, Robert Feldt feldt@ce.chalmers.se writes:

Matz, what code was bignum originally based on?

I wrote it, with the influence from scm’s bignum routines.

						matz.