Compare two strings

Clint_Pidlubny · 4 April 2006 22:26

Hello,

What is the best approach to searching a string for another string?

For instance, I have:

url1 = 'http://www.url.com'
url2 = 'http://www.url.com/page'

If part of url1 is in url2, like above, I'd like to declare it a
match. I'm sure this happens using a regular expression, but my
experience is limited with them.

The other problem is that I'm not going to be looking for just one
url1, but I have an entire database table full of those to compare to
an entire database table of url2.

Any thoughts on approaching this problem are appreciated.

Thanks
Clint

David_A_Black3 · 5 April 2006 00:35

Hi --

It's not a complete answer, but in case it helps: String has an
include? method:

url2.include?(url1) => true
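
For the two-table case, a minimal sketch along the same lines might look like this. The arrays `bases` and `pages` are hypothetical stand-ins for the url1 and url2 tables, and the extra entries are made up for illustration:

```ruby
# Hypothetical stand-ins for the url1 and url2 database tables.
bases = ['http://www.url.com', 'http://www.example.org']
pages = ['http://www.url.com/page', 'http://www.other.net/index']

# For each page, record the first base URL it contains (if any).
matches = {}
pages.each do |page|
  base = bases.find { |b| page.include?(b) }
  matches[page] = base if base
end

p matches
```

Note that this checks every page against every base in the worst case, so the work grows with the product of the two table sizes.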

David

--
David A. Black (dblack@wobblini.net)
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

"Ruby for Rails" chapters now available
from Manning Early Access Program! Ruby for Rails

Clint_Pidlubny · 5 April 2006 02:10

I can't think of why that wouldn't work. Thank you.

Clint

Zach_Dennis1 · 5 April 2006 06:46

Using String#include? is much faster than regexp matching. Here are some
benchmarks. I didn't test this with Oniguruma though, but I su

-- START CODE --
require 'benchmark'

url  = "http://www.url.com"
url2 = "http://www.url.com/page"

Benchmark.bm{ |x|
  # plain substring search
  x.report{ 100000.times { url2.include?( url ) } }
  # regexp match; the literal is rebuilt from url on every iteration
  x.report{ 100000.times { url2 =~ /#{url}/ } }
}
-- END CODE ---

Benchmark Windows ruby 1.8.4 (2005-12-24) [i386-mswin32]
C:\source\projects\ruby\strings>ruby temp.rb
      user     system      total        real
  0.080000   0.000000   0.080000 (  0.080000)
  1.722000   0.130000   1.852000 (  1.873000)

Benchmark Linux ruby 1.8.4 (2005-12-24) [i686-linux]
zdennis@lima:~$ ruby-1.8.4 temp.rb
      user     system      total        real
  0.100000   0.000000   0.100000 (  0.119403)
  1.570000   0.040000   1.610000 (  1.760446)

Benchmark Linux ruby 1.8.3 (2005-06-23) [i486-linux]
zdennis@lima:~$ ruby temp.rb
      user     system      total        real
  0.160000   0.030000   0.190000 (  0.209436)
  1.720000   0.080000   1.800000 (  2.021754)

Benchmark Linux ruby 1.8.2 (2005-04-11) [i386-linux]
zdennis@jboss:~$ ruby temp.rb
      user     system      total        real
  0.000000   0.000000   0.000000 (  0.246239)
  0.000000   0.000000   0.000000 (  1.401049)

Zach

Clint_Pidlubny · 5 April 2006 23:23

Excellent info Zach. Very relevant for me. I'll have thousands of
links to do this with.

Thanks again,
Clint

Dominik_Bathon · 6 April 2006 00:45

Hi,

> Excellent info Zach. Very relevant for me. I'll have thousands of
> links to do this with.

$ cat str_inc_bench.rb
require 'benchmark'

url = "http://www.url.com/"
url2 = "http://www.url.com/page"
urlrx = /#{url}/  # compiled once, outside the timed loops

Benchmark.bm{ |x|
  x.report{ 100000.times { url2.include?( url ) } }  # substring search
  x.report{ 100000.times { url2 =~ urlrx } }         # precompiled regexp
  x.report{ 100000.times { url2 =~ /#{url}/ } }      # regexp rebuilt per iteration
}
$ ruby -v str_inc_bench.rb
ruby 1.8.4 (2005-12-24) [i686-linux]
      user     system      total        real
  0.070000   0.000000   0.070000 (  0.071435)
  0.130000   0.000000   0.130000 (  0.130016)
  1.130000   0.020000   1.150000 (  1.182629)

So, regular expression matching itself is not that much slower than String#include?.
What makes "url2 =~ /#{url}/" slow is the creation of so many Regexp objects.
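
A related caveat, sketched here on the assumption that the URL is meant to match literally: interpolating it as-is leaves each "." acting as a wildcard, so it may be worth applying Regexp.escape when the pattern is built once up front. Regexp.union does the same escaping for an array of strings. The names `lookalike`, `bases` and `any_base_rx` are made up for this illustration:

```ruby
url  = 'http://www.url.com/'
url2 = 'http://www.url.com/page'

raw_rx     = /#{url}/                  # '.' still matches any character
literal_rx = /#{Regexp.escape(url)}/   # '.' matches only a literal dot

lookalike = 'http://wwwXurl.com/page'  # made-up near miss
raw_hit     = !(lookalike =~ raw_rx).nil?
literal_hit = !(lookalike =~ literal_rx).nil?
real_hit    = !(url2 =~ literal_rx).nil?

# One precompiled pattern covering several (hypothetical) base URLs.
bases       = ['http://www.url.com/', 'http://www.example.org/']
any_base_rx = Regexp.union(bases)
matched     = url2[any_base_rx]        # the base URL found in url2, or nil

p [raw_hit, literal_hit, real_hit, matched]
```

Building `any_base_rx` once and reusing it keeps the Regexp creation cost out of the loop, which is the cost the third benchmark line is measuring.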

I just wanted to point that out.

Dominik

Brian_Mitchell · 6 April 2006 01:14

Ruby has some very subtle optimizations for Regexps too:

# ruby 1.8.4 (2006-03-20) [powerpc-darwin8.5.0]
GC.disable
n = ObjectSpace.each_object(Regexp){}
def foo; /abc/ end
# Note I didn't call foo.
ObjectSpace.each_object(Regexp){} - n #=> 1
1000.times {foo}
ObjectSpace.each_object(Regexp){} - n #=> 1

It is always nice to see simple optimizations like this.
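
The same effect can be seen through object identity. This is only a sketch, assuming MRI behaviour; the method names are made up, and the `o` ("once") flag is shown as one way to get a similar result for an interpolated literal:

```ruby
def literal_rx;         /abc/   end  # no interpolation
def interpolated_rx(s); /#{s}/  end  # built on each call
def once_rx(s);         /#{s}/o end  # interpolated only the first time

# A plain literal is expected to hand back the same Regexp object each call.
same_literal = literal_rx.equal?(literal_rx)

# An interpolated literal is expected to produce a fresh object per call.
same_interpolated = interpolated_rx('abc').equal?(interpolated_rx('abc'))

# With /o the first interpolation is kept, so later arguments are ignored.
first       = once_rx('abc')
same_once   = first.equal?(once_rx('xyz'))
once_source = once_rx('xyz').source

p [same_literal, same_interpolated, same_once, once_source]
```

The `o` flag trades flexibility for speed: it only makes sense when the interpolated value never changes for the life of the program.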

Brian.
