String contains one of these?

Mikkel_Bruun1 · 27 February 2006 13:35

Imagine,

I have

leagues=%w{ 1D 2D U16 U19 LR RR JNL NL}

For a given string, say "some stuff NL is chunky", I want to determine which
of the leagues it contains.

Now, the hard way (more code, less thought) would be to iterate over the
array and do a =~ on each element, but is there a simpler way?

thanks in advance

Robert_K1 · 27 February 2006 13:42

leagues=%w{ 1D 2D U16 U19 LR RR JNL NL}

=> ["1D", "2D", "U16", "U19", "LR", "RR", "JNL", "NL"]

"some stuff NL is chunky".scan( Regexp.new( leagues.join('|') ) )

=> ["NL"]
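
One caveat with the joined pattern: it matches anywhere in the string, and the entries are not escaped. The following is only a sketch of a stricter variant, assuming whole-word matches are wanted; the sample text and the names `loose` and `strict` are made up for illustration.

```ruby
leagues = %w{1D 2D U16 U19 LR RR JNL NL}

# Plain alternation, as in the reply above: matches anywhere in the string.
loose = Regexp.new(leagues.join('|'))

# Assumed variant: escape each entry and require a word boundary on both sides.
strict = Regexp.new('\b(?:' + leagues.map { |l| Regexp.escape(l) }.join('|') + ')\b')

text = "some stuff SNL is chunky, NL too"   # made-up sample string

loose_hits  = text.scan(loose)    # also picks up the "NL" inside "SNL"
strict_hits = text.scan(strict)   # only the standalone "NL"
```

Which behaviour is right depends on whether "SNL" should count as containing "NL".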

Kind regards

robert

James_Edward_Gray_II · 27 February 2006 13:42

The hard way isn't too hard and takes only a line of code:

>> leagues=%w{ 1D 2D U16 U19 LR RR JNL NL}
=> ["1D", "2D", "U16", "U19", "LR", "RR", "JNL", "NL"]
>> str="some stuff NL is chunky"
=> "some stuff NL is chunky"
>> leagues.find_all { |league| str.include? league }
=> ["NL"]

Hope that helps.

James Edward Gray II

Daniel_Harple · 27 February 2006 13:43

How about:

leagues = %w{1D 2D U16 U19 LR RR JNL NL}
words = "some stuff NL is chunky".split
leagues.select { |m| words.include?(m) } # => ["NL"]
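
Note that `String#include?` tests for substrings, while splitting first compares whole whitespace-separated tokens, so the two approaches can disagree. A small sketch of the difference, using a made-up sample string; the third variant (tokenizing on runs of letters and digits) is an assumption about what might be wanted, not something proposed in the thread.

```ruby
leagues = %w{1D 2D U16 U19 LR RR JNL NL}
str = "JNL results, then NL."   # made-up sample

# Substring test: "NL" is also found inside "JNL".
by_substring = leagues.select { |league| str.include?(league) }

# Whole-token test on whitespace-split words: "NL." keeps its trailing period.
by_split = leagues.select { |league| str.split.include?(league) }

# Assumed refinement: tokenize on runs of letters and digits instead.
tokens = str.scan(/[[:alnum:]]+/)
by_token = leagues.select { |league| tokens.include?(league) }
```

Here `by_substring` and `by_token` both report "JNL" and "NL", while `by_split` misses "NL" because of the punctuation.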

-- Daniel

hitesh.jasani · 28 February 2006 05:13

You've got a bunch of great answers already, but here's another option.

leagues = %w(1D 2D U16 U19 LR RR JNL NL)
words = "some stuff NL is chunky"

irb(main):008:0> words.split & leagues
=> ["NL"]
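
Two properties of `Array#&` are worth knowing here: duplicates are dropped, and the result follows the order of the left-hand array. A quick illustration with a made-up sentence:

```ruby
leagues = %w(1D 2D U16 U19 LR RR JNL NL)
words = "NL beat LR and then NL beat 1D"   # made-up sample

in_text_order   = words.split & leagues   # order of first appearance in the text
in_league_order = leagues & words.split   # order of the leagues array
```

So the operand order matters if the caller cares about the order of the matches.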

- Hitesh
http://www.jasani.org/

Christian_Neukirche1 · 27 February 2006 16:04

"Robert Klemme" <shortcutter@googlemail.com> writes:

"some stuff NL is chunky".scan( Regexp.new( leagues.join('|') ) )

=> ["NL"]

irb(main):002:0> "some stuff NL is chunky".scan Regexp.union(*leagues)
=> ["NL"]
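
Besides being shorter, `Regexp.union` escapes each string, which `join('|')` does not. A sketch with a made-up league name containing a dot (current Ruby versions also accept the array directly, without the splat):

```ruby
names = %w{U.16 NL}   # made-up names; "U.16" contains a regex metacharacter

joined = Regexp.new(names.join('|'))   # "." matches any character here
union  = Regexp.union(names)           # each string is escaped first

joined_hits = "U916 and NL".scan(joined)   # "U916" matches by accident
union_hits  = "U916 and NL".scan(union)    # only the literal names match
```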

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Johan_Veenstra · 28 February 2006 09:39

And the winner is ...

On 2/28/06, hitesh.jasani@gmail.com wrote:

leagues = %w(1D 2D U16 U19 LR RR JNL NL)
words = "some stuff NL is chunky"

irb(main):008:0> words.split & leagues
=> ["NL"]

Robert · 27 February 2006 17:53

Christian Neukirchen wrote:

irb(main):002:0> "some stuff NL is chunky".scan Regexp.union(*leagues)
=> ["NL"]

Even better! Didn't know about that method. Learn something new every
day. Thanks!

robert

hitesh.jasani · 28 February 2006 13:33

Actually if you flip it around as 'leagues & words.split' it turns out
to have some significant performance advantages in many cases. See
http://www.jasani.org/articles/2006/02/28/adding-the-science-back-to-computer-science
for more details.
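
For anyone who wants to check this on their own machine, a minimal timing harness might look like the following. It is a sketch only: the enlarged input and the iteration count are made up, the text is split once up front (unlike the one-liner above), and no particular outcome is claimed since timings vary by Ruby version.

```ruby
require 'benchmark'

leagues = %w(1D 2D U16 U19 LR RR JNL NL)
words = (["some stuff NL is chunky"] * 1_000).join(" ")   # made-up, enlarged input
tokens = words.split                                      # split once, outside the timing

Benchmark.bm(18) do |bm|
  bm.report("tokens & leagues") { 200.times { tokens & leagues } }
  bm.report("leagues & tokens") { 200.times { leagues & tokens } }
end
```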

- Hitesh
http://www.jasani.org/

BA_Baracus · 27 February 2006 21:00

amazing...

thanks a bunch everybody...

Mikkel Bruun

www.strongside.dk - Football Portal(DK)
nflfeed.helenius.org - Football News(DK)
ting.minline.dk - Buy Old Stuff!(DK)

Jeff_Schwab · 28 February 2006 16:28

Is it possible that link is incorrect?

"Firefox can't establish a connection to the server at www.jasani.org."

I'm dying to learn about this now.

···

hitesh.jasani@gmail.com wrote:

Actually if you flip it around as 'leagues & words.split' it turns out
to have some significant performance advantages in many cases. See
http://www.jasani.org/articles/2006/02/28/adding-the-science-back-to-computer-science
for more details.

- Hitesh
http://www.jasani.org/

Christian_Neukirche1 · 1 March 2006 02:23

"hitesh.jasani@gmail.com" <hitesh.jasani@gmail.com> writes:

Actually if you flip it around as 'leagues & words.split' it turns out
to have some significant performance advantages in many cases. See
http://www.jasani.org/articles/2006/02/28/adding-the-science-back-to-computer-science
for more details.

You'd better cache those Regexps.
Also, test with longer "words"---they are more likely to grow than the
number of leagues.
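
A sketch of what caching means here, assuming the pattern is applied to many strings: build the Regexp once and reuse it instead of constructing it on every call. The `lines` array and variable names are made up for illustration.

```ruby
leagues = %w{1D 2D U16 U19 LR RR JNL NL}
lines = ["some stuff NL is chunky", "LR beat RR", "nothing here"]   # made-up input

# Rebuilds the pattern for every line.
rebuilt = lines.map { |line| line.scan(Regexp.union(*leagues)) }

# Builds the pattern once and reuses it.
league_re = Regexp.union(*leagues)
cached = lines.map { |line| line.scan(league_re) }
```

Both give the same matches; only the amount of regex construction differs.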

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

Jeff_Schwab · 1 March 2006 16:08

I am surprised by the scan failures ("could not continue test"). Do you know what causes the error?

hitesh.jasani · 1 March 2006 01:56

Jeffrey, the link should be working for you now. My hosting provider
had a number of servers go belly up earlier in the day.

hitesh.jasani · 1 March 2006 05:03

Good comments Christian. I thought I'd just hack a set of tests and
post some quick results, but all I think I did was prove that I'm not
supposed to be coding first thing in the morning.

Yes, once you fix that bug in the code by caching the Regexes, they
perform very impressively. In fact, for the modified tests I just ran,
they beat out every other solution in every case except for extremely
large league sizes (> 18,000 elements) where they wouldn't run at all.
But I'm loath to draw any conclusions from the data just yet. (Burned
once, twice shy?)

There appears to be at least one other bug in the code. One astute,
anonymous person pointed out that the six solutions will not return the
same results for the generated datasets and that preprocessing of input
data could help improve performance even more. I think it all depends
on how one defines the problem as to whether the generated data is
valid input or undefined requirements now being levied on the code.

Mmmm.... I hope JEG II is watching this thread as I'm wondering if
there isn't a rubyquiz in here somewhere.

- Hitesh
http://www.jasani.org/

Christian_Neukirche1 · 1 March 2006 16:53

Jeffrey Schwab <jeff@schwabcenter.com> writes:

I am surprised by the scan failures ("could not continue test"). Do
you know what causes the error?

irb(main):002:0> Regexp.new "x"*600_000
RegexpError: regular expression too big: /xxxxx...
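
One way to cope with that, sketched under assumptions: rescue the `RegexpError` and fall back to array intersection. The helper name `matching_leagues` is hypothetical, and whether the size limit is hit at all depends on the Ruby version and its regex engine. The two branches also differ slightly (substring matches versus whole words).

```ruby
# Hypothetical helper: try the single-regex approach first and fall back to
# array intersection if the pattern cannot be compiled.
def matching_leagues(text, leagues)
  text.scan(Regexp.union(leagues)).uniq
rescue RegexpError
  leagues & text.split
end

found = matching_leagues("some stuff NL is chunky", %w{1D 2D U16 U19 LR RR JNL NL})
```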

--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

James_Edward_Gray_II · 1 March 2006 14:21

If you can think of a good way to spin it:

suggestion@rubyquiz.com

James Edward Gray II

···

On Feb 28, 2006, at 11:03 PM, hitesh.jasani@gmail.com wrote:

Mmmm.... I hope JEG II is watching this thread as I'm wondering if
there isn't a rubyquiz in here somewhere.
