Non case sensitive searching

Adam_Akhtar · 11 February 2008 21:19

If I want to see whether a list contains a particular word, how would I go
about doing so with letter case not being important? Normally two words
spelt the same are not equal if they differ in case,

i.e. Shell != shell
or ShEll != shell

At first I thought of simply converting both the list and the search word
to uppercase, thus making them the same, but doing so would not be very
efficient if the list is very big.

Any ideas?

···

--
Posted via http://www.ruby-forum.com/.

Stefano_Crocco · 11 February 2008 21:24

Use String#casecmp. According to ri documentation, it works as String#<=> but
is not case sensitive. For example:

"abc" <=> "Abc"
=> 1

"abc".casecmp "Abc"
=> 0
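
To apply this to the original question, casecmp can be combined with Array#any?. This is only a minimal sketch: the helper name include_ignoring_case? and the sample words are made up for illustration, and casecmp folds ASCII letters only.

```ruby
# Hypothetical helper: true if any element of +list+ equals +word+,
# ignoring ASCII letter case. No upcased copy of the list is built.
def include_ignoring_case?(list, word)
  list.any? { |item| item.casecmp(word) == 0 }
end

words = %w[Apple ShEll banana]
p include_ignoring_case?(words, "shell")  # => true
p include_ignoring_case?(words, "shel")   # => false
```

The search stops at the first match, so the whole list is only scanned when the word is absent.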

Stefano

Robert_K1 · 11 February 2008 21:29

search_key = something.downcase
list.find {|s| s.downcase == search_key}
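
For reference, a small runnable version of the two lines above with made-up sample data. It shows that find hands back the element in its original spelling, or nil when nothing matches. Each comparison builds a temporary downcased string, but the search stops at the first hit.

```ruby
list = %w[Apple ShEll banana]

# Downcase the search key once, outside the block.
search_key = "SHELL".downcase

# find returns the first matching element in its original spelling,
# or nil when nothing matches.
found   = list.find { |s| s.downcase == search_key }
missing = list.find { |s| s.downcase == "cherry" }

p found    # => "ShEll"
p missing  # => nil
```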

robert

7stud · 11 February 2008 21:31

data = ['ShEll', 'heLLO']

data.each do |word|
  if /shell/i =~ word
    puts 'found shell'
  end

  if /hello/i =~ word
    puts 'found hello'
  end
end

W_James · 11 February 2008 21:44

%w(hello HELLO Hello hellO hell).grep( /hello/i )
==>["hello", "HELLO", "Hello", "hellO"]
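
One thing to watch with grep: an unanchored pattern matches any element that merely contains the word. A short sketch with made-up sample data, comparing it with a pattern anchored by \A and \z:

```ruby
words = %w[hello HELLO Othello hell]

# Unanchored: matches any element that contains "hello".
loose = words.grep(/hello/i)

# Anchored with \A and \z: matches only elements equal to "hello",
# ignoring case.
exact = words.grep(/\Ahello\z/i)

p loose  # => ["hello", "HELLO", "Othello"]
p exact  # => ["hello", "HELLO"]
```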

irb --prompt xmp

Adam_Akhtar · 11 February 2008 23:08

The regexp looks good, but when I try to apply it to help me remove
duplicates from a list it doesn't seem to work:

list = %w{adam Adam bobby Bobby wild wILd}
list.sort
list.each_index do |x|
  # remove entries which have the same spelling but in a different case
  list.delete_at(x) if (/list[x]/i =~ list[x+1])
end
puts ""
puts list

In this list I consider adam and Adam duplicates because they have the
same spelling and differ only in case. Why doesn't my if clause pick up
on this?

Alex_Fenton2 · 11 February 2008 23:19

Adam Akhtar wrote:

> the reg exp. looks good but when i try to apply it to help me remove duplicates from a list it doesn't seem to work
>
> list = %w{adam Adam bobby Bobby wild wILd}
> list.sort
> list.each_index do |x|
> list.delete_at(x) if (/list[x]/i =~ list[x+1])

Here you're literally searching for the text "list[x]" (which the regexp engine reads as "list" followed by the character class [x]), not the value of that expression. You need to use interpolation to place the value of that variable into the regular expression. Use #{expr}, like you would in a string:

list.delete_at(x) if ( /#{list[x]}/i =~ list[x+1])

If your strings might contain punctuation characters, you should also look into Regexp.escape to ensure these are dealt with safely.
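
A small sketch of why Regexp.escape matters when interpolating. The word "a.b" is just an illustrative value, and the \A and \z anchors are an addition here so that the pattern has to match the whole string:

```ruby
word = "a.b"

unescaped = /\A#{word}\z/i                  # "." still means "any character"
escaped   = /\A#{Regexp.escape(word)}\z/i   # "." is matched literally

p unescaped =~ "AxB"   # => 0   (false positive)
p escaped   =~ "AxB"   # => nil
p escaped   =~ "A.B"   # => 0
```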

alex

Robert_K1 · 12 February 2008 07:45

> the reg exp. looks good but when i try to apply it to help me remove
> duplicates from a list it doesn't seem to work
>
> list = %w{adam Adam bobby Bobby wild wILd}
> list.sort

The line above is ineffective because Array#sort returns a new sorted
array and leaves the original list untouched (list.sort! would sort it
in place).

> list.each_index do |x|
> list.delete_at(x) if (/list[x]/i =~ list[x+1]) # remove entries which have same spelling but in diff. case
> end
> puts ""
> puts list

I would use another algorithm because of efficiency:

#!/bin/env ruby

require 'set'

# ensure random order
list = %w{adam Adam bobby Bobby wild wILd}.sort_by { rand }
dups = Set.new

p list
list.delete_if {|w| not dups.add? w.downcase }
p list
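
The trick in the script above is that Set#add? returns nil when the element is already in the set. A non-destructive sketch of the same idea follows; the helper name dedupe_ignoring_case is hypothetical, and it returns a new array instead of modifying the list in place:

```ruby
require 'set'

# Hypothetical helper: keeps the first spelling of each word,
# comparing case-insensitively, and returns a new array.
def dedupe_ignoring_case(words)
  seen = Set.new
  # Set#add? returns nil when the element is already present, so select
  # keeps a word only the first time its downcased form appears.
  words.select { |w| seen.add?(w.downcase) }
end

p dedupe_ignoring_case(%w[adam Adam bobby Bobby wild wILd])
# => ["adam", "bobby", "wild"]
```

Each word costs one set lookup, so the whole pass stays linear in the length of the list.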

Kind regards

robert

--
use.inject do |as, often| as.you_can - without end

Adam_Akhtar · 12 February 2008 00:04

Excellent, thanks very much Alex for that.

W_James · 12 February 2008 08:34

We don't need sets for this.

list = %w{adam Adam bobby Bobby wild wILd}.sort_by { rand }
==>["wild", "bobby", "Bobby", "wILd", "Adam", "adam"]
list.map{|x| x.upcase}.uniq
==>["WILD", "BOBBY", "ADAM"]

First "inject"; now, a set fetish?

7stud · 12 February 2008 09:01

Adam Akhtar wrote:

> the reg exp. looks good

You should never consider a regex good looking. Regexes should be
avoided whenever possible in favor of String methods. Better solutions
have been posted.

Robert Klemme wrote:

> I would use another algorithm because of efficiency:
>
> list.delete_if

Repeatedly deleting elements from the middle of an array is certainly
not efficient. Also, suppose the results are:

[Adam Bobby wILd]

Looking at the results, you would have no way of knowing whether there
were duplicates spelled: adam, bobby, and wild. Therefore, the case of
the results appears to be irrelevant. If the case of the results is
irrelevant, then just providing the set is enough:

require 'set'

# ensure random order
list = %w{adam Adam bobby Bobby wild wILd}.sort_by { rand }

results = Set.new
list.each do |elmt|
  results << elmt.downcase
end

p results

--output:--
#<Set: {"bobby", "wild", "adam"}>

Robert_K1 · 12 February 2008 09:39

As far as I can see the requirement was to remove duplicates and not
to output a uniform cased list.
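
For readers on a later Ruby than the one current in this thread: Array#uniq accepts a block there (from 1.9.2 onward, as far as I know), which removes the duplicates while keeping the first original spelling of each word. A minimal sketch with a fixed list order:

```ruby
list = %w[adam Adam bobby Bobby wild wILd]

# The block supplies the comparison key; the first element seen for each
# key is kept with its original spelling.
deduped = list.uniq { |w| w.downcase }

p deduped  # => ["adam", "bobby", "wild"]
```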

Cheers

robert

Robert_K1 · 12 February 2008 09:48

> Adam Akhtar wrote:
> > the reg exp. looks good
>
> You should never consider a regex good looking. regexes should be
> avoided whenever possible in favor of String methods. Better solutions
> have been posted.

I would not subscribe to that rule. Often regular expressions are
faster than pure String based approaches - it depends on the issue at
hand.

> Robert Klemme wrote:
> > I would use another algorithm because of efficiency:
> >
> > list.delete_if
>
> Repeatedly deleting elements from the middle of an array is certainly
> not efficient. Also, suppose the results are:
>
> [Adam Bobby wILd]
>
> Looking at the results, you would have no way of knowing whether there
> were duplicates spelled: adam, bobby, and wild. Therefore, the case of
> the results appears to be irrelevant. If the case of the results is
> irrelevant, then just providing the set is enough:

Alternatively one could use a Hash to preserve all spellings:

irb(main):001:0> list = %w{adam Adam bobby Bobby wild wILd}.sort_by { rand }
=> ["wild", "wILd", "Adam", "adam", "Bobby", "bobby"]
irb(main):002:0> res = Hash.new {|h,k| h[k]=[]}
=> {}
irb(main):003:0> list.each {|w| res[w.downcase] << w}
=> ["wild", "wILd", "Adam", "adam", "Bobby", "bobby"]
irb(main):004:0> res
=> {"bobby"=>["Bobby", "bobby"], "wild"=>["wild", "wILd"],
"adam"=>["Adam", "adam"]}
irb(main):005:0>
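
The same grouping can also be sketched with Enumerable#group_by, which builds the hash of arrays in one call. The list order is fixed here (rather than shuffled) so the printed results are predictable:

```ruby
list = %w[wild wILd Adam adam Bobby bobby]

# group_by collects the elements under the key returned by the block,
# so every spelling ends up under its downcased form.
res = list.group_by { |w| w.downcase }

p res.keys     # => ["wild", "adam", "bobby"]
p res["wild"]  # => ["wild", "wILd"]
p res["adam"]  # => ["Adam", "adam"]
```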

It all depends...

Cheers

robert

W_James · 12 February 2008 10:54

1. My solution removed duplicates.
2. To output a uniform-cased list does no harm since
case is irrelevant.

Using plain Ruby is shorter, easier, and clearer.
There's no rational reason to require sets.

Learn to use Ruby. It's a very powerful language;
so powerful, in fact, that I seldom have to use
external libraries.

James_Edward_Gray_II · 12 February 2008 13:47

Or manners, it would seem.

James Edward Gray II

On Feb 12, 2008, at 4:54 AM, William James wrote:

> Learn to use Ruby. It's a very powerful language;
> so powerful, in fact, that I seldom have to use
> external libraries.
