Afternoon,
Kendall Gifford wrote in post #958838:
>> You are going to need some more advanced regexes to get a match in the
way
>> /^A?B?C?D{0,2}$/
>>
>
> The above works so long as each of ABC or D must come in said order,
> if present. This goes against the OP's examples: CA, DAD, and BAC.
>
> My suspicion is that you're problem isn't solvable by a regular
> expression alone, but that you'll need to do some parsing (still
> possibly using regular expressions in the process).
Yes, you are correct, this expression doesnt match words like CA or BAD
or CAD.
Just wondering if there is an option to say all different combinations
or orders.
Unless for some reason you are doing homework or something and need to use a
regular expression, may I suggest the following instead.
You have a very small string and a very small set of options in terms of
possible letters. Your string can have no more than 5 different characters
so why not create a "binary" version of your string and then check to see if
anything appears more than you want?
For example
chars = {}
# Why the powers jump by 5 explanation follows
chars['a'] = 2**0
chars['b'] = 2**5
chars['c'] = 2**10
chars['d'] = 2**15
test_string = 'aabcd'
bit_value = 0
test_string.downcase.each_char{ |c| bit_value = bit_value + chars[c] }
if (bit_value & 949214) != 0 #Magic number explanation to follow
puts 'Bad string'
else
puts 'Accepted string'
end
Really what this does is goes character by character and puts each character
into it's "bucket"
So we start with a binary value of all zeros
00000 00000 00000 00000 - we use 5 slots (or increase our power of 2 by 5)
for each character because they could appear 5 times each
Going through the test_string - 'aabcd'
First an a - we add 2**0 which is 1
0+1 = 1
Binary
00000 00000 00000 00001
Next another a - add another 1
1+1 = 2
Binary
00000 00000 00000 00010
b is next - add 2**5 or 32
2+32 = 34
Binary
00000 00000 00001 00010
c - add 2**10 or 1024
34+1024 = 1058
Binary
00000 00001 00001 00010
d - add 2**15 or 32768
Binary
00001 00001 00001 00010
It should be fairly apparent that the location of the 1 in each "bucket"
represents the number of times that character appears in our string.
If you had aaabb then your binary value would be 00000 00000 00010 00100
dddcd would look like 01000 00001 00000 00000
And so on - obviously if we have none of a character then we have no 1 in
that "bucket"
So lets now take a look at the "magic" number of 949214 which has a binary
rep of
11100 11110 11110 11110
This number represents all the possible locations of the binary digit one
that you would find unacceptable. Either 5,4, or 3 Ds and 5, 4, 3, or 2 Cs,
Bs, or As.
So now we have
00001 00001 00001 00010 - test value
11100 11110 11110 11110 - our mask of unacceptable options
And when we & (bit wise AND) the two values we get
00000 00000 00000 00010 - we have too many As in this case.
Therefore since this value is above 0 we have a problem. Any valid value
will return zero here.
Not a regular expression - but I believe this would be faster given your
limited set of options and string lengths.
John
···
On Tue, Nov 2, 2010 at 3:04 PM, Dv Dasari <dv.mymail@gmail.com> wrote:
> On Tue, Nov 2, 2010 at 3:34 PM, Richard Conroy > > <richard.conroy@gmail.com> wrote: