Split a string at a certain character

Matt_Slay · 4 December 2010 19:31

I am writing an app where the user will enter a string and then I need to
split that string into two parts and then I can process the two parts
from there and show them some output.

I need help with the best Ruby way to split the string into the two
parts...

Here are examples of some typical user inputs, and then what it must be
split into for further processing. Hopefully you will see a pattern to
it and can help me. (Note: sometimes there will be a space between the
two parts, but some user may omit the space between [part 1] and [part
2], so I must handle both cases)

(They will not enter the quotes, I just did it for clarity)

Examples (several to help you see the pattern):

"80e6" must split to "80" and "e6"

"80 e6" must split to "80" and "e6"

"12.5H7" must split to "12.5" and "H7"

"120 JS11" must split to "120" and "JS11"

"20.8a11" must split to "20.8" and "a11"

"45.50 h2" must split to "45.50" and "h2"

"90.2F3" must split to "90.2" and "F3"

"45js4" must split to "45" and "js4"

Here is the basic pattern, in words:

[part 1] followed by [part 2]

which is to say:

[part 1 = an integer or floating point number] followed by [part 2 = one
or two letters (a-z or A-Z), which are then followed by an integer]

If there is a space between [part 1] and [part 2], it needs to be
ignored.

If you are really interested in what all this is for, you can read on
(it may help you see the overall picture).

Eventually, [part 1] will be converted to a floating point number, and
[part 2] will be used to look up some other floating point number in a
database table which is then used as a variance amount that will be
applied to [part 1].

···

--
Posted via http://www.ruby-forum.com/.

Peter_Vandenabeele1 · 4 December 2010 20:16

Pick Axe page 70-75:

ruby-1.9.2-head > s = '12.7 AB36'
=> "12.7 AB36"
ruby-1.9.2-head > pattern = /\A([0-9.]+)\s*([a-zA-Z]{1,2}[0-9]+)\Z/
=> /\A([0-9.]+)\s*([a-zA-Z]{1,2}[0-9]+)\Z/
ruby-1.9.2-head > s.match pattern
=> #<MatchData "12.7 AB36" 1:"12.7" 2:"AB36">
ruby-1.9.2-head > $1
=> "12.7"
ruby-1.9.2-head > $2
=> "AB36"

HTH,

Peter

Josh_Cheek · 4 December 2010 20:29

This meets all of your cases, though none of them include negative signs at
the beginning, and you haven't specified how it should behave for bad data
(i.e. either malformed, or not as pristine as the inputs you've supplied).

require 'test/unit'

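def mysplit(str)
  # Group 1: digits with an optional decimal part.
  # Group 2: whatever follows (any characters except a literal '$').
  str[/^(\d+(?:\.\d+)?)\s*([^$]*)$/]
  return $1 , $2
end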

class MysplitTester < Test::Unit::TestCase
  def test_1
    assert_equal [ "80" , "e6" ] , mysplit( "80e6" )
  end
  def test_2
    assert_equal [ "80" , "e6" ] , mysplit( "80 e6" )
  end
  def test_3
    assert_equal [ "12.5" , "H7" ] , mysplit( "12.5H7" )
  end
  def test_4
    assert_equal [ "120" , "JS11" ] , mysplit( "120 JS11" )
  end
  def test_5
    assert_equal [ "20.8" , "a11" ] , mysplit( "20.8a11" )
  end
  def test_6
    assert_equal [ "45.50" , "h2" ] , mysplit( "45.50 h2" )
  end
  def test_7
    assert_equal [ "90.2" , "F3" ] , mysplit( "90.2F3" )
  end
  def test_8
    assert_equal [ "45" , "js4" ] , mysplit( "45js4" )
  end
end
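
For the bad-data case mentioned above, here is a rough sketch of what mysplit returns when the input does not fit. The sample inputs are illustrative assumptions, not taken from the original post.

```ruby
# Same method as above, repeated so this snippet runs on its own.
def mysplit(str)
  str[/^(\d+(?:\.\d+)?)\s*([^$]*)$/]
  return $1, $2
end

# No leading number: the match fails, so $1 and $2 are both nil.
p mysplit("e6")      # => [nil, nil]

# A leading sign is not covered by \d+, so this fails as well.
p mysplit("-80e6")   # => [nil, nil]

# The second group accepts almost anything, so junk after the number passes.
p mysplit("80 ???")  # => ["80", "???"]
```

So a caller would probably want to check for nil, and possibly validate the second part separately, before going further.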

Matt_Slay · 4 December 2010 22:14

Peter Vandenabeele wrote:

Pick Axe page 70-75:

ruby-1.9.2-head > s = '12.7 AB36'
=> "12.7 AB36"
ruby-1.9.2-head > pattern = /\A([0-9.]+)\s*([a-zA-Z]{1,2}[0-9]+)\Z/
=> /\A([0-9.]+)\s*([a-zA-Z]{1,2}[0-9]+)\Z/
ruby-1.9.2-head > s.match pattern
=> #<MatchData "12.7 AB36" 1:"12.7" 2:"AB36">
ruby-1.9.2-head > $1
=> "12.7"
ruby-1.9.2-head > $2
=> "AB36"

HTH,

Peter

Peter!!!! You are *THE* man!!! I actually just bought the 1.9.2 version
of the book for just $10. Regular Expressions are now on page 97
(Chapter 7). I'm a FoxPro programmer and just beginning my work in
Ruby.

I threw your pattern and match command into a Controller action in my
Rails app, and guess what.... It works!!

Here's my code, after just 1 minute of work, thanks to you. (Some
refinements are now needed for stupid user input, but I'll get to that.)

I'm so excited to see this come to life so easily. The Ruby community is
awesome.

  def create
    @conversion = Conversion.new(params[:conversion])
    pattern = /\A([0-9.]+)\s*([a-zA-Z]{1,2}[0-9]+)\Z/
    user_input = @conversion.shaft_size
    @size = user_input.match pattern

    respond_to do |format|
      format.html { render :show }
    end
  end

Now I can access the @size[1] and @size[2] to get what I need done.
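
One refinement for unexpected input: String#match returns nil when the pattern does not match, so indexing the result directly would raise NoMethodError. A minimal sketch in plain Ruby (outside Rails), assuming the same pattern; the helper name split_shaft_size is hypothetical and not part of the app above.

```ruby
PATTERN = /\A([0-9.]+)\s*([a-zA-Z]{1,2}[0-9]+)\Z/

# Hypothetical helper: returns [number_part, code_part], or nil for bad input.
def split_shaft_size(user_input)
  size = user_input.to_s.strip.match(PATTERN)
  return nil unless size   # match gives nil when the pattern does not apply
  [size[1], size[2]]
end

p split_shaft_size("12.7 AB36")  # => ["12.7", "AB36"]
p split_shaft_size("hello")      # => nil
p split_shaft_size(nil)          # => nil
```

A controller could then branch on the nil result and show a validation message instead of rendering the result page.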

How many times can I say "Thanks"???

Matt_Slay · 4 December 2010 22:20

Josh Cheek wrote in post #966218:

def mysplit(str)
str[/^(\d+(?:\.\d+)?)\s*([^$]*)$/]
return $1 , $2
end

Josh - Well, this looks both simple and powerful as well. You all are
obviously some smart Ruby experts.

I'm gonna test your regular expression and compare it to the one from
Peter to see if I can see the fine points between them both.

Thanks for taking the time to respond.

I especially like that testing thing you showed me. I need to study that
as well.

Robert_K1 · 5 December 2010 12:50

There are a few things to say about this solution though:

The float part does not exactly match floating point numbers but it would also match these sequences: 127.0.0.1, 1..10, 4...3 etc.

"[0-9]" can be replaced by "\d".

Assigning the pattern to a variable and then using that for matching is generally less efficient than directly using the pattern.

The pattern "\Z" allows for a newline at the end of the string. This may be OK or not (you can also use #chomp on the input to remove trailing line terminators before handing the input to the method).
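
The first and last points can be checked quickly in irb. This is only a sketch: the variable names loose and strict and the sample strings are assumptions made for illustration. The second pattern is a stricter variant that requires digits on both sides of the dot and anchors with \z.

```ruby
loose  = /\A([0-9.]+)\s*([a-zA-Z]{1,2}[0-9]+)\Z/
strict = /\A(\d+(?:\.\d+)?)\s*([a-z]{1,2}\d+)\z/i

# [0-9.]+ accepts any mix of digits and dots, so this is treated as a number.
p loose.match("127.0.0.1 h7")[1]   # => "127.0.0.1"
p strict.match("127.0.0.1 h7")     # => nil

# \Z tolerates one trailing newline; \z does not.
p !!loose.match("80e6\n")          # => true
p !!strict.match("80e6\n")         # => false
p !!strict.match("80e6\n".chomp)   # => true
```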

I would rather go with a combination of Peter's and Josh's solution:

def parse(str)
   if /\A(\d+(?:\.\d+)?)\s*([a-z]{1,2}\d+)\z/i =~ str
     return Float($1), $2
   else
     raise "Invalid input: %p" % str
   end
end

If you need to cope with negative floats and signs in general you can change the initial part to

\A([-+]?\d+(?:\.\d+)?)
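
Put together, the signed variant might look like the sketch below. The method name parse_signed is hypothetical and only used here to keep it apart from parse above.

```ruby
# Hypothetical variant of parse that also accepts a leading + or - sign.
def parse_signed(str)
  if /\A([-+]?\d+(?:\.\d+)?)\s*([a-z]{1,2}\d+)\z/i =~ str
    return Float($1), $2
  else
    raise "Invalid input: %p" % str
  end
end

p parse_signed("-12.5 H7")  # => [-12.5, "H7"]
p parse_signed("+80e6")     # => [80.0, "e6"]
```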

A few notes and explanations:

The code does the conversion inside the method which may be desirable or not - depending on your context.

I picked Float() for added robustness, it will raise an exception if the matching was flawed (which should not be the case here).

I added error checking with an exception.

Btw, your original description of the sequence is pretty good for direct translation into a regular expression.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Peter_Vandenabeele1 · 5 December 2010 20:55

...

There are a few things to say about this solution though:

Indeed ... I was showing a lazily coded quick first approach (and a
hint to the documentation). Thanks for all the proposed improvements.

Peter

Matt_Slay · 14 December 2010 22:15

Robert Klemme wrote:

I would rather go with a combination of Peter's and Josh's solution:

def parse(str)
   if /\A(\d+(?:\.\d+)?)\s*([a-z]{1,2}\d+)\z/i =~ str
     return Float($1), $2
   else
     raise "Invalid input: %p" % str
   end
end

  robert

Thanks, all.. This has greatly solved many problems for me.

However, I've noticed that the server console is spitting out this
message below with every request hit that comes in:

warning: nested repeat operator + and ? was replaced with '*':
/\A((?:\d+)?(?:\.\d+)?)\s*([a-z]{1,2}\d+)\z/

The whole process works fine, but I am confused as to what this
means.

Also, one more question... what does that "i" mean at the very end of
the regex pattern? "/...../i =~ str"

Is that what tells it to dump the results into the $1 and $2 variables?
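
On these last two questions, a short sketch may help. The trailing i is Ruby's case-insensitive option; it is the parentheses in the pattern (the capture groups) that fill $1 and $2 after a successful match. The patterns and sample strings below are simplified illustrations, not the ones used in the app.

```ruby
# Without /i, [a-z] only matches lowercase letters.
p(/\A[a-z]+\z/  =~ "JS")   # => nil
# With /i, the same character class matches either case.
p(/\A[a-z]+\z/i =~ "JS")   # => 0

# Capture groups, not the i option, populate $1 and $2.
if /\A(\d+)\s*([a-z]+\d+)\z/i =~ "120 JS11"
  p [$1, $2]               # => ["120", "JS11"]
end

# With non-capturing groups (?:...), the match still succeeds but $1 is nil.
if /\A(?:\d+)\s*(?:[a-z]+\d+)\z/i =~ "120 JS11"
  p $1                     # => nil
end
```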

Brian_Candler · 15 December 2010 15:21

Matt Slay wrote in post #968427:

However, I've noticed that the server console is spitting out this
message below with every request hit that comes in:

warning: nested repeat operator + and ? was replaced with '*':
/\A((?:\d+)?(?:\.\d+)?)\s*([a-z]{1,2}\d+)\z/

The whole process works fine, but I am confused as to what this
means.

\d+ means "a digit, one or more times"

(?: ... ) makes a non-capturing group

(?: ... )? means "this group 0 or 1 times"

So the part of your regexp where you have

(?:\d+)?

says "one or more digits, 0 or 1 times". Ruby is pointing out that
this is a convoluted way of saying "0 or more digits", and is optimising
it to:

\d*

(where '*' means 0 or more times)
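
The equivalence can be checked with a small sketch; the variable names and sample strings are assumptions chosen for illustration. Depending on the Ruby version and warning settings, the first pattern may print the same warning when it is compiled.

```ruby
nested = /\A(?:\d+)?\z/   # the nested form: one or more digits, optionally
plain  = /\A\d*\z/        # the simplified form: zero or more digits

samples = ["", "7", "1234", "12a", "a"]
samples.each do |s|
  # Both patterns accept or reject every sample identically.
  puts "#{s.inspect}: nested=#{nested.match?(s)} plain=#{plain.match?(s)}"
end
```

Note that either form also matches the empty string, so a number part written this way becomes optional.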

Matt_Slay · 15 December 2010 15:45

I see now. I made this change and the message has gone away.

Thanks for taking the time to give such a thorough explanation.
