Break apart a string by kind of characters

Daniel_Waite · 27 September 2007 15:55

Hi all, I've an interesting problem. Imagine the following string:

'a1000aa'

I want to break it apart like so:

[ 'a', '1000', 'aa' ]

I did a search on the forums and came up with this regex:

'a1000aa'.scan(/((.)\2*)/).map { |i| i[0] }

Which is pretty close, but it groups on a change of character, so I
would get:

[ 'a', '1', '000', 'aa' ]

I tried playing around with the regex (e.g. swapping the . for (\d|\w))
but to no avail.

Any ideas?
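For readers following along, here is a minimal sketch of why that pattern splits '1000' into '1' and '000'. A backreference such as \2 repeats the exact text the group captured, not the pattern inside the group, so changing what the group matches does not change the grouping. The method name runs_of_same_char is a hypothetical helper, not something from the thread.

```ruby
# Hypothetical helper wrapping the scan call from the post above.
def runs_of_same_char(str)
  # (.) captures one character as group 2; \2* then matches only
  # further copies of that same character, so a run ends as soon
  # as the character changes (even from one digit to another).
  str.scan(/((.)\2*)/).map { |whole, _char| whole }
end

p runs_of_same_char('a1000aa')  # => ["a", "1", "000", "aa"]
```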

--
Posted via http://www.ruby-forum.com/.

Daniel_Waite · 27 September 2007 16:31

I figured out one possible solution. Granted, it's not as elegant as a
single regex, but it works and I understand it. Here goes...

First, I opened up class String to add some convenience and make things
a bit shorter:

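class String

  # String#first comes from Rails (ActiveSupport), not core Ruby;
  # self[0, 1] takes the first character in plain Ruby.

  # True if the string starts with an ASCII letter.
  def letter?
    !!(self[0, 1] =~ /[A-Za-z]/)
  end

  # True if the string starts with an ASCII digit.
  def digit?
    !!(self[0, 1] =~ /[0-9]/)
  end

end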

And my method:

def break_apart_rule_increment
  groups = Array.new
  string = 'a1000aa'

  string.each_char do |character|
    # Put the first character into a group.
    groups << character and next if groups.empty?

    # If this character is of the same kind as the last,
    # add it to the group; otherwise, create a new group
    # and put it there.
    if (groups.last.letter? and character.letter?) or
       (groups.last.digit? and character.digit?)
      groups.last << character
    else
      groups << character
    end
  end

  groups
end
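
As a side note, the same grouping loop can be sketched without reopening String, assuming Ruby 2.3 or later for Enumerable#chunk_while. This is a swapped-in technique, not the method from the post; char_kind and break_apart are hypothetical names. One difference: consecutive characters that are neither letters nor digits are grouped together here, whereas the loop above starts a new group for each of them.

```ruby
# Hypothetical helper: classify a single character.
def char_kind(ch)
  case ch
  when /[A-Za-z]/ then :letter
  when /[0-9]/    then :digit
  else                 :other
  end
end

# Group adjacent characters for as long as their kind stays the same.
def break_apart(string)
  string.each_char
        .chunk_while { |a, b| char_kind(a) == char_kind(b) }
        .map(&:join)
end

p break_apart('a1000aa')  # => ["a", "1000", "aa"]
```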


Gavin_Kistner3 · 27 September 2007 17:15

irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]

Gavin_Kistner3 · 27 September 2007 17:40

Or, if you want multiple types of character groupings:

irb(main):001:0> s = 'hello world, you crazy world!'
=> "hello world, you crazy world!"

irb(main):003:0> s.scan( /[aeiou]+|[b-df-hj-np-tv-z]+|[^a-z]+/ )
=> ["h", "e", "ll", "o", " ", "w", "o", "rld", ", ", "y", "ou", " ",
"cr", "a", "zy", " ", "w", "o", "rld", "!"]

Daniel_Waite · 28 September 2007 00:05

Gavin Kistner wrote:

> irb(main):001:0> s = 'a1000aa'
> => "a1000aa"
> irb(main):002:0> s.split( /(\d+)/ )
> => ["a", "1000", "aa"]

WOW! Freakin' awesome!

One caveat...

irb(main):004:0> '11aa1000aaa'.split(/(\d+)/)
=> ["", "11", "aa", "1000", "aaa"]

For some reason it answers with a blank element, but I'm sure that's an
easy one to solve.

Thanks, Gavin!
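
A note on the blank element: the string starts with a match for the delimiter, so the field before that first match is empty. split drops trailing empty strings by default but keeps leading ones. A minimal sketch of one way to discard it, using a hypothetical helper name:

```ruby
# Hypothetical helper: split on digit runs, keep them via the
# capture group, and drop any empty fields from the result.
def split_keeping_digits(str)
  str.split(/(\d+)/).reject(&:empty?)
end

p split_keeping_digits('11aa1000aaa')  # => ["11", "aa", "1000", "aaa"]
p split_keeping_digits('a1000aa')      # => ["a", "1000", "aa"]
```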


Lloyd_Linklater · 28 September 2007 11:09

Gavin Kistner wrote:

> On Sep 27, 9:55 am, Daniel Waite <rabbitb...@gmail.com> wrote:
>
> > Hi all, I've an interesting problem. Imagine the following string:
> >
> > 'a1000aa'
> >
> > I want to break it apart like so:
> >
> > [ 'a', '1000', 'aa' ]
>
> irb(main):001:0> s = 'a1000aa'
> => "a1000aa"
> irb(main):002:0> s.split( /(\d+)/ )
> => ["a", "1000", "aa"]

Gavin, how in the WORLD does this bit of black magic work and how did
you ever figure it out???

James_Edward_Gray_II · 28 September 2007 00:08

On Sep 27, 2007, at 7:05 PM, Daniel Waite wrote:

> Gavin Kistner wrote:
>
> > irb(main):001:0> s = 'a1000aa'
> > => "a1000aa"
> > irb(main):002:0> s.split( /(\d+)/ )
> > => ["a", "1000", "aa"]
>
> WOW! Freakin' awesome!
>
> One caveat...
>
> irb(main):004:0> '11aa1000aaa'.split(/(\d+)/)
> => ["", "11", "aa", "1000", "aaa"]
>
> For some reason it answers with a blank element, but I'm sure that's an
> easy one to solve.

If you just want digits and non-digits, I suggest:

>> '11aa1000aaa'.scan(/\D+|\d+/)
=> ["11", "aa", "1000", "aaa"]

James Edward Gray II
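
If punctuation and spaces should form their own groups instead of being lumped in with the letters, the digits/non-digits pattern can be widened to three alternatives, in the spirit of the multi-class scan shown earlier in the thread. This is only a sketch assuming ASCII letters; split_by_kind is a hypothetical name.

```ruby
# Hypothetical helper: runs of ASCII letters, runs of digits,
# and runs of anything else each become one element.
def split_by_kind(str)
  str.scan(/[A-Za-z]+|\d+|[^A-Za-z\d]+/)
end

p split_by_kind('a1000aa')      # => ["a", "1000", "aa"]
p split_by_kind('ab12, cd34!')  # => ["ab", "12", ", ", "cd", "34", "!"]
```

Because scan only returns what matched, there is no empty leading element to clean up, unlike with split.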

James_Edward_Gray_II · 28 September 2007 12:23

On Sep 28, 2007, at 6:09 AM, Lloyd Linklater wrote:

> Gavin Kistner wrote:
>
> > On Sep 27, 9:55 am, Daniel Waite <rabbitb...@gmail.com> wrote:
> >
> > > Hi all, I've an interesting problem. Imagine the following string:
> > >
> > > 'a1000aa'
> > >
> > > I want to break it apart like so:
> > >
> > > [ 'a', '1000', 'aa' ]
> >
> > irb(main):001:0> s = 'a1000aa'
> > => "a1000aa"
> > irb(main):002:0> s.split( /(\d+)/ )
> > => ["a", "1000", "aa"]
>
> Gavin,

I'm not Gavin, but...

> how in the WORLD does this bit of black magic work

Captures in a Regexp passed to split() are returned as part of the result.

> and how did you ever figure it out???

Interestingly, the documentation doesn't seem to mention it. I guess I knew it was there because Perl works the same way and I tried it sometime.

James Edward Gray II
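
To make the capture behaviour concrete, here is a small side-by-side sketch. The only difference between the calls is the grouping around \d+; the variable names are illustrative.

```ruby
s = 'a1000aa'

# No capture group: the digits act purely as a delimiter and are discarded.
without_capture = s.split(/\d+/)

# Capture group: the text matched by the group is kept in the result.
with_capture = s.split(/(\d+)/)

# Non-capturing group (?:...): behaves like the first case.
non_capturing = s.split(/(?:\d+)/)

p without_capture  # => ["a", "aa"]
p with_capture     # => ["a", "1000", "aa"]
p non_capturing    # => ["a", "aa"]
```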

Daniel_Waite · 28 September 2007 06:33

James Gray wrote:

> If you just want digits and non-digits, I suggest:
>
> >> '11aa1000aaa'.scan(/\D+|\d+/)
> => ["11", "aa", "1000", "aaa"]

I LOVE it! I gotta brush up on my regex skills. Wait, I need to get some
regex skills first.

Thanks, Edward; that made my night.


Yossef_Mendelssohn · 28 September 2007 13:07

On Sep 28, 7:23 am, James Edward Gray II <ja...@grayproductions.net> wrote:

> I'm not Gavin, but...

Ditto

> Interestingly, the documentation doesn't seem to mention it. I guess
> I knew it was there because Perl works the same way and I tried it
> sometime.

Ditto

> James Edward Gray II

Not ditto

--
-yossef
