Break apart a string by kind of characters

Hi all, I've an interesting problem. Imagine the following string:

'a1000aa'

I want to break it apart like so:

[ 'a', '1000', 'aa' ]

I did a search on the forums and came up with this regex:

'a1000aa'.scan(/((.)\2*)/).map { |i| i[0] }

Which is pretty close, but it groups on a change of character, so I
would get:

[ 'a', '1', '000', 'aa' ]
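
For what it's worth, the '1' / '000' split seems to come from the backreference: (.) captures one character and \2* only matches further copies of that exact character, not other characters of the same kind. A small sketch of that behaviour (the variable names runs and same are just for illustration):

```ruby
# (.) captures a single character; \2* then matches only repeats of that
# same character, so "1" and "000" end up in separate runs.
runs = 'a1000aa'.scan(/((.)\2*)/).map { |whole, _char| whole }

# Swapping "." for (\d|\w) does not help: the backreference still refers to
# the text that was captured, not to the pattern that captured it.
same = 'a1000aa'.scan(/((\d|\w)\2*)/).map { |whole, _char| whole }
```

Both calls should give the same four groups, which is why tweaking the inner group alone does not change the result.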

I tried playing around with the regex (e.g. swapping the . for (\d|\w))
but to no avail.

Any ideas?

--
Posted via http://www.ruby-forum.com/.

Daniel Waite wrote:

I figured out one possible solution. Granted, it's not as elegant as a
single regex, but it works and I understand it. Here goes...

First, I opened up class String to add some convenience and make things
a bit shorter:

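class String

  # True if the string starts with an ASCII letter.
  # (String#first is not part of core Ruby, so match the start directly.)
  def letter?
    !!(self =~ /\A[A-Za-z]/)
  end

  # True if the string starts with a digit.
  def digit?
    !!(self =~ /\A\d/)
  end

end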

And my method:

def break_apart_rule_increment
  groups = Array.new
  string = 'a1000aa'

  string.each_char do |character|
    # Put the first character into a group.
    groups << character and next if groups.empty?

    # If this character is of the same kind as the last,
    # add it to the group, otherwise, create a new group
    # and put it there.
    if (groups.last.letter? and character.letter?) or
       (groups.last.digit? and character.digit?)
      groups.last << character
    else
      groups << character
    end
  end

  groups
end
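
The same walk-the-characters idea can probably be written without reopening String. The sketch below assumes Ruby 2.3 or later (for Enumerable#chunk_while); char_kind and break_apart are hypothetical names, not anything from the method above. One difference to be aware of: here a run of characters that are neither letters nor digits stays together in one group, whereas the method above puts each such character in its own group.

```ruby
# Hypothetical helper: classify a single character as :letter, :digit or :other.
def char_kind(character)
  case character
  when /[A-Za-z]/ then :letter
  when /\d/       then :digit
  else                 :other
  end
end

# Group neighbouring characters for as long as they are of the same kind.
# Assumes Ruby 2.3+ for Enumerable#chunk_while.
def break_apart(string)
  string.each_char
        .chunk_while { |a, b| char_kind(a) == char_kind(b) }
        .map(&:join)
end

break_apart('a1000aa')  # => ["a", "1000", "aa"]
```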

irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]

···

On Sep 27, 9:55 am, Daniel Waite <rabbitb...@gmail.com> wrote:

Hi all, I've an interesting problem. Imagine the following string:

'a1000aa'

I want to break it apart like so:

[ 'a', '1000', 'aa' ]

Or, if you want multiple types of character groupings:

irb(main):001:0> s = 'hello world, you crazy world!'
=> "hello world, you crazy world!"

irb(main):003:0> s.scan( /[aeiou]+|[b-df-hj-np-tv-z]+|[^a-z]+/ )
=> ["h", "e", "ll", "o", " ", "w", "o", "rld", ", ", "y", "ou", " ",
"cr", "a", "zy", " ", "w", "o", "rld", "!"]
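
Applied to the letters-versus-digits case from the original question, the same alternation pattern might look like the sketch below. KINDS is a hypothetical constant name, and the character classes are an assumption about what should count as a letter (ASCII only here).

```ruby
# Each alternative matches a run of one kind of character; the last one
# catches anything that is neither an ASCII letter nor a digit.
KINDS = /[A-Za-z]+|\d+|[^A-Za-z\d]+/

parts = 'ab12, cd!'.scan(KINDS)
# => ["ab", "12", ", ", "cd", "!"]
```

Because the three alternatives between them cover every character, joining the pieces should give back the original string.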

···

On Sep 27, 11:11 am, Phrogz <phr...@mac.com> wrote:

irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]

Gavin Kistner wrote:

irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]

WOW! Freakin' awesome!

One caveat...

irb(main):004:0> '11aa1000aaa'.split(/(\d+)/)
=> ["", "11", "aa", "1000", "aaa"]

For some reason it answers with a blank element, but I'm sure that's an
easy one to solve.
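
The blank element shows up because the string starts with a match: split returns the (empty) text that sits before the first separator. One possible fix, a sketch rather than the only way, is to drop empty strings afterwards (raw and parts are just illustrative names):

```ruby
# The string begins with digits, so the piece "before" the first match is "".
raw = '11aa1000aaa'.split(/(\d+)/)
# => ["", "11", "aa", "1000", "aaa"]

# Drop the empty piece; the captured digit runs and the text between them stay.
parts = raw.reject(&:empty?)
# => ["11", "aa", "1000", "aaa"]
```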

Thanks, Gavin!

Gavin Kistner wrote:

irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]

Gavin, how in the WORLD does this bit of black magic work and how did
you ever figure it out???

Daniel Waite wrote:

One caveat...

irb(main):004:0> '11aa1000aaa'.split(/(\d+)/)
=> ["", "11", "aa", "1000", "aaa"]

For some reason it answers with a blank element, but I'm sure that's an
easy one to solve.

If you just want digits and non-digits, I suggest:

>> '11aa1000aaa'.scan(/\D+|\d+/)
=> ["11", "aa", "1000", "aaa"]
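
A quick way to convince yourself that scan avoids the blank element is sketched below. digit_runs is a hypothetical wrapper name. Note that \D matches any non-digit, so spaces and punctuation would be lumped in with the letters.

```ruby
# \D+ matches a run of non-digits, \d+ a run of digits. scan returns every
# match in order, so there is no empty "before the first match" piece.
def digit_runs(string)
  string.scan(/\D+|\d+/)
end

digit_runs('11aa1000aaa')  # => ["11", "aa", "1000", "aaa"]
digit_runs('a1000aa')      # => ["a", "1000", "aa"]
```

Since every character is either a digit or a non-digit, the pieces joined back together should equal the input.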

James Edward Gray II

···

On Sep 27, 2007, at 7:05 PM, Daniel Waite wrote:

Gavin Kistner wrote:

irb(main):001:0> s = 'a1000aa'
=> "a1000aa"
irb(main):002:0> s.split( /(\d+)/ )
=> ["a", "1000", "aa"]

> Gavin,

I'm not Gavin, but...

> how in the WORLD does this bit of black magic work

Captures in a Regexp passed to split() are returned as part of the result.

> and how did you ever figure it out???

Interestingly, the documentation doesn't seem to mention it. I guess I knew it was there because Perl works the same way and I tried it sometime.
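
To make the capture behaviour concrete, here is a small side-by-side sketch (the variable names are only for illustration). Newer versions of the String#split documentation do note that group matches are included in the returned array.

```ruby
s = 'a1000aa'

# Without a capture group the separators are discarded.
without_capture = s.split(/\d+/)    # => ["a", "aa"]

# With a capture group the matched separators are kept in the result.
with_capture = s.split(/(\d+)/)     # => ["a", "1000", "aa"]
```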

James Edward Gray II

···

On Sep 28, 2007, at 6:09 AM, Lloyd Linklater wrote:

James Gray wrote:

If you just want digits and non-digits, I suggest:

>> '11aa1000aaa'.scan(/\D+|\d+/)
=> ["11", "aa", "1000", "aaa"]

I LOVE it! I gotta brush up on my regex skills. Wait, I need to get some
regex skills first. :)

Thanks, Edward; that made my night.

> I'm not Gavin, but...

Ditto

> Interestingly, the documentation doesn't seem to mention it. I guess
> I knew it was there because Perl works the same way and I tried it
> sometime.

Ditto

> James Edward Gray II

Not ditto

···

On Sep 28, 7:23 am, James Edward Gray II <ja...@grayproductions.net> wrote:

--
-yossef