Split a string based on change of character

For a string "ZBBBCZZ", I want to produce a list ["Z", "BBB", "C", "ZZ"]
That is, break the string into pieces based on change of character.

Though this works:

  s = "ZBBBCZZ"
  x = s.scan(/((.)\2*)/).map {|i| i[0]}

I'm new to Ruby and am interested to learn if there is a better way to
do it.
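For readers new to Ruby: when the pattern contains capture groups, String#scan returns the groups of each match rather than the matched text. That is why the map is needed. A small sketch of what the intermediate array looks like (the variable names are illustrative only):

```ruby
s = "ZBBBCZZ"

# Two groups in the pattern, so each match becomes [whole_run, repeated_char].
pairs = s.scan(/((.)\2*)/)
p pairs  # [["Z", "Z"], ["BBB", "B"], ["C", "C"], ["ZZ", "Z"]]

# Keep the first group (the whole run) and drop the back-referenced char.
runs = pairs.map { |run, _char| run }
p runs   # ["Z", "BBB", "C", "ZZ"]
```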

BTW, in Python, it can be done with a regex (similar to above) or via
their itertools library:

  import itertools
  s = "ZBBBCZZ"
  x = [''.join(g) for k, g in itertools.groupby(s)]

Does anyone know if Ruby has a similar library to Python's itertools?

Thanks,
/-\
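On the itertools question: Ruby releases that came after this thread added close counterparts. Enumerable#chunk (Ruby 1.9.2 and later) groups consecutive elements by a key, much as itertools.groupby does. Enumerable#slice_when (Ruby 2.2 and later) splits wherever neighbouring elements differ. A sketch, assuming a modern Ruby:

```ruby
s = "ZBBBCZZ"

# chunk yields [key, elements] for each run of consecutive equal keys,
# the same shape as Python's itertools.groupby(s).
runs = s.each_char.chunk { |c| c }.map { |_key, chars| chars.join }

# slice_when starts a new slice between two adjacent elements that differ.
runs2 = s.each_char.slice_when { |a, b| a != b }.map(&:join)

p runs   # ["Z", "BBB", "C", "ZZ"]
p runs2  # ["Z", "BBB", "C", "ZZ"]
```

Both methods keep the leading Z and the trailing ZZ as separate runs, because they only ever compare neighbouring characters.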

···

____________________________________________________________________________________
Sick of deleting your inbox? Yahoo!7 Mail has free unlimited storage.
http://au.docs.yahoo.com/mail/unlimitedstorage.html

# s = "ZBBBCZZ"
# x = s.scan(/((.)\2*)/).map {|i| i[0]}

when it comes to string patterns like this, nothing beats regex

# import itertools
# s = "ZBBBCCZZ"
# x = [''.join(g) for k, g in itertools.groupby(s)]
# Does anyone know if Ruby has a similar library to Python's itertools?

hmm, you seem to like this more than your previous regex+map solution, why? (i ask because i prefer your first solution --not that it's ruby)

in 1.9 or the upcoming ruby, it keeps getting better and better and may look like this,

s = "ZBBBCZZ"
x = s.split('').group_by{|x| x}.entries

or possibly to

x = s.split('').group_by.entries

but unfortunately i don't have a 1.9 build here to test (grrr, shouldn't have deleted that vm).

kind regards -botp

···

From: Andrew Savige [mailto:ajsavige@yahoo.com.au]

Nothing off the top of my head, but how does this work for you?

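    # Build the runs: append l to the last run if it repeats that run's
    # character, otherwise start a new run with l.
    in_str.split('').inject([]) do |m,l|
        if m.last and m.last[0].chr == l
            m[-1] += l
        else
            m << l
        end
        m
    end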

It's not a two-liner, but it will return the same array.

enjoy

-jeremy

···

On Sat, Aug 11, 2007 at 09:52:24AM +0900, Andrew Savige wrote:

--

Jeremy Hinegardner jeremy@hinegardner.org

Andrew Savige wrote:

  s = "ZBBBCZZ"
  x = s.scan(/((.)\2*)/).map {|i| i[0]}

Maybe this is faster:

result = []
"ZBBBCZZ".scan(/((.)\2*)/){erg.push [$~[0]]}
p erg # => [["Z"], ["BBB"], ["C"], ["ZZ"]]

Wolfgang Nádasi-Donner

···

--
Posted via http://www.ruby-forum.com/.

Andrew Savige wrote:

For a string "ZBBBCZZ", I want to produce a list ["Z", "BBB", "C", "ZZ"]
That is, break the string into pieces based on change of character.

Though this works:

  s = "ZBBBCZZ"
  x = s.scan(/((.)\2*)/).map {|i| i[0]}

you may want to write it as ...map{|i,|i}

I'm new to Ruby and am interested to learn if there is a better way to
do it.

BTW, in Python, it can be done with a regex (similar to above) or via
their itertools library:

  import itertools
  s = "ZBBBCCZZ"
  x = [''.join(g) for k, g in itertools.groupby(s)]
Does anyone know if Ruby has a similar library to Python's itertools?

No idea, here is another variant to play with:

x = /#{s.gsub(/(.)\1*/, '(\1+)')}/.match(s).captures

funny little problem.
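One caveat with this variant: each run's character is pasted into the generated regexp as-is, so input containing a run of a regexp metacharacter such as ( produces an invalid pattern (RegexpError). A sketch that escapes the character first. The method name runs_via_captures is made up for illustration, and the /m flag (so that . also matches newlines) is an addition not in Simon's version:

```ruby
# Hypothetical helper: one capture group per run, with each run's
# character passed through Regexp.escape before it enters the pattern.
def runs_via_captures(s)
  pattern = s.gsub(/(.)\1*/m) { "(#{Regexp.escape($1)}+)" }
  Regexp.new(pattern).match(s).captures
end

p runs_via_captures("ZBBBCZZ")  # ["Z", "BBB", "C", "ZZ"]
p runs_via_captures("a((b")     # ["a", "((", "b"]
```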

cheers

Simon

Hi --

For a string "ZBBBCZZ", I want to produce a list ["Z", "BBB", "C", "ZZ"]
That is, break the string into pieces based on change of character.

Though this works:

s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

I'm new to Ruby and am interested to learn if there is a better way to
do it.

Probably not better, but just for fun, here's a way using the strscan
extension. I'd be very interested if anyone can get this to be less
clunky -- in particular, the - [""] at the end.

require 'strscan'
s = StringScanner.new("AABCCCDAAAEE")

s.string.split(//).inject([]) {|a,b| a << s.scan_until(/(?!#{b})/) } - [""]

=> ["AA", "B", "CCC", "D", "AAA", "EE"]

David

···

On Sat, 11 Aug 2007, Andrew Savige wrote:

--
* Books:
   RAILS ROUTING (new! http://www.awprofessional.com/title/0321509242)
   RUBY FOR RAILS (http://www.manning.com/black)
* Ruby/Rails training
     & consulting: Ruby Power and Light, LLC (http://www.rubypal.com)

Yeah, it's short, but I agree with the things you dislike about it. My approach was essentially the same as Jeremy's:

   s.split(//).inject([]) {|g, c| (g.last && g.last[c] ? g.last : g) << c; g}

That's just playing around, though; I don't think that approach is any better.

In my view, a better idiom would be to split on character switches; that would be concise. But, as you know, if the split pattern contains groups you get the captured text back in the result. I see no way to express the condition for the boundaries without using groups.
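To make the point about groups concrete: String#split drops the text matched by the separator pattern, but anything captured by a group in that pattern is inserted into the result. A boundary pattern that needs a group for a back-reference therefore also leaks that group into the output:

```ruby
s = "ZBBBCZZ"

# Plain separator: the matched text is discarded.
p s.split(/B+/)    # ["Z", "CZZ"]

# Same separator wrapped in a group: the captured text is kept.
p s.split(/(B+)/)  # ["Z", "BBB", "CZZ"]
```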

-- fxn

···

On Aug 11, 2007, at 2:52 AM, Andrew Savige wrote:

For a string "ZBBBCZZ", I want to produce a list ["Z", "BBB", "C", "ZZ"]
That is, break the string into pieces based on change of character.

Though this works:

  s = "ZBBBCZZ"
  x = s.scan(/((.)\2*)/).map {|i| i[0]}

s = "ZBBBCZZ"
    ==>"ZBBBCZZ"
s.scan( /((.)\2*)/ ).transpose.first
    ==>["Z", "BBB", "C", "ZZ"]
s.gsub( /(.)(?!\1)/, "\\1\n" ).split
    ==>["Z", "BBB", "C", "ZZ"]
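How the transpose.first version works: scan returns one [run, char] pair per match, and Array#transpose turns that list of pairs into two parallel arrays, the first of which holds the runs. A small sketch, including one edge case worth knowing:

```ruby
s = "ZBBBCZZ"
runs, chars = s.scan(/((.)\2*)/).transpose
p runs   # ["Z", "BBB", "C", "ZZ"]
p chars  # ["Z", "B", "C", "Z"]

# Edge case: an empty string gives no pairs, [].transpose is [],
# and .first is then nil rather than an empty array.
p "".scan(/((.)\2*)/).transpose.first  # nil
```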

···

On Aug 10, 7:52 pm, Andrew Savige <ajsav...@yahoo.com.au> wrote:

Andrew Savige wrote:

For a string "ZBBBCZZ", I want to produce a list ["Z", "BBB", "C", "ZZ"]
That is, break the string into pieces based on change of character.

Though this works:

  s = "ZBBBCZZ"
  x = s.scan(/((.)\2*)/).map {|i| i[0]}

Another variant, which gets rid of one of the capture
groups and doesn't introduce an artificial split character:

Enumerator.new(s, :scan, /(.)\1*/).map {$&}

Note that $& will not work in the example

> x = s.scan(/((.)\2*)/).map {|i| i[0]}

because the map is run on an array after the
scan has happened. To run the map inline with
the scan you need the Enumerator object.

I doubt using Enumerator is any faster though.

Wouldn't it be nicer if scan returned an
enumerable instead of an array? We could
define

class String
    def scan_enum regexp
        Enumerator.new self, :scan, regexp
    end
end

and then be able to do

s.scan_enum(/(.)\1*/).map {$&}
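Brad's helper uses the Ruby 1.8 form Enumerator.new(object, :method, *args). Later Ruby versions deprecated creating an Enumerator that way. Object#to_enum (alias enum_for) still takes a method name plus arguments. A sketch of the same scan_enum idea on top of to_enum, reading the current match with Regexp.last_match, the method form of $~:

```ruby
class String
  # Same idea as Brad's scan_enum, built on Object#to_enum instead of
  # the 1.8-style Enumerator.new(self, :scan, regexp).
  def scan_enum(regexp)
    to_enum(:scan, regexp)
  end
end

s = "ZBBBCZZ"

# map runs inside scan's iteration, so Regexp.last_match(0) (i.e. $&)
# is the run that scan has just matched.
runs = s.scan_enum(/(.)\1*/).map { Regexp.last_match(0) }
p runs  # ["Z", "BBB", "C", "ZZ"]
```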

···

--
Brad Phelan
http://xtargets.com

hmm, you seem to like this more than your previous regex+map solution, why? (i ask
because i prefer your first solution --not that it's ruby)

Actually, I'm not super happy with either solution. :-)

The annoyance with the regex solution:

  s = "ZBBBCZZ"
  x = s.scan(/((.)\2*)/).map {|i| i[0]}

is that you capture the backref (when you don't really want to), only
to discard it in the map, which seems a bit awkward and inefficient.
However, I can't see any way around this owing to the regex semantics
of returning all fields in parens (one has the same problem in Perl
and Python, BTW). [If there were a way to specify a non-capturing
back-ref, that would do the trick.]

in 1.9 or the upcoming ruby, it keeps getting better and better and may look
like this,

s = "ZBBBCZZ"
x = s.split('').group_by{|x| x}.entries

My reading of:

http://eigenclass.org/hiki.rb?Changes+in+Ruby+1.9

indicates that Enumerable#group_by can't work because it would seem to lose
the ordering and, grouping by key, will have only one group for 'Z' above,
when I want two distinct groups. (I would be delighted to be proved wrong,
however).
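Andrew's reading can be checked directly on a modern Ruby: group_by builds a Hash with one entry per distinct key. The two separate runs of Z are therefore merged into a single group, and the run boundaries are lost. A minimal sketch:

```ruby
s = "ZBBBCZZ"

# One Hash entry per distinct character, wherever it occurs in s.
groups = s.each_char.group_by { |c| c }

p groups.keys  # ["Z", "B", "C"]: only three groups for four runs
p groups["Z"]  # ["Z", "Z", "Z"]: the leading Z and the trailing ZZ merged
```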

I also scanned the Facets library but didn't find anything obvious.

Cheers,
/-\

···

--- Peña, Botp <botp@delmonte-phil.com> wrote:


result = []
"ZBBBCZZ".scan(/((.)\2*)/){result.push [$~[0]]}
p result # => [["Z"], ["BBB"], ["C"], ["ZZ"]]

Sorry - typo introduced when translating the variable name :-(

Wolfgang Nádasi-Donner

···


Hi --

···

On Sat, 11 Aug 2007, Peña, Botp wrote:

From: Andrew Savige [mailto:ajsavige@yahoo.com.au]
# s = "ZBBBCZZ"
# x = s.scan(/((.)\2*)/).map {|i| i[0]}

when it comes to string patterns like this, nothing beats regex

# import itertools
# s = "ZBBBCCZZ"
# x = [''.join(g) for k, g in itertools.groupby(s)]
# Does anyone know if Ruby has a similar library to Python's itertools?

hmm, you seem to like this more than your previous regex+map solution, why? (i ask because i prefer your first solution --not that it's ruby)

in 1.9 or the upcoming ruby, it keeps getting better and better and may look like this,

s = "ZBBBCZZ"
x = s.split('').group_by{|x| x}.entries

or possibly to

x = s.split('').group_by.entries

I'm going to have to get special glasses that can read invisible
ink.... :-)

David


Hi --

For a string "ZBBBCZZ", I want to produce a list ["Z", "BBB", "C", "ZZ"]
That is, break the string into pieces based on change of character.

Though this works:

s = "ZBBBCZZ"
x = s.scan(/((.)\2*)/).map {|i| i[0]}

I'm new to Ruby and am interested to learn if there is a better way to
do it.

Probably not better, but just for fun, here's a way using the strscan
extension. I'd be very interested if anyone can get this to be less
clunky -- in particular, the - [""] at the end.

require 'strscan'
s = StringScanner.new("AABCCCDAAAEE")

s.string.split(//).inject([]) {|a,b| a << s.scan_until(/(?!#{b})/) } - [""]

=> ["AA", "B", "CCC", "D", "AAA", "EE"]

My best effort:

>> require "strscan"
=> true
>> scanner = StringScanner.new("ZBBBCZZ")
=> #<StringScanner 0/7 @ "ZBBBC...">
>> char_runs = Array.new
=> []
>> char_runs << scanner.matched while scanner.scan(/(.)\1*/m)
=> nil
>> char_runs
=> ["Z", "BBB", "C", "ZZ"]
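A note on the /m flag in James's pattern: in Ruby, /m makes . match a newline as well, so runs of newlines are captured. Without it, scan returns nil at the first newline and the loop simply stops early. A sketch wrapping the same loop in a method (the name char_runs_of is hypothetical):

```ruby
require "strscan"

# Hypothetical wrapper around the StringScanner loop shown above.
def char_runs_of(str, pattern)
  scanner = StringScanner.new(str)
  runs = []
  runs << scanner.matched while scanner.scan(pattern)
  runs
end

text = "ab\n\nc"
p char_runs_of(text, /(.)\1*/m)  # ["a", "b", "\n\n", "c"]
p char_runs_of(text, /(.)\1*/)   # ["a", "b"] (stops at the first newline)
```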

James Edward Gray II

···

On Aug 11, 2007, at 8:14 AM, dblack@rubypal.com wrote:

On Sat, 11 Aug 2007, Andrew Savige wrote:

# s = "ZBBBCZZ"
# ==>"ZBBBCZZ"
# s.scan( /((.)\2*)/ ).transpose.first
# ==>["Z", "BBB", "C", "ZZ"]
# s.gsub( /(.)(?!\1)/, "\\1\n" ).split
# ==>["Z", "BBB", "C", "ZZ"]

ruby hacker, James, that is cool! gotta keep this.
kind regards -botp

···

From: William James [mailto:w_a_x_man@yahoo.com]

What's wrong with s.enum_for(:scan, /(.)\1*/).map { $& } ?

···

On 8/13/07, Brad Phelan <phelan@tttech.ttt> wrote:

whoops, sorry =)
that should be

from
   x = s.split('').group_by{|x| x}.entries.map{|x| x.join}

to
   x = s.split('').group_by.entries.map{|x| x.join}

i assume that group_by without a block would group the elements by
themselves. maybe i should name it group not group_by :-)
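On the guess about calling group_by without a block: in Ruby 1.9 and later, that form returns an Enumerator rather than grouping elements by themselves, so the key has to be given explicitly. (Object#itself, used below, needs Ruby 2.2 or later.) A minimal sketch:

```ruby
chars = "ZBBBCZZ".chars

# Without a block, group_by returns an Enumerator, not a Hash.
p chars.group_by.class  # Enumerator

# Grouping elements by themselves needs an explicit key.
by_self = chars.group_by(&:itself)
p by_self.keys  # ["Z", "B", "C"]
```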

kind regards -botp

···

On 8/11/07, dblack@rubypal.com <dblack@rubypal.com> wrote:

> s = "ZBBBCZZ"
> x = s.split('').group_by{|x| x}.entries
>
> or possibly to
>
> x = s.split('').group_by.entries

I'm going to have to get special glasses that can read invisible
ink.... :-)

Peña wrote:

From: William James [mailto:w_a_x_man@yahoo.com]
# s = "ZBBBCZZ"
# ==>"ZBBBCZZ"
# s.scan( /((.)\2*)/ ).transpose.first
# ==>["Z", "BBB", "C", "ZZ"]
# s.gsub( /(.)(?!\1)/, "\\1\n" ).split
# ==>["Z", "BBB", "C", "ZZ"]

ruby hacker, James, that is cool! gotta keep this.
kind regards -botp

Yeah, nice!

i think one can simplify from

s.gsub( /(.)(?!\1)/, "\\1\n" ).split

to

s.gsub(/(.)\1*/, '\0 ').split

?

cheers

Simon

# >> require "strscan" # => true
# >> scanner = StringScanner.new("ZBBBCZZ") # => #<StringScanner 0/7 @ "ZBBBC...">
# >> char_runs = Array.new # => []
# >> char_runs << scanner.matched while scanner.scan(/(.)\1*/m) # => nil
# >> char_runs # => ["Z", "BBB", "C", "ZZ"]

i just started playing with StringScanner after getting a hint from dblack and reading this rubyish example from James. i think StringScanner is an ideal solution for string-scanning problems. i noticed that StringScanner#scan returns the match, so,

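s = StringScanner.new("ZBBBCZZ")
=> #<StringScanner 0/7 @ "ZBBBC...">
a = []
=> []
x = nil  # x must exist first, or the parser reads the x in "a << x" as a method call
=> nil
a << x while x = s.scan(/(.)\1*/m)
=> nil
a
=> ["Z", "BBB", "C", "ZZ"]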

again, short and readable. ruby rocks.
kind regards -botp

ps: stringscanner docs are here http://www.ruby-doc.org/stdlib/libdoc/strscan/rdoc/index.html

···

From: James Edward Gray II [mailto:james@grayproductions.net]

# > Enumerator.new(s, :scan, /(.)\1*/).map {$&}
# What's wrong with s.enum_for(:scan, /(.)\1*/).map { $& } ?

somehow i missed the enumerator hack. thank you logan and brad for the update.
kind regards -botp

···

From: Logan Capaldo [mailto:logancapaldo@gmail.com]
# On 8/13/07, Brad Phelan <phelan@tttech.ttt> wrote:

Peña wrote:
> From: William James [mailto:w_a_x_man@yahoo.com]
> # s = "ZBBBCZZ"
> # ==>"ZBBBCZZ"
> # s.scan( /((.)\2*)/ ).transpose.first
> # ==>["Z", "BBB", "C", "ZZ"]
> # s.gsub( /(.)(?!\1)/, "\\1\n" ).split
> # ==>["Z", "BBB", "C", "ZZ"]
>
> ruby hacker, James, that is cool! gotta keep this.
> kind regards -botp

Yeah, nice!

i think one can simplify from

s.gsub( /(.)(?!\1)/, "\\1\n" ).split

to

s.gsub(/(.)\1*/, '\0 ').split

Yes, it appears so. Another variation (this one also works correctly on strings
that already contain whitespace):
require 'enumerator'
s.enum_for(:gsub, /(.)\1*/).to_a

Which is sort of back to the original scan method.
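The whitespace caveat is easy to see with an input that contains a run of spaces. Neither gsub-and-split version can tell its inserted separators from the original spaces, so that run is dropped, while enumerating the gsub matches keeps it. A sketch, assuming Ruby 1.9 or later (on 1.8 the require 'enumerator' line shown above is also needed):

```ruby
s = "a  b"  # "a", two spaces, "b"

# Both separator-based variants lose the run of two spaces.
via_newline = s.gsub(/(.)(?!\1)/, "\\1\n").split
via_space   = s.gsub(/(.)\1*/, '\0 ').split

# Collecting the matches themselves keeps every run.
via_enum = s.enum_for(:gsub, /(.)\1*/).to_a

p via_newline  # ["a", "b"]
p via_space    # ["a", "b"]
p via_enum     # ["a", "  ", "b"]
```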


···

On 8/12/07, Simon Kröger <SimonKroeger@gmx.de> wrote:
