And one more question:
In Czech, c followed by h is treated (for sorting etc.) as a single character/grapheme, ch. I need to split a string into single characters in a way that respects this absurd convention.
In Perl I can write
split /(?<=(?![Cc][Hh]).)/, $string
and it works fine.
Unfortunately, Ruby does not support this "zero-width positive look-behind assertion", so the question is: how can one split the string efficiently in Ruby?
Thanks,
P.
Does this work?
irb(main):001:0> "czech".split(/([Cc][Hh])|/)
=> ["c", "z", "e", "ch"]
irb(main):002:0> "check czech".split(/([Cc][Hh])|/)
=> ["", "ch", "e", "c", "k", " ", "c", "z", "e", "ch"]
irb(main):003:0> "cHeck czeCh".split(/([Cc][Hh])|/)
=> ["", "cH", "e", "c", "k", " ", "c", "z", "e", "Ch"]
-Justin
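A minimal sketch of the same idea wrapped in a method, assuming Ruby 1.9 or later; the name split_czech is hypothetical. The empty alternative makes split break the string between characters, the capture group puts each "ch" back into the result, and a reject drops the empty strings that split produces in front of a leading (or doubled) "ch":

```ruby
# Hypothetical helper; assumes Ruby 1.9 or later.
def split_czech(str)
  # The empty alternative matches at every position, splitting between
  # characters; the capture group makes split also return each matched
  # "ch" (in any letter case). split emits "" in front of a "ch" that
  # starts the string or directly follows another "ch", so drop those.
  str.split(/([Cc][Hh])|/).reject(&:empty?)
end

p split_czech("check czech")   # => ["ch", "e", "c", "k", " ", "c", "z", "e", "ch"]
p split_czech("chch")          # => ["ch", "ch"]
```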
string.split(/(ch|.)/i).reject { |s| s.empty? }
--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org
Or use scan:
str.scan(/(?:ch)|./i)
You might still have a problem with other characters (accented letters such as č, for instance), though,
depending on the encoding and normalisation.
Paul.
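To illustrate the encoding and normalisation caveat, here is a sketch that assumes Ruby 2.4 or later and UTF-8 strings (much newer than the Ruby discussed in this thread); the helper name czech_chars is hypothetical. It normalises to NFC before scanning and uses \X (a grapheme cluster) instead of ".". It also shows a separate gotcha: without the /m flag, "." does not match a newline, so scan(/ch|./i) silently drops line breaks.

```ruby
# Hypothetical helper; assumes Ruby 2.4+ and UTF-8 strings.
def czech_chars(str)
  # NFC composes "c" + U+030C (combining caron) into the single code point "č".
  nfc = str.unicode_normalize(:nfc)
  # "ch" in any letter case is one unit; \X matches a whole grapheme cluster,
  # so a letter keeps any combining mark that NFC could not compose.
  nfc.scan(/ch|\X/i)
end

decomposed = "c\u030Caj"            # "čaj" spelled with a combining caron
p decomposed.scan(/ch|./i).size     # => 4 ("c" and the caron come out separately)
p czech_chars(decomposed)           # => ["č", "a", "j"]

# Without /m, "." does not match a newline, so line breaks vanish.
p "a\nb".scan(/ch|./i)              # => ["a", "b"]
p "a\nb".scan(/ch|./im)             # => ["a", "\n", "b"]
```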
That was a stupid question on my part.
One should not insist on a word-for-word translation when rewriting code from Perl to Ruby. :)
One solution is, e.g., scan(/[cC][hH]|./):
irb(main):001:0> "cHeck czeCh".scan(/[cC][hH]|./)
=> ["cH", "e", "c", "k", " ", "c", "z", "e", "Ch"]
The scan version is slightly better than the split one, as it never returns an empty string. Thanks anyway, of course.
But where is this feature of split documented? http://www.rubycentral.com/ref/ref_c_string.html#split does not mention that split returns not only the delimited substrings but also the captured groups from each regexp match.
Regards,
P.
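Since the point of all this was sorting, here is a hedged sketch of how the scan result could feed a sort key, assuming Ruby 2.1 or later. It uses a deliberately simplified, hypothetical alphabet in which only "ch" is special, placed after "h" and before "i" as in Czech alphabetical order; real Czech collation also has to handle diacritics such as č, ř and š, which this ignores. The names ALPHABET, RANK and czech_sort_key are illustrative only.

```ruby
# Hypothetical, simplified alphabet: "ch" is a letter of its own between
# "h" and "i". Diacritics (č, ř, š, ž, ...) are deliberately ignored.
ALPHABET = %w[a b c d e f g h ch i j k l m n o p q r s t u v w x y z].freeze
RANK = ALPHABET.each_with_index.to_h.freeze

# Sort key: one [rank, grapheme] pair per unit. Anything outside ALPHABET
# ranks after "z" and is then ordered by the grapheme itself.
def czech_sort_key(word)
  word.downcase.scan(/ch|./m).map { |g| [RANK.fetch(g, ALPHABET.size), g] }
end

words = %w[chata cukr hrad ihned]
p words.sort                                # => ["chata", "cukr", "hrad", "ihned"]
p words.sort_by { |w| czech_sort_key(w) }   # => ["cukr", "hrad", "chata", "ihned"]
```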
As far as I can see, it's not in the documentation. I found it by accident. But, yes, the scan method is better. :)
-Justin
It's in Dave Thomas' Pickaxe book. Under String#split he writes:
"If pattern is a Regexp, str is divided where the pattern matches. Whenever the pattern
matches a zero-length string, str is split into individual characters. If pattern includes
groups, these groups will be included in the returned values."
Then he gives the following example:
"a@1bb@2ccc".split(/@(\d)/) => ["a", "1", "bb", "2", "ccc"]
Regards, Morton
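Both documented behaviours are easy to check in irb. This short sketch (assuming a modern Ruby) repeats the Pickaxe example, shows what happens without the group, and shows that a group which did not take part in a particular match adds nothing for that match:

```ruby
# A zero-length match splits the string into individual characters.
p "abc".split(//)                  # => ["a", "b", "c"]
# Groups in the pattern are included in the result...
p "a@1bb@2ccc".split(/@(\d)/)      # => ["a", "1", "bb", "2", "ccc"]
# ...while without the group the delimiters are simply dropped.
p "a@1bb@2ccc".split(/@\d/)        # => ["a", "bb", "ccc"]
# Only groups that participated in a given match are added for it.
p "a-b_c".split(/(-)|_/)           # => ["a", "-", "b", "c"]
```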
Yeah, there's no need for the (?: ... ). I started off thinking it was
more complicated than it was, and forgot to take that out. I really
need a regexp refactoring tool.
Paul.
On 02/08/06, Pavel Smerk <smerk@fi.muni.cz> wrote:
Yes, scan occurred to me in the meantime as well. But why the (?:)?
str.scan(/ch|./i) does exactly the same thing, doesn't it?
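A quick check of that point, assuming a modern Ruby: the non-capturing group is redundant because | already has the lowest precedence in a regexp. A capturing group, by contrast, would change what scan returns:

```ruby
str = "Chata v lese"
# Both patterns mean "ch (any case), or else any single character".
p str.scan(/(?:ch)|./i) == str.scan(/ch|./i)   # => true

# With a *capturing* group, scan returns one array of captures per match,
# with nil where the group did not take part.
p "chat".scan(/(ch)|./i)                       # => [["ch"], [nil], [nil]]
```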
Oh, my gosh. If only you'd posted this little tidbit (that split also returns the captured groups) two days ago, I'd have saved a couple of hours of code-wrangling.
For sorting purposes, I needed to turn something like
one-and.two@three.net
into
net.three@two.one-and
I started with str.split(/[.]|@/), but then I'd lose where the @ went. I tried turning it into
["one-and", ".", "two", "@", "three", ".", "net"]
so I could .reverse that, but without positive look-behind, I couldn't find any way to detect the break *after* the dot except with \w, which would also trigger after the hyphen.
After hours of work, I ended up with something long and confusing, involving .collect, an inner search loop and other stuff; and when I pulled it back up to check it for this email message, I discovered that it didn't even work correctly.
And all along, all I needed to do was change
str.split(/[.]|@/).reverse.join
into
str.split(/([.]|@)/).reverse.join
Dang. And thanks! :)
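The trick can be wrapped in a small method; this is only a sketch, and the name reverse_address is hypothetical. Because the separators are captured, split returns them along with the parts, so they survive the reverse:

```ruby
# Hypothetical helper: reverse the "."/"@"-separated parts of an address
# (e.g. to build a sort key), keeping the separators.
def reverse_address(str)
  # The capture group makes split return each "." and "@" as well,
  # so they survive the reverse and the join.
  str.split(/([.]|@)/).reverse.join
end

p "one-and.two@three.net".split(/([.]|@)/)
# => ["one-and", ".", "two", "@", "three", ".", "net"]
p reverse_address("one-and.two@three.net")  # => "net.three@two.one-and"
```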