Regex

Tom_Allison · 16 January 2006 13:29

I'm digging through the Pragmatic Programmers book and had some questions about regex support.

Initially they only mention .sub and .gsub.

If I wanted to write a regex block using additional flags, say something like this:
s/perl/ruby/igsm

Would that be expressed as:

"my favorite programming language is perl".gsub(/perl/ism, 'ruby')

???
What about the 'x' option?

David_A_Black3 · 16 January 2006 14:06

Hi --

I'm digging through the Pragmatic Programmers book and had some questions about regex support.

Initially they only mention .sub and .gsub.

If I wanted to write a regex block using additional flags, say something like this:
s/perl/ruby/igsm

Would that be expressed as:

"my favorite programming language is perl".gsub(/perl/ism, 'ruby')

???

/s turns on SJIS encoding, so you probably don't want that. As for
all the multiline matching, etc., /m causes the wildcard dot to match
newline characters, like /s in Perl. You don't need an equivalent of
Perl's /m because ^ and $ already always match beginning/end of lines.
To match beginning/end of string, you use anchors: \A and \z (or \Z to
discount a final newline).
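
A minimal sketch of the points above, for anyone coming from Perl. The sample strings and variable names are made up for illustration; only standard String and Regexp behaviour is used. Note that gsub already replaces every occurrence, so Perl's /g needs no flag here.

```ruby
text = "my favorite programming language is Perl\nperl again"

# /i makes the match case-insensitive; gsub itself replaces all occurrences.
replaced = text.gsub(/perl/i, 'ruby')

# Without /m the dot stops at a newline; with /m it matches it
# (the behaviour Perl calls /s).
dot_plain = "a\nb" =~ /a.b/
dot_multi = "a\nb" =~ /a.b/m

# ^ and $ match at every line boundary with no flag at all.
line_starts = "one\ntwo".scan(/^\w+/)

# \A and \z anchor to the whole string; \Z tolerates one final newline.
end_strict  = "two\n" =~ /two\z/
end_lenient = "two\n" =~ /two\Z/
start_only  = "one\ntwo" =~ /\Atwo/

puts replaced
p dot_plain, dot_multi, line_starts, end_strict, end_lenient, start_only
```

Here dot_plain, end_strict and start_only come back nil, while the /m and \Z variants match at index 0.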

What about the 'x' option?

What about it? It's there if you want to use it.
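
For what it's worth, a small sketch of /x in Ruby: whitespace and # comments inside the pattern are ignored, so a literal space has to be escaped. The date pattern and the sample strings are just illustrations, not anything from the book.

```ruby
# /x lets the pattern be spread over several lines with comments.
date_pattern = /
  (\d{4})  # year
  -
  (\d{2})  # month
  -
  (\d{2})  # day
/x

match = date_pattern.match("posted 2006-01-16")
year, month, day = match.captures

# An unescaped space is ignored under x; an escaped one is literal.
ignored_space = "ab"  =~ /a b/x
literal_space = "a b" =~ /a\ b/x

puts "#{year} #{month} #{day}"
```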

David

--
David A. Black
dblack@wobblini.net

"Ruby for Rails", from Manning Publications, coming April 2006!

Tom_Allison · 16 January 2006 15:01

dblack@wobblini.net wrote:

/s turns on SJIS encoding, so you probably don't want that. As for
all the multiline matching, etc., /m causes the wildcard dot to match
newline characters, like /s in Perl. You don't need an equivalent of
Perl's /m because ^ and $ already always match beginning/end of lines.
To match beginning/end of string, you use anchors: \A and \z (or \Z to
discount a final newline).

There's a lot of differences to get used to here.
I'm surprised that Ruby would go contrary to Perl in the regex flags. It's bound to be a source of confusion, at least for myself.

David_A_Black3 · 16 January 2006 15:11

Hi --

There's a lot of differences to get used to here.
I'm surprised that Ruby would go contrary to Perl in the regex flags. It's bound to be a source of confusion, at least for myself.

It's like so many things: if languages just did what older languages
did, there would be no point in having new languages.

David


David_Vallner · 16 January 2006 20:09

AND here's a completely new regexp engine coming for 2.0 - fun for the whole family.

That said, although it's undesirable, no one's obliged to be completely PCRE-compatible, and it always pays to take five minutes to read the (IMO pretty clear as far as regexps are concerned) documentation.

Tom_Allison · 16 January 2006 21:29

AND here's a completely new regexp engine coming for 2.0 - fun for the whole family.

That said, although it's undesirable, no one's obliged to be completely PCRE-compatible, and it always pays to take five minutes to read the (IMO pretty clear as far as regexps are concerned) documentation.

Considering that PCRE is pretty much a language in itself, it would make life a hell of a lot easier if there was at least some effort to either stick to the established norms of regex engines or at least avoid treading on them in the name of being different. Regexp isn't trivial, and making it more varied from language to language isn't going to help matters much.

David_Vallner · 17 January 2006 02:37

Well, I think Oniguruma, the new regexp engine, is a bit closer to PCRE in feature availability. And the changes to regexp options don't seem that catastrophic to me; I'd rather express my intention in the pattern itself than change its behaviour globally or use option grouping inside it.

David Vallner

David_A_Black3 · 17 January 2006 02:42

Hi --

AND here's a completely new regexp engine coming for 2.0 - fun for the whole family.

That said, although it's undesirable, no one's obliged to be completely PCRE-compatible, and it always pays to take five minutes to read the (IMO pretty clear as far as regexps are concerned) documentation.

Considering that PCRE is pretty much a language in itself, it would make life a hell of a lot easier if there was at least some effort to either stick to the established norms of regex engines or at least avoid treading on them in the name of being different. Regexp isn't trivial, and making it more varied from language to language isn't going to help matters much.

Then you'll have to blame Perl, for breaking away from sed.
There's really never been one regex syntax, across all these various
languages and utilities; it's a very, very unlikely area for
uniformity. In any case, Perl didn't necessarily get everything right
-- for example, it's nice to have anchors for all possible
combinations of start/end and string/line, as Ruby does. (Maybe
recent Perls do too; I'm not sure.)

David

