Regex simplifier?

Roger_Pack4 · 5 October 2009 12:28

Question.
Currently I am somewhat of a novice with regexes.
For example, I can't remember what \d means versus \D -- which one is a
digit, and which one isn't?

I'm wondering if anyone knows of a tool that simplifies this, e.g. by
letting you write DIGIT* in the regex, so that it's easier to understand
what you're doing.

Thanks.
-r

--
Posted via http://www.ruby-forum.com/.

Robert_K1 · 5 October 2009 13:08

I tried to cook something up in the past. You can find it when
searching the archives for subject "Alternate Regular Expressions?".
HTH

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Ilan_Berci1 · 5 October 2009 13:27

Roger Pack wrote:

> Question.
> Currently I am somewhat of a novice to regex's.
> For example, I can't remember what \d means versus \D -- which one is a
> digit, and which one isn't?
>
> Thanks.
> -r

Roger,

After you play with it for a while, it starts to follow a fairly
consistent pattern.

Usually, a capital letter implies negation. For example:
\D <- NOT a digit
\S <- NOT a whitespace character
\W <- NOT a word character

etc.
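
For followers, a quick way to convince yourself of the upper-case rule is to try the pairs side by side in irb. This is only a sketch, and the sample string is made up for illustration:

```ruby
sample = "Room 42, floor B"  # made-up sample text

digits     = sample.scan(/\d/)   # \d: digits only
non_digits = sample.scan(/\D/)   # \D: everything that is not a digit
spaces     = sample.scan(/\s/)   # \s: whitespace characters
words      = sample.scan(/\w+/)  # \w+: runs of letters, digits, underscore
separators = sample.scan(/\W+/)  # \W+: runs of everything else

p digits      # ["4", "2"]
p words       # ["Room", "42", "floor", "B"]
p separators  # [" ", ", ", " "]
```

Each lower-case class and its upper-case partner split the string between them: whatever \d does not match, \D does.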


Roger_Pack4 · 5 October 2009 16:49

> I tried to cook something up in the past. You can find it when
> searching the archives for subject "Alternate Regular Expressions?".
> HTH

Very interesting thread. Did anything come of it? (Florian Gross's
Regexp::English also looks nice [2] -- Florian?)

I like this syntax (this example matches things like "-2.718 + 3.14i"):

PAT.float['re'] + REP0.whitespace + ALT("+", "-")['op'] +
REP0.whitespace + PAT.float['im'] + 'i' [1]

I'm not sure how to do nested matches, optionals and whatnot, however.

I suppose that's the equivalent of this (in 1.9):
float = /[-+]?\d+\.\d+/
whitespace = /\s+/

%r{(?<re>#{float})#{whitespace}(?<op>[+-])#{whitespace}(?<im>#{float})i}
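
For followers, here is that 1.9-style version run against the sample input. A minimal sketch; the variable names `complex`, `m`, `re_part` and `im_part` are only for illustration:

```ruby
float      = /[-+]?\d+\.\d+/
whitespace = /\s+/

# Interpolating a Regexp into another one embeds it as a self-contained group.
complex = %r{(?<re>#{float})#{whitespace}(?<op>[+-])#{whitespace}(?<im>#{float})i}

m = complex.match("-2.718 + 3.14i")
re_part = m[:re]  # named groups can be read by symbol or by string
op      = m[:op]
im_part = m[:im]

puts "re=#{re_part} op=#{op} im=#{im_part}"  # prints: re=-2.718 op=+ im=3.14
```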

But the kicker is still how to remember that \s is whitespace, not \w.
And we've been forced to do some tiptoeing around the complexities of
regex in order to make it readable.

I had thought of a regex creator helper, something like:

float = reg { optional(/[-+]/) + 'DIGIT+ \. DIGIT+' }

The benefit being in not having to remember what \u or \g does, and not
having to remember what /(something)?/ means
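
To make the idea concrete, here is one way such a helper could be sketched. Everything in it is hypothetical: `ReadableRegex`, `Builder`, `reg`, `optional` and the `DIGIT`/`SPACE`/`WORD` names are assumptions made up for this example and do not exist in Ruby or in any gem mentioned in this thread.

```ruby
# Hypothetical helper, for illustration only.
module ReadableRegex
  # Readable names and the escapes they stand for (assumed set).
  NAMES = { 'DIGIT' => '\d', 'SPACE' => '\s', 'WORD' => '\w' }.freeze

  class Builder
    # Wrap a pattern in a non-capturing group and make it optional.
    def optional(pattern)
      "(?:#{source_of(pattern)})?"
    end

    # Replace readable names with escapes and drop the layout spaces.
    def expand(text)
      text.gsub(/\b(?:DIGIT|SPACE|WORD)\b/) { |name| NAMES[name] }.delete(' ')
    end

    private

    def source_of(pattern)
      pattern.is_a?(Regexp) ? pattern.source : pattern.to_s
    end
  end

  # Evaluate the block against a Builder and compile the resulting string.
  def self.reg(&block)
    builder = Builder.new
    Regexp.new(builder.expand(builder.instance_eval(&block)))
  end
end

float = ReadableRegex.reg { optional(/[-+]/) + 'DIGIT+ \. DIGIT+' }
p float  # /(?:[-+])?\d+\.\d+/
```

Because every space is dropped, a literal space would have to be written as SPACE in this sketch. That is a design shortcut, not a recommendation.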

Then you can mix it into 1.9-style regexes the same way.

Thoughts?
-r

[1]

[2]
http://markmail.org/message/rzudqptkuls7dncy?q=Regexp::English+gross&page=1&refer=cuj6ru2rprrvh2sm


Robert_K1 · 5 October 2009 20:25

The fact that there apparently are not many libraries for readable regular expressions in use seems to indicate that people are mostly using the native syntax of a regexp engine. Apparently it's not that hard.

Roger, I recommend "Mastering Regular Expressions" - that's a really good book on the matter and it covers the topic quite well without delving too deep into formal language theory.

Kind regards

robert


Ilan_Berci1 · 5 October 2009 17:05

Roger Pack wrote:

> The benefit being in not having to remember what \u or \g does, and not
> having to remember what /(something)?/ means

Roger,

Do you have to remember how to talk every day? Or how to add two
numbers? Regexes will just stick in your memory like everything else,
and there will be no need to worry about remembering. Just use
them and they will stick, just like you already know that 5 + 3 = 8
without cross-referencing a table.

Remembering the name of the "help" utility, however, would be a pain in
the butt.

ilan


David_A_Black1 · 5 October 2009 17:21

Hi --

Roger Pack wrote:

> The benefit being in not having to remember what \u or \g does, and not
> having to remember what /(something)?/ means

I'm going to put in a plug for learning regular expression syntax
itself in a reasonably thorough way. I won't try to make a case about
what's more readable, since that clearly depends on the reader, but I
do strongly recommend that everyone take the time to become regex
literate. The programming world in general is not going to convert to
English-language-based regex wrappers (which, though some such
projects are interesting, is a mercy, because such wrappers could
easily start proliferating and competing with each other, turning the
whole thing into yet another notation soup), so the only way to
participate fully in the use of regular expressions is to be
conversant with the actual notation.

David


--
David A. Black, Director
Ruby Power and Light, LLC (http://www.rubypal.com)
Ruby/Rails training, consulting, mentoring, code review
Book: The Well-Grounded Rubyist (http://www.manning.com/black2)

Robert_K1 · 5 October 2009 20:15

>> I tried to cook something up in the past. You can find it when
>> searching the archives for subject "Alternate Regular Expressions?".
>> HTH
>
> Very interesting thread. Did anything come of it? (or florian gross' Regexp::English looks nice [2]-- Florian?)

Yes, one of my postings had a file attached which contained an implementation. I believe Ari also created a project on RubyForge. We certainly did some more polishing of the code, but unfortunately I don't have the latest version handy.

Actually it's available as a gem, but it's definitely not the latest version that I wrote, because it does not contain the optimization for multiple fixed strings in an alternative. I'll have to see whether I can find that version somewhere.

> I like this syntax (this example matches things like "-2.718 + 3.14i"):
>
> PAT.float['re'] + REP0.whitespace + ALT("+", "-")['op'] + REP0.whitespace + PAT.float['im'] + 'i' [1]
>
> I'm not sure how to do nested matches or optionals or what not however.
>
> I suppose that's the equivalent of (in 1.9)
> float = /[-+]?\d+\.\d+/
> whitespace = /\s+/
>
> %r{(?<re>#{float})#{whitespace}(?<op>[+-])#{whitespace}(?<im>#{float})i}

Personally I do not like the approach with string interpolation. I'd rather extend the approach of TextualRegexp to include human readable variants of these meta sequences via method calls.

> But the kicker is still how to remember that \s is white space, not \w. And we've been forced to do some tip toeing around the complexities of regex in order to make it readable.

Actually, once you have got used to them and take a bit of care, they are pretty readable. For example, the /x flag goes a long way toward making complex expressions readable by letting you insert whitespace and comments.
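
As an illustration of the /x point, the complex-number pattern from earlier in the thread could be laid out like this. It is a sketch only: the layout and comments are mine, and the float pattern is written inline rather than interpolated.

```ruby
# In /x mode, whitespace in the pattern is ignored and # starts a comment.
complex = /
  (?<re> [-+]? \d+ \. \d+ )  # real part, such as -2.718
  \s+                        # whitespace before the operator
  (?<op> [-+] )              # the operator between the two parts
  \s+                        # whitespace after the operator
  (?<im> [-+]? \d+ \. \d+ )  # imaginary part, such as 3.14
  i                          # literal trailing i
/x

m = complex.match("-2.718 + 3.14i")
puts m[:re], m[:op], m[:im]
```

Note that in /x mode a literal space has to be written as \s, as an escaped space, or inside a character class.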

Kind regards

robert


Roger_Pack4 · 6 October 2009 11:23

> The fact that there apparently are not many libraries for readable
> regular expressions in use seems to indicate that people are mostly
> using the native syntax of a regexp engine. Apparently it's not that
> hard.

OK, OK, I'll concede, suck it up and learn regular expressions. My
only misgiving is that regular expressions are typically so "write only"
that they don't feel very ruby-y.
Thanks for the pointers!
-r


Roger_Pack4 · 16 February 2011 23:22

> %r{(?<re>#{float})#{whitespace}(?<op>[+-])#{whitespace}(?<im>#{float})i}
>
> But the kicker is still how to remember that \s is white space, not \w.
> And we've been forced to do some tip toeing around the complexities of
> regex in order to make it readable.
>
> I had thought of a regex creator helper
>
> float = reg { optional(/[-+]/) + 'DIGIT+ \. DIGIT+' }

I just found something related, for followers:

POSIX bracket expressions are apparently supported in Ruby regexes, for example:

[:blank:]

like

' a '.scan /[[:blank:]]/

=> [" ", " "]

' a '.scan /[^[:blank:]]/

=> ["a"]

ref: Oracle Regular Expressions

Though I still wouldn't be averse to something like
Regexp::DIGIT => '\d'

So one can do "#{Regexp::DIGIT}*"
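
For followers, a small sketch of both ideas together. The POSIX bracket expressions are built into Ruby's regex engine; the `RegexNames` module and its constants are hypothetical names invented for this example (kept in a separate module here rather than added to `Regexp`).

```ruby
# POSIX bracket expressions go inside a character class: [[:name:]]
blanks     = ' a '.scan(/[[:blank:]]/)     # spaces and tabs
non_blanks = ' a '.scan(/[^[:blank:]]/)    # everything else
numbers    = 'room 42'.scan(/[[:digit:]]+/)

# Hypothetical constants. Neither the module nor the names exist in Ruby.
module RegexNames
  DIGIT = '\d'
  SPACE = '\s'
  WORD  = '\w'
end

# Interpolate the readable name into a string, then compile it.
digits = Regexp.new("#{RegexNames::DIGIT}+")
first_number = 'room 42'[digits]

p blanks        # [" ", " "]
p non_blanks    # ["a"]
p numbers       # ["42"]
p first_number  # "42"
```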


Fabian_Streitel1 · 5 October 2009 18:29

[..]

> so the only way to
> participate fully in the use of regular expressions is to be
> conversant with the actual notation.

signed.

I really liked this site: http://www.regular-expressions.info/
It helped me understand how regular expressions work. And of
course there's my text editor with regular expression search and
replace. As soon as you know how it works, you won't stop using it.

And by the way, I still have to run
man perlre
every time I want to use lookaround or similarly complex
stuff, so it's good to know where to get the answers.
Yet I'd never use something like
lookahead(...)
simply because looking it up in the man page is just about
as time-consuming as typing "lookahead" all the time,
and (?=) is much more compact.
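
For anyone who has not met lookaround yet, here is a short Ruby sketch of the compact notation. The sample strings and variable names are made up for illustration:

```ruby
prices = "apples 3 USD, pears 12 EUR, plums 7 USD"

# (?=...) is a lookahead: the number must be followed by " USD",
# but " USD" itself is not part of the match.
usd_amounts = prices.scan(/\d+(?= USD)/)

# (?<=...) is a lookbehind: the number must be preceded by a dollar sign.
dollars = "cost: $15, qty: 3".scan(/(?<=\$)\d+/)

p usd_amounts  # ["3", "7"]
p dollars      # ["15"]
```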

Greetz!

David_A_Black1 · 6 October 2009 12:51

Hi --

On Tue, 6 Oct 2009, Roger Pack wrote:

>> The fact that there apparently are not many libraries for readable
>> regular expressions in use seems to indicate that people are mostly
>> using the native syntax of a regexp engine. Apparently it's not that
>> hard.
>
> Ok ok I'll concede and suck it in and learn regular expressions. My
> only misgiving is that regular expressions are typically so "write only"
> that they don't feel very ruby-y.
> Thanks for the pointers!

I think it's just practice, like learning musical notation or
whatever (only it's not as elaborate as musical notation). It's all
about atoms and quantifiers. Cling to that and you'll be fine.
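
To spell out "atoms and quantifiers" for followers: an atom is one thing to match (a literal, a class, a group) and a quantifier says how many times. A small sketch with made-up examples:

```ruby
# atom: \d (a digit)        quantifier: {4} and {2} (exact counts)
iso_date = /\A\d{4}-\d{2}-\d{2}\z/

# atom: u (a literal)       quantifier: ? (zero or one)
colour = /colou?r/

# atom: (?:ab) (a group)    quantifier: + (one or more)
ab_run = /(?:ab)+/

# atom: [aeiou] (a class)   quantifier: * (zero or more)
only_vowels = /\A[aeiou]*\z/

p "2009-10-05" =~ iso_date      # 0 (matched at index 0)
p "colour color".scan(colour)   # ["colour", "color"]
p "xxababab!"[ab_run]           # "ababab"
p "xyz" =~ only_vowels          # nil (no match)
```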

David


Robert_K1 · 17 February 2011 09:53

Did I mention that what Ari and I cooked up a while ago is a gem?

http://rubygems.org/gems/TextualRegexp

Cheers

robert


eben_smith · 17 February 2011 09:59

Hey gang,
This is my first response. I don't know if this helps, but I use this site
quite a bit to clarify regular expressions when I am working with them.

Regards,
Eben Smith


Roger_Pack4 · 18 February 2011 00:25

>> So one can do "#{Regexp::DIGIT}*"
>
> Did I mention that what Ari and I cooked up a while ago is a gem?
>
> http://rubygems.org/gems/TextualRegexp

Cool, but the default rdocs don't explain its use easily... perhaps
there is a URL I could refer to?
-r


Robert_K1 · 18 February 2011 08:12

Errr, the project got stuck along the way as there did not seem to be
too much interest at the time. It was more me toying around. IMHO
it should not be too hard to figure out by looking at the source code.

There's a mini example here:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/263918

If I find the time I might come up with more thorough documentation.

Cheers

robert

