Question.
Currently I am somewhat of a novice with regexes.
For example, I can't remember what \d means versus \D -- which one is a
digit, and which one isn't?
I'm wondering if there's any tool out there anyone knows about that
simplifies this, like
DIGIT* (in the regex) or something, so that it's easier to understand
what you're doing.
I tried to cook something up in the past. You can find it when
searching the archives for subject "Alternate Regular Expressions?".
HTH
Kind regards
robert
2009/10/5 Roger Pack <rogerpack2005@gmail.com>:
Question.
Currently I am somewhat of a novice with regexes.
For example, I can't remember what \d means versus \D -- which one is a
digit, and which one isn't?
I'm wondering if there's any tool out there anyone knows about that
simplifies this, like
DIGIT* (in the regex) or something, so that it's easier to understand
what you're doing.
Thanks.
-r
Roger,
After you play with it for a while, it starts to follow a fairly
consistent pattern.
Usually, a capital letter implies negation. For example:
\D <- NOT a digit
\S <- NOT a space(like) character
\W <- NOT a word character
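To make the lowercase/uppercase pairing concrete, here is a small Ruby sketch. The SAMPLE string is made up for illustration; the shorthands themselves are standard Ruby regexp syntax.

```ruby
# Lowercase shorthand matches a class; the uppercase form matches its complement.
# SAMPLE is an arbitrary string chosen only for this illustration.
SAMPLE = "ab 12_x!"

p SAMPLE.scan(/\d/)  # digits             => ["1", "2"]
p SAMPLE.scan(/\D/)  # NOT digits         => ["a", "b", " ", "_", "x", "!"]
p SAMPLE.scan(/\s/)  # whitespace         => [" "]
p SAMPLE.scan(/\S/)  # NOT whitespace     => ["a", "b", "1", "2", "_", "x", "!"]
p SAMPLE.scan(/\w/)  # word characters    => ["a", "b", "1", "2", "_", "x"]
p SAMPLE.scan(/\W/)  # NOT word characters => [" ", "!"]
```

One way to keep \s and \w apart: s for "space", w for "word".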
But the kicker is still how to remember that \s is white space, not \w.
And we've been forced to do some tip toeing around the complexities of
regex in order to make it readable.
The fact that there apparently are not many libraries for readable regular expressions in use seems to indicate that people are mostly using the native syntax of a regexp engine. Apparently it's not that hard.
Roger, I recommend "Mastering Regular Expressions". It's a really good book on the matter, and it covers the topic quite well without delving too deep into the theory of formal languages.
Kind regards
robert
On 10/05/2009 03:27 PM, Ilan Berci wrote:
Roger Pack wrote:
Currently I am somewhat of a novice with regexes.
For example, I can't remember what \d means versus \D -- which one is a
digit, and which one isn't?
After you play with it for a while, it starts to follow a fairly consistent pattern.
Usually, a capital letter implies negation. For example:
\D <- NOT a digit
\S <- NOT a space(like) character
\W <- NOT a word character
The benefit being in not having to remember what \u or \g does, and not
having to remember what /(something)?/ means
Roger,
Do you have to remember how to talk every day? Or how to add 2
numbers? Regexes will just stick in your memory like everything else,
and there will be no need to worry about remembering. Just use
them and they will stick, just like you already know that 5 + 3 = 8
without cross-referencing a table.
Remembering the name of the "help" utility, however, would be a pain in
the butt.
But the kicker is still how to remember that \s is white space, not \w.
And we've been forced to do some tip toeing around the complexities of
regex in order to make it readable.
The benefit being in not having to remember what \u or \g does, and not
having to remember what /(something)?/ means
I'm going to put in a plug for learning regular expression syntax
itself in a reasonably thorough way. I won't try to make a case about
what's more readable, since that clearly depends on the reader, but I
do strongly recommend that everyone take the time to become regex
literate. The programming world in general is not going to convert to
English-language-based regex wrappers (which, though some such
projects are interesting, is a mercy, because such wrappers could
easily start proliferating and competing with each other, turning the
whole thing into yet another notation soup), so the only way to
participate fully in the use of regular expressions is to be
conversant with the actual notation.
I tried to cook something up in the past. You can find it when
searching the archives for subject "Alternate Regular Expressions?".
HTH
Very interesting thread. Did anything come of it? (or florian gross' Regexp::English looks nice [2]-- Florian?)
Yes, one of my postings had a file attached which contained an implementation. I believe Ari also created a project on rubyforge. We certainly did some more polishing of the code but unfortunately I don't have the latest version handy.
Actually it's available as a gem, but it's definitely not the latest version that I wrote, because it does not contain the optimization for multiple fixed strings in an alternative. I have to see whether I can find that version somewhere.
I like this syntax (this example matches things like "-2.718 + 3.14i"):
Personally I do not like the approach with string interpolation. I'd rather extend the approach of TextualRegexp to include human readable variants of these meta sequences via method calls.
But the kicker is still how to remember that \s is white space, not \w. And we've been forced to do some tip toeing around the complexities of regex in order to make it readable.
Actually, once you have got used to them and take a bit of care they are pretty readable. For example /x goes a long way in making complex expressions readable by letting you insert whitespace and comments.
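As a rough illustration of the /x point, here is a sketch of an extended-mode pattern in Ruby for strings like "-2.718 + 3.14i". The constant name COMPLEX_NUMBER and the exact pattern are my own assumptions for the example, not code from this thread.

```ruby
# In /x mode, whitespace inside the pattern is ignored and "#" starts a comment,
# so each piece of the expression can sit on its own annotated line.
# COMPLEX_NUMBER is a hypothetical name used only for this sketch.
COMPLEX_NUMBER = /
  \A
  (?<re> [-+]? \d+ \. \d+ )   # real part: optional sign, digits, dot, digits
  \s*
  (?<op> [-+] )               # operator joining the two parts
  \s*
  (?<im> \d+ \. \d+ )         # imaginary part, unsigned in this sketch
  i
  \z
/x

m = COMPLEX_NUMBER.match("-2.718 + 3.14i")
p m[:re]  # => "-2.718"
p m[:op]  # => "+"
p m[:im]  # => "3.14"
```

Named groups ((?<re>...)) cut down on the "write only" feel too, since the match data can be read by name instead of by position.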
The fact that there apparently are not many libraries for readable
regular expressions in use seems to indicate that people are mostly
using the native syntax of a regexp engine. Apparently it's not that
hard.
Ok ok I'll concede and suck it in and learn regular expressions. My
only misgiving is that regular expressions are typically so "write only"
that they don't feel very ruby-y.
Thanks for the pointers!
-r
participate fully in the use of regular expressions is to be
conversant with the actual notation.
signed.
I really liked this site: http://www.regular-expressions.info/
It helped me understand how regular expressions work.
And of course so did my text editor, with its regular
expression search and replace. As soon as you know
how it works, you won't stop using it.
And by the way, I still have to run
man perlre
every time I want to use lookaround or similarly complex
stuff, so it's good to know where to get the answers.
Yet I'd never use something like
lookahead(...)
simply because looking it up in the man page is just about
as time consuming as typing "lookahead" all the time;
but (?=) is much more compact
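For anyone who hasn't met lookahead yet, a minimal Ruby sketch of (?=...) and its negated form (?!...). The PRICES string is invented for the example.

```ruby
# (?=...) asserts that the following text matches, without consuming it.
# (?!...) asserts that the following text does NOT match.
# PRICES is made-up sample data for this sketch.
PRICES = "10USD 20EUR 30USD"

# Digits only when immediately followed by "USD".
p PRICES.scan(/\d+(?=USD)/)      # => ["10", "30"]

# Digits NOT followed by "USD". The extra \d alternative stops the engine
# from backtracking to a shorter run of digits such as "1" in "10USD".
p PRICES.scan(/\d+(?!\d|USD)/)   # => ["20"]
```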
The fact that there apparently are not many libraries for readable
regular expressions in use seems to indicate that people are mostly
using the native syntax of a regexp engine. Apparently it's not that
hard.
Ok ok I'll concede and suck it in and learn regular expressions. My
only misgiving is that regular expressions are typically so "write only"
that they don't feel very ruby-y.
Thanks for the pointers!
I think it's just practice, like learning musical notation or
whatever (only it's not as elaborate as musical notation). It's all
about atoms and quantifiers. Cling to that and you'll be fine
Hey gang,
This is my first response. I don't know if this helps, but I use this site
quite a bit to clarify regular expressions when I am working with them.
Regards,
Eben Smith
On Thu, Feb 17, 2011 at 2:53 AM, Robert Klemme <shortcutter@googlemail.com> wrote:
> On Thu, Feb 17, 2011 at 12:22 AM, Roger Pack <rogerpack2005@gmail.com> wrote:
>> %r{(?<re>#{float})#{whitespace}(?<op>[+-])#{whitespace}(?<im>#{float})i}
>>
>> But the kicker is still how to remember that \s is white space, not \w.
>> And we've been forced to do some tip toeing around the complexities of
>> regex in order to make it readable.
>>
>> I had thought of a regex creator helper
>>
>> float = reg { optional(/[-+]/) + 'DIGIT+ \. DIGIT+' }
>
> I just found something related, for followers:
>
> POSIX bracket expressions are apparently supported in there:
>
> [:blank:]
>
> like
> >> ' a '.scan /[[:blank:]]/
> => [" ", " "]
> >> ' a '.scan /[^[:blank:]]/
> => ["a"]
>
> ref: Oracle Regular Expressions
>
> Though I still wouldn't be averse to something like
> Regexp::DIGIT => '\d'
>
> So one can do "#{Regexp::DIGIT}*"
Did I mention that what Ari and I cooked up a while ago is a gem?
Errr, the project got stuck along the way as there did not seem to be
too much interest at the time. It was more of me toying around. IMHO
it should not be too hard to find out looking at the source code.
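For the Regexp::DIGIT idea quoted above, here is a minimal sketch of what named fragments could look like. The module name ReadableRegex and its constants are hypothetical; this is not the API of the gem mentioned in this thread, and it avoids adding constants to Regexp itself.

```ruby
# Hypothetical helper module: readable names for common regexp fragments.
# Nothing here is part of Ruby core or of any existing gem.
module ReadableRegex
  DIGIT      = '\d'   # single quotes keep the backslash literal
  WHITESPACE = '\s'

  # Interpolated strings are inserted as regexp source, so \d keeps its meaning.
  FLOAT = /[-+]?#{DIGIT}+\.#{DIGIT}+/
end

p("x = -2.5"[ReadableRegex::FLOAT])   # => "-2.5"

# Built-in alternative: POSIX bracket expressions already give readable names.
p("a1b22".scan(/[[:digit:]]+/))       # => ["1", "22"]
```

The trade-off is the one discussed earlier in the thread: the names are easier to read, but anyone maintaining the code still has to know what they expand to.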