A helpful idea if you're looking for something to do

compile a whole lot of ruby regex examples, with commentary on what's
going on. the few websites I've found, and books I've looked through
just touch on the basics with minimal examples and explanation, or are
specifically for perl/etc. a nice-looking and lengthy site could be
extremely helpful to a lot of people starting with ruby, I imagine.

- dealing with unicode?
- mingling literal " / \ etc, with their regex counterparts, in ways
that would be daunting for the inexperienced
- just generally "higher-level" regex, leave the "intro to regex" to
all the other places. that's easy enough to find.

For the most part, the ruby regex engine is perl-like. And instead of having
to escape /s, we get stuff like %r@regex/bar@i. ZenSpider's Ruby QuickRef is a
great place to go for beginner help.
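
A quick sketch of that %r form next to the slash-delimited one. The sample string is made up for illustration; the three patterns should behave identically.

```ruby
# The same pattern written three ways; only the delimiter changes.
slashes = /regex\/bar/i      # the slash has to be escaped here
at_sign = %r@regex/bar@i     # no escaping needed with @ as the delimiter
braces  = %r{regex/bar}i     # braces work as delimiters too

sample = "path: REGEX/BAR"   # made-up input
positions = [slashes, at_sign, braces].map { |re| sample =~ re }
p positions                  # => [6, 6, 6]
```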


On Thursday 16 August 2007 09:36:13 am Simon Schuster wrote:

compile a whole lot of ruby regex examples, with commentary on what's
going on. the few websites I've found, and books I've looked through
just touch on the basics with minimal examples and explanation, or are
specifically for perl/etc. a nice-looking and lengthy site could be
extremely helpful to a lot of people starting with ruby, I imagine.

- dealing with unicode?
- mingling literal " / \ etc, with their regex counterparts, in ways
that would be daunting for the inexperienced
- just generally "higher-level" regex, leave the "intro to regex" to
all the other places. that's easy enough to find.

--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/

Simon Schuster wrote:

compile a whole lot of ruby regex examples, with commentary on what's
going on. the few websites I've found, and books I've looked through
just touch on the basics with minimal examples and explanation, or are
specifically for perl/etc. a nice-looking and lengthy site could be
extremely helpful to a lot of people starting with ruby, I imagine.

- dealing with unicode?

This one bothered me a lot, but the solution is simple. At the beginning of the script, set
$KCODE = "u"

This will fix regex behavior for use with UTF-8 text. I assume the default behavior will be improved with Ruby 2.0, but I'm not using 1.9 so can't say for sure.
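
$KCODE is a Ruby 1.8 setting. As a rough sketch of the same idea on Ruby 1.9 or later (an assumption about the reader's version, not something covered above): strings there carry their own encoding, so a regex works on characters rather than bytes without a global switch. The sample text is made up.

```ruby
# \u escapes always produce a UTF-8 string, whatever the file's encoding is.
text = "na\u00EFve caf\u00E9"   # "naive cafe" with two accented letters

chars = text.scan(/./)          # one element per character, not per byte
words = text.scan(/\p{L}+/)     # \p{L} matches any Unicode letter

p text.bytesize                 # => 12
p chars.length                  # => 10
p words.length                  # => 2
```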

- mingling literal " / \ etc, with their regex counterparts, in ways
that would be daunting for the inexperienced

The first thing to keep in mind is that it never hurts to escape a punctuation character unnecessarily in a double-quoted (soft-quoted) string or regex. So if you aren't sure, "\"", "\'", "\\" are all okay, as are /\"/, /\//, and %r|\/| (the latter being an alternative way to specify a regex). But you only need to escape characters that have special meaning. So in a slash-delimited regex, a slash has special meaning, but in a %r regex with a different delimiter, it does not:
%r{/} is the same as /\//, as the slash in the former does not need to be escaped.
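
A small sketch to check this in irb. The input string is made up, and the last line is a reminder that the "extra escapes are harmless" rule is about punctuation, not letters.

```ruby
# Unneeded escapes on punctuation are harmless in double-quoted strings...
same_double = ("\"" == '"')
same_single = ("\'" == "'")       # the backslash is simply dropped
backslash_length = "\\".length    # one literal backslash
p [same_double, same_single, backslash_length]   # => [true, true, 1]

# ...and in regex literals, whatever the delimiter.
path = "usr/local"                # made-up input
results = [/\//, %r|\/|, %r{/}].map { |re| path =~ re }
p results                         # => [3, 3, 3]

# Letters are different: escaping one usually changes its meaning.
letter_changed = ("\n" != "n")    # a newline versus the letter n
p letter_changed                  # => true
```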

If you use Regexp.new(" ... "), then the regexp comes from a string, and needs to follow the escaping rules for strings--you need to escape double quotes, and double any backslash that is meant for the regex (write "\\d", not "\d").
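
Here is a minimal sketch of the Regexp.new case, with a made-up input string. It builds the same pattern from a double-quoted string, a single-quoted string, and a literal, and then shows what happens when a backslash is not doubled.

```ruby
# From a double-quoted string: escape the quotes and double the backslash.
from_double = Regexp.new("\"(\\d+)\"")
# From a single-quoted string: nothing needs escaping in this pattern.
from_single = Regexp.new('"(\d+)"')
# The equivalent regex literal.
literal = /"(\d+)"/

text = 'id="42"'   # made-up input
captures = [from_double, from_single, literal].map { |re| re.match(text)[1] }
p captures         # => ["42", "42", "42"]

# A lone backslash before d is dropped by the string, so the regex sees "d+".
broken_source = Regexp.new("\d+").source
p broken_source    # => "d+"
```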

A single-quoted string is sometimes called "hard quoted". This means nothing is expanded / nothing has special meaning, so nothing needs to be escaped. Backslash is not an escape character here. The exceptions are a backslash before a single quote, which escapes the quote, and a backslash before another backslash, which gives one literal backslash.
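
A few lines to try in irb as a sketch of the single-quote rules; the strings are arbitrary.

```ruby
newline_length = '\n'.length          # a backslash and the letter n, not a newline
same_pattern   = ('\d+' == "\\d+")    # single quotes keep the backslash as-is
apostrophe     = 'it\'s'              # backslash before a quote escapes the quote
one_backslash  = '\\'.length          # backslash before a backslash gives one backslash
same_either    = ('a\\b' == 'a\b')    # so both spellings are the same three characters

p newline_length   # => 2
p same_pattern     # => true
p apostrophe       # => "it's"
p one_backslash    # => 1
p same_either      # => true
```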

Sorry if these rules are confusing. You will get used to them. The way to learn regular expressions is to use them. You will get comfortable with them when you need them.

- just generally "higher-level" regex, leave the "intro to regex" to
all the other places. that's easy enough to find.

Here's one of mine:
/<a[^>]+?href=['"]?(.+?)['"\s>][^>]*>/im
This matches a link. Throughout the regex I use [^>] frequently, which means "any character that does not end the tag". Think of [^>]* as a better .*
Interesting bits:
- using +? says that the match is non-greedy. It will match as little as possible. *? does the same thing, but I find less use for it, as it usually matches an empty string.
- the /i and /m at the end mean "case insensitive" and "multi-line" (in Ruby, /m lets . match newlines too). You can mix and match from /i, /m, /x (extended--ignores whitespace in the regex and allows comments).
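
To make this concrete, here is a sketch that runs the regex over a made-up HTML snippet, compares greedy and non-greedy matching on a tiny string, and rewrites the same pattern with /x so it can carry comments. The variable names and sample data are only for illustration.

```ruby
link = /<a[^>]+?href=['"]?(.+?)['"\s>][^>]*>/im

# Made-up HTML: mixed case, an extra attribute, and both quote styles.
html = %q{<p><A class="nav" HREF="http://example.com/">home</A> and <a href='/about'>about</a></p>}
hrefs = html.scan(link).flatten
p hrefs      # => ["http://example.com/", "/about"]

# Greedy versus non-greedy, and the [^>] alternative.
tag = "<b>bold</b>"
greedy  = tag[/<.+>/]       # runs on to the last > in the string
lazy    = tag[/<.+?>/]      # stops at the first >
negated = tag[/<[^>]+>/]    # cannot run past a > at all
p greedy     # => "<b>bold</b>"
p lazy       # => "<b>"
p negated    # => "<b>"

# The same link pattern with the x flag: whitespace is ignored, comments allowed.
link_x = /
  <a [^>]+?      # the tag name, then at least one more character
  href=          # the attribute we want
  ['"]?          # optional opening quote
  (.+?)          # the URL, as short as possible
  ['"\s>]        # closing quote, whitespace, or end of tag
  [^>]* >        # anything left in the tag
/imx
hrefs_x = html.scan(link_x).flatten
p hrefs_x == hrefs   # => true
```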

I don't know what your level is, so this may be a bit too cryptic, but you can probably puzzle it out if you are complaining about regex tutorials being too basic.

Dan