I'm working on a regular expression that will chop a posted message in half,
but chop it on a new paragraph break. I've decided it should look for the
new paragraph break after 100 characters. I'd like the regular expression to
choose an earlier paragraph break rather than a later one, but at the
moment, if there is a message with a number of paragraphs, it chooses the
last possible one it can in order to make a match. I remember reading in the
Pickaxe that regular expressions are 'greedy', and wonder whether this is a
case of regex gluttony and, if so, how I can put it on a lighter diet.
# final act is to chop message in half
if message =~ /\A(.{100,#{message.length}})<\/p>\s*<p>(.*)/m
  first_half = $1
  second_half = "</p>\n<p>" + $2
else
  first_half = message
end
The logic I'd like the above regex to operate with is: "Starting 100
characters into the message, chop the message at the next paragraph break".
Thanks very much, that works a treat. Always nice to have something
demonstrated in Lewis Carroll.
So, {100,#{m.length}}? effectively is now finding the first match, if any.
Out of curiosity, is it easy to express "find the 3rd match" (rather than
"find 3 matches")?
Luke
"Gavin Kistner" <gavin@refinery.com> wrote in message
news:3844E43F-DA57-489D-9DA1-4E5FF239A296@refinery.com...
···
On Sep 4, 2005, at 5:51 PM, luke wrote:
> The logic I'd like the above regex to operate with is: "Starting 100
> characters into the message, chop the message at the next paragraph
> break".
The question mark makes quantifiers non-greedy.
.+ versus .+?
.* versus .*?
.{a,b} versus .{a,b}?
For example:
txt = <<END
Twas brillig
and the slithy toves
did gyre and gimble
in the wabe
END
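A minimal sketch of how the difference might be shown with this text. The patterns and variable names below are illustrative assumptions, not necessarily the ones from the original post:

```ruby
txt = <<END
Twas brillig
and the slithy toves
did gyre and gimble
in the wabe
END

# The /m flag lets '.' match newlines, so these patterns can span lines.

# Greedy: .+ grabs as much as it can, then backs off to the LAST newline.
greedy = txt[/\A.+\n/m]

# Non-greedy: .+? grabs as little as it can, so it stops at the FIRST newline.
lazy = txt[/\A.+?\n/m]

# Non-greedy range: skip at least 15 characters, then stop at the next newline.
lazy_range = txt[/\A.{15,}?\n/m]

p greedy == txt   # the greedy match swallows the whole text
p lazy            # only the first line
p lazy_range      # the first two lines
```

The last pattern has the same shape as the paragraph-break problem: a minimum number of characters, then the earliest possible stopping point.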
a) As noted in my example, you can leave the second 'argument' to the range quantifier empty, in which case it is unbounded.
a{3,5} <== find 3-5 'a' chars
a{3,} <== find at least 3 'a' chars, up to ... well, as many as you can
b) String#scan will take a regexp and return an array of all matches in the string, so the 3rd match is just the third element of that array. (Not as useful if you need the saved sub-expressions, however.)
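Both points can be sketched briefly. The sample message and the variable names are made up for illustration, and setting second_half to an empty string in the else branch is an assumption rather than something from the original code:

```ruby
# Hypothetical sample message; the real one would come from the post.
message = "<p>" + ("a" * 120) + "</p>\n<p>second</p>\n<p>third</p>"

# a) Open-ended range quantifier, made non-greedy with a trailing '?':
#    at least 100 characters, then as few more as possible, so the match
#    stops at the first paragraph break past the 100-character mark.
if message =~ /\A(.{100,}?)<\/p>\s*<p>(.*)/m
  first_half  = $1
  second_half = "</p>\n<p>" + $2
else
  first_half  = message
  second_half = ""   # assumption: nothing is left over without a break
end

# b) String#scan collects every match, so the 3rd match is element 2
#    (nil if there are fewer than three matches).
paragraphs = message.scan(/<p>.*?<\/p>/m)
third = paragraphs[2]

# When the pattern has capture groups, scan returns an array of the
# captured groups for each match instead of the full matched text.
third_groups = message.scan(/<p>(.*?)<\/p>/m)[2]

p first_half.length
p second_half
p third
p third_groups
```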
···
On Sep 4, 2005, at 10:11 PM, luke wrote:
> So, {100,#{m.length}}? effectively is now finding the first match, if any.
> Out of curiosity, is it easy to express "find the 3rd match" (rather than
> "find 3 matches")?