email :: ara [dot] t [dot] howard [at] noaa [dot] gov
phone :: 303.497.6469
Your life dwells among the causes of death
Like a lamp standing in a strong breeze. --Nagarjuna
Maybe I overlooked something, but I didn't see anybody mention it: the
trailing (.*) seems quite superfluous to me. Why did you put it there?
That way you make the regexp engine match more than you need, and if you
change sub! to gsub! at some point, you'll likely still get only one
replacement, because with the /m flag .* matches everything to the end.
So I could check to see if there was more content after the first paragraph that I trimmed. The code goes on to replace it with an ellipsis if there was.
James Edward Gray II
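A small sketch of Robert's gsub! point (the helper name, constants and sample string are made up for illustration): with /m, the trailing (.*) consumes the rest of the input on the first match, so a global substitution finds nothing left to replace.

```ruby
# Hypothetical helper: replace each matched paragraph with its inner text.
def strip_paragraphs(html, pattern)
  html.gsub(pattern) { $1 }
end

WITH_TAIL    = /<p>(.*?)<\/p>(.*)\Z/m # trailing (.*) runs to the end of the string
WITHOUT_TAIL = /<p>(.*?)<\/p>/m       # matches one paragraph at a time

sample = "<p>one</p><p>two</p><p>three</p>"
puts strip_paragraphs(sample, WITH_TAIL)    # prints: one
puts strip_paragraphs(sample, WITHOUT_TAIL) # prints: onetwothree
```

The first call stops after a single replacement because the one match already covers the whole string.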
On Sep 14, 2005, at 6:26 AM, Robert Klemme wrote:
James Edward Gray II wrote:
I keep running into some surprising points with Ruby's Regexp engine
today and this first one just looks plain wrong to me:
The method takes a chunk of HTML and pulls the first paragraph out of it (minus the <p> and </p> tags). But I want to know if there was other content, so I can add an ellipsis if needed.
Here's the entire method, defined in a Rails helper module:
def excerpt( textile, id )
  html = sanitize(textilize(textile))
  html.sub!(/<p>(.*?)<\/p>(.*)\Z/m) { $1.strip }
  if $2 =~ /\S/
    "#{html} #{link_to '...', :action => :show, :id => id}"
  else
    html
  end
end
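The same logic can be tried outside Rails. This is a minimal sketch, assuming the input is already sanitized HTML (textilize and sanitize are skipped) and using a hypothetical more_link helper in place of link_to. It relies on sub! leaving its MatchData in $~, which is why $2 is still available after the call:

```ruby
# Hypothetical stand-in for Rails' link_to helper.
def more_link(id)
  %(<a href="/show/#{id}">...</a>)
end

# Same logic as excerpt, minus textilize/sanitize: the input is assumed
# to be HTML already.
def excerpt_html(html, id)
  html = html.dup # don't mutate the caller's string
  # The whole match, including the trailing (.*), is replaced by the
  # stripped text of the first paragraph.
  html.sub!(/<p>(.*?)<\/p>(.*)\Z/m) { $1.strip }
  # sub! leaves its MatchData in $~, so $2 is whatever followed the first
  # </p> (nil if there was no paragraph at all).
  if $2 =~ /\S/
    "#{html} #{more_link(id)}"
  else
    html
  end
end

puts excerpt_html("<p> First </p>\n<p>Second</p>", 7) # prints: First <a href="/show/7">...</a>
puts excerpt_html("<p>Only</p>\n", 7)                 # prints: Only
```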
it never occurred to me that regexes could be made to be context sensitive in
that way - that usage of the block, i think, makes them recognize more than
the regular languages, doesn't it? something like
string.sub(pat){ $1 =~ /foo/ ? 'bar' : 'baz' }
though i suppose you can only look backward using this unless the pattern was
made quite general to ensure capture forward....
This might be a bit more efficient (dunno how often you call it):
def excerpt( textile, id )
  html = sanitize(textilize(textile))
  html.sub!(/<p>(.*?)<\/p>(.*)\Z/m) { $1.strip }
  html << link_to( '...', :action => :show, :id => id ) if $2 =~ /\S/
  html
end
An alternative:
def excerpt( textile, id )
  html = sanitize(textilize(textile))
  # lazy .*? lets (\S)? capture a non-space char when anything follows </p>
  html.sub!(/<p>(.*?)<\/p>.*?(\S)?\s*\Z/m) { $1.strip }
  html << link_to( '...', :action => :show, :id => id ) if $2
  html
end
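Whether a group placed after the </p> can capture anything depends on how greedy the .* in front of it is. A sketch with made-up constant and method names comparing a greedy and a lazy tail:

```ruby
# Greedy .* runs to the end of the string, so the optional (\S)? always
# matches the empty string there and group 2 is never set.
GREEDY_TAIL = /<p>(.*?)<\/p>.*(\S)?\Z/m
# Lazy .*? stops as early as it can, letting (\S)? grab the last
# non-whitespace character; \s* absorbs any trailing whitespace.
LAZY_TAIL   = /<p>(.*?)<\/p>.*?(\S)?\s*\Z/m

# Returns group 2 (a single character or nil) for the given pattern.
def trailing_char(html, pattern)
  m = pattern.match(html)
  m && m[2]
end

p trailing_char("<p>a</p> more text", GREEDY_TAIL) # nil
p trailing_char("<p>a</p> more text", LAZY_TAIL)   # "t"
p trailing_char("<p>a</p>  \n", LAZY_TAIL)         # nil
```

Only the lazy form tells "something follows the paragraph" apart from "only whitespace follows".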
it never occurred to me that regexes could be made to be context
sensitive in that way - that usage of the block, i think, makes them
recognize more than the regular languages, doesn't it?
No. The block is just for the replacement. It doesn't change anything
for the match.
something like
string.sub(pat){ $1 =~ /foo/ ? 'bar' : 'baz' }
though i suppose you can only look backward using this unless the
pattern was made quite general to ensure capture forward....
I don't see how this is look forward or backward. The group actually has
to be matched to be able to use it as the basis for some kind of conditional
replacement. There's no lookahead / lookbehind magic involved - or I
cannot see it.
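Robert's point can be checked directly: the block passed to gsub runs once per match the regex has already found, and its return value only decides the replacement text. A sketch (conditional_gsub and the sample data are hypothetical) that records what the engine handed to the block:

```ruby
# Runs gsub with a block that picks a replacement from $1, and records
# each match the regex engine handed to the block.
def conditional_gsub(text, pattern)
  seen = []
  result = text.gsub(pattern) do
    seen << $~[0]               # the match is already fixed when the block runs
    $1 =~ /foo/ ? "bar" : "baz" # the block only chooses the replacement text
  end
  [result, seen]
end

result, seen = conditional_gsub("foo1 qux2 foo3", /([a-z]+)\d/)
p result # "bar baz bar"
p seen   # ["foo1", "qux2", "foo3"]
```

Every match is the one the regex finds on its own; the block cannot make the engine accept or reject a match.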
That's not equivalent. You're missing a space between html's content and the ellipsis.
But thanks for the ideas.
James Edward Gray II
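Robert's in-place style can keep the space James asked for by appending it before the link. A minimal sketch, again with a hypothetical more_link standing in for link_to and already-sanitized HTML assumed as input; String#<< appends to the receiver instead of building a new string through interpolation:

```ruby
# Hypothetical stand-in for Rails' link_to helper.
def more_link(id)
  %(<a href="/show/#{id}">...</a>)
end

# In-place variant: the input is assumed to be HTML already.
def excerpt_in_place(html, id)
  html = html.dup
  html.sub!(/<p>(.*?)<\/p>(.*)\Z/m) { $1.strip }
  # Append the separating space and the link to the same String object.
  html << " " << more_link(id) if $2 =~ /\S/
  html
end

puts excerpt_in_place("<p>A</p><p>B</p>", 3) # prints: A <a href="/show/3">...</a>
```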
On Sep 14, 2005, at 8:51 AM, Robert Klemme wrote:
James Edward Gray II wrote:
Here's the entire method, defined in a Rails helper module:
def excerpt( textile, id )
  html = sanitize(textilize(textile))
  html.sub!(/<p>(.*?)<\/p>(.*)\Z/m) { $1.strip }
  if $2 =~ /\S/
    "#{html} #{link_to '...', :action => :show, :id => id}"
  else
    html
  end
end
It works as expected now.
This might be a bit more efficient (dunno how often you call it):
def excerpt( textile, id )
  html = sanitize(textilize(textile))
  html.sub!(/<p>(.*?)<\/p>(.*)\Z/m) { $1.strip }
  html << link_to( '...', :action => :show, :id => id ) if $2 =~ /\S/
  html
end