rpheath wrote:
Thanks for the reply. I'm relatively new to regular expressions, and
misinterpreted the ^s and $s. I was thinking they were for that
specific check, so it was either the first string "|" (or) the second
string.
Here's sample text that would be passed into it.
-----------------------
<p>This is the first sentence. Now I'll post a code snippet:</p>
<pre>
def strip_blocks(text)
text.gsub([regex],'')
end
</pre>
<p>This is another sentence before the block quote.</p>
<blockquote>
<p>This is a quote</p>
</blockquote>
<p>This is one more sentence</p>
----------------------
What I would like to have left is this:
----------------------
<p>This is the first sentence. Now I'll post a code snippet:</p>
<p>This is another sentence before the block quote.</p>
<p>This is one more sentence</p>
----------------------
Hopefully that helps. Sorry the question is not organized and kind of
basic, but I'm new to this. Thanks again for any help.
Try this. It uses the "non-greedy" quantifier '.*?' together with the
case-insensitive (i) and multiline (m) flags; in Ruby, /m lets '.'
match newlines. A greedy '.*' would gobble up everything from the first
opening tag to the last closing tag of the same name, including any
text in between that you want to keep. This is probably not what you
would want.
def remove_tag_block(tag, text)
  text.gsub(/<#{tag}>.*?<\/#{tag}>/im, '')
end
irb(main):054:0> text
=> "<p>This is the first sentence. Now I'll post a code
snippet:</p>\n\n<pre>\ndef strip_blocks(text)\n
text.gsub([regex],'')\nend\n</pre>\n\n<p>This is another sentence before
the block quote.</p>\n\n<blockquote>\n <p>This is a
quote</p>\n</blockquote>\n\n<p>This is one more sentence</p>"
irb(main):055:0> t=remove_tag_block("pre", text)
=> "<p>This is the first sentence. Now I'll post a code
snippet:</p>\n\n\n\n<p>This is another sentence before the block
quote.</p>\n\n<blockquote>\n <p>This is a
quote</p>\n</blockquote>\n\n<p>This is one more sentence</p>"
irb(main):056:0> remove_tag_block("blockquote", t)
=> "<p>This is the first sentence. Now I'll post a code
snippet:</p>\n\n\n\n<p>This is another sentence before the block
quote.</p>\n\n\n\n<p>This is one more sentence</p>"
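To see why the '?' matters, here is a small sketch comparing the two
quantifiers on a string with two separate <pre> blocks. The helper names
and the sample string are made up for illustration; only the regex from
remove_tag_block above is taken from this post.

```ruby
# Hypothetical helpers for illustration; the lazy one uses the same
# regex as remove_tag_block, the greedy one only drops the '?'.
def remove_tag_block_lazy(tag, text)
  text.gsub(/<#{tag}>.*?<\/#{tag}>/im, '')
end

def remove_tag_block_greedy(tag, text)
  text.gsub(/<#{tag}>.*<\/#{tag}>/im, '')
end

sample = "<pre>a</pre>\n<p>keep me</p>\n<pre>b</pre>"

# Lazy: each <pre>...</pre> pair is removed on its own.
lazy_result = remove_tag_block_lazy("pre", sample)
# => "\n<p>keep me</p>\n"

# Greedy: one match runs from the first <pre> to the last </pre>,
# so the paragraph in between is lost as well.
greedy_result = remove_tag_block_greedy("pre", sample)
# => ""
```

So the non-greedy version is the one you want for sibling blocks like
these.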
The problem is that this won't work with nested tags, e.g.
<table><tr><td><table>stuff</table></td></tr></table>
irb(main):065:0> x="<table><tr><td><table>stuff</table></td></tr></table>"
=> "<table><tr><td><table>stuff</table></td></tr></table>"
irb(main):066:0> remove_tag_block("table", x)
=> "</td></tr></table>"
The non-greedy match stops at the first </table>, which belongs to the
inner table, so the outer closing tags are left behind. This is because
*regular* regular expressions can't match nested pairs, such as
"((()(())()))". I think I read somewhere that regexps can't count. You
have to use *recursive* regular expressions, which are found in PCRE
(Perl Compatible Regular Expressions), but AFAIK not in the current Ruby
regexp engine. Maybe Oniguruma has it - I dunno. I saw a PCRE extension
for Ruby somewhere, but I don't know anything about it.
The Perl RE for matching nested parentheses is apparently as follows
(from "The Joy of Regular Expressions [1]" on SitePoint):
\(((?>[^()]+)|(?R))*\)
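For comparison, the Oniguruma-based engine in Ruby 1.9 and later has
subexpression calls (\g<name>), which allow the same kind of recursion.
The following is only a sketch under that assumption; the constant and
method names are made up for illustration.

```ruby
# Assumes Ruby 1.9 or later. The named group "paren" calls itself via
# \g<paren>, so the pattern can follow nesting of any depth.
# (?>...) is an atomic group, as in the PCRE pattern above.
NESTED_PARENS = /(?<paren>\((?:(?>[^()]+)|\g<paren>)*\))/

# Hypothetical helper: returns the first balanced parenthesised group
# found in text, or nil if there is none.
def first_balanced_group(text)
  match = NESTED_PARENS.match(text)
  match && match[0]
end

first_balanced_group("((()(())()))")  # => "((()(())()))"
first_balanced_group("a(b(c)d)e")     # => "(b(c)d)"
```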
I believe that to do this correctly without PCRE, you have to resort to
some text parsing or use a SAX parser or similar. Maybe some Ruby guru
(i.e. not me) will be able to pull out an RE or some easy way to do
this.
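As a rough idea of what the text parsing could look like, here is a
sketch that walks the string with StringScanner from Ruby's standard
library and keeps a depth counter for the tag. The method name is made
up, and it assumes tags without attributes (exactly like the regex in
remove_tag_block), so treat it as a starting point rather than a real
HTML parser.

```ruby
require 'strscan'

# Hypothetical helper: removes <tag>...</tag> blocks, including blocks
# nested inside blocks of the same name. Assumes tags carry no attributes.
def remove_nested_tag_block(tag, text)
  name      = Regexp.escape(tag)
  open_tag  = /<#{name}>/i
  close_tag = /<\/#{name}>/i

  scanner = StringScanner.new(text)
  kept    = []
  depth   = 0

  until scanner.eos?
    if scanner.scan(open_tag)
      depth += 1                      # entering a (possibly nested) block
    elsif depth > 0 && scanner.scan(close_tag)
      depth -= 1                      # leaving one level of nesting
    else
      char = scanner.getch
      kept << char if depth.zero?     # keep text only outside any block
    end
  end

  # Note: if a block is never closed, everything after its opening tag
  # is dropped.
  kept.join
end

x = "<table><tr><td><table>stuff</table></td></tr></table>"
remove_nested_tag_block("table", x)   # => ""
```

Unlike the regex version, the leftover "</td></tr></table>" is gone
because the counter only returns to zero at the outermost closing tag.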
--
Posted via http://www.ruby-forum.com/.