Regexp with Ruby

Ajay_Vijey · 15 November 2006 17:56

Hello all,

I have to replace the image tags in a file with other ones.

File Data:

-------------
<td bordercolor="#FFFFFF">
<table border="0" id="table2" bgcolor="#FFFFFF" width="100%">
<tr>
<td align="left" valign="top" width="25%"><a href="../personal/po.htm">
<img src="images/animationpundo_schwarz_kl.gif"
alt="Personal und Organisation" border="0" width="84"
height="64"></a></td>
<td align="left" valign="top" width="25%"><font face="Arial"><a
href="../personal/po.htm">

I want to scan for this image tag:
--------------------------
<img src="images/animationpundo_schwarz_kl.gif"
alt="Personal und Organisation" border="0" width="84" height="64">

I already tested this with the following code:
-----------------------------------------------
...scan(/<img.*>/m)
and with
...scan(/<img.*?>/m)

But the result was always:
----------------------------
<img src="images/animationpundo_schwarz_kl.gif"
alt="Personal und Organisation" border="0" width="84"
height="64"></a></td>
<td align="left" valign="top" width="25%"><font face="Arial"><a
href="../personal/po.htm">

I hope someone can help me! Thanks a lot!

Kind Regards
Ajay

--
Posted via http://www.ruby-forum.com/.

Hugh_Sasse · 15 November 2006 18:03

On Thu, 16 Nov 2006, Ajay Vijey wrote:

> Hallo @ all,
>
> I have to replace in a File the image tags with an other!
>
> File Data:
> -------------
>
> [trimmed]
>
> Will scan this image tag:
> --------------------------
> <img src="images/animationpundo_schwarz_kl.gif"
> alt="Personal und Organisation" border="0" width="84" height="64">
>
> I already tested this with the follow code:
> -----------------------------------------------
> ...scan(/<img.*>/m)
> and with
> ...scan(/<img.*?>/m)
>
> But the result was always:
> ----------------------------
> <img src="images/animationpundo_schwarz_kl.gif"
> alt="Personal und Organisation" border="0" width="84"
> height="64"></a></td>
> <td align="left" valign="top" width="25%"><font face="Arial"><a
> href="../personal/po.htm">
>
> I hope someone can help me! Thanks a lot!

I'd agree with your choice of regexp. I think we need to see more of
the surrounding code to fix this.

> Kind Regards
> Ajay

Hugh

Paul_Lutus · 15 November 2006 18:40

Ajay Vijey wrote:

> Hallo @ all,
>
> I have to replace in a File the image tags with an other!

As another poster has pointed out, you aren't showing enough code for an
analysis. Also, while you are replacing tags, please reformat your IMG tags
thus:

<img src="..."/>

Note the self-closing form. This won't bother older browsers, and it will
allow you to meet the newer (X)HTML standards as well.

Here is a sample program that extracts the IMG tags from a Web page (as
written, the pattern requires the self-closing form, i.e. tags ending in />):

----------------------------------------
#!/usr/bin/ruby -w

data = File.read("sample.html")

extract = data.scan(%r{<img.*?/>}m)

puts extract.join("\n")
----------------------------------------

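This outputs from my sample page:

<img src="../images/leftarrow.png" border="0" alt="" />
<img src="../images/rightarrow.png" border="0" alt="" />
<img src="rock_ptarmigan_chick_small.jpg" width="300" height="289" alt=""/>
<img src="pws_naked_island003_cropped_small.jpg" width="300" height="232"
alt=""/>
<img src="pws_naked_island011_cropped_small.jpg" width="300" height="225"
alt=""/>
<img src="pws_naked_island007_cropped_small.jpg" width="300" height="236"
alt=""/>
<img src="pws_naked_island012_small.jpg" width="300" height="200" alt=""/>
<img src="pws_naked_island013_cropped_small.jpg" width="300" height="236"
alt=""/>
<img src="../images/leftarrow.png" border="0" alt="" />
<img src="../images/rightarrow.png" border="0" alt="" />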

--
Paul Lutus
http://www.arachnoid.com
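
A note on the pattern in the program above, offered as a sketch rather than a definitive fix: because %r{<img.*?/>}m only ends a match at "/>", an old-style tag such as <img ...> has no terminator of its own, and the match runs on to the next "/>" further down the page. The sample string and the alternative pattern /<img\b[^>]*>/i below are assumptions made for illustration; they do not come from the original post.

```ruby
# Hypothetical sample mixing an old-style and a self-closing IMG tag.
html = %Q{<p><img src="old.gif" alt="old"></p>\n<p><img src="new.gif" alt="new"/></p>}

# The pattern from the post: a match can only end at "/>".
strict = html.scan(%r{<img.*?/>}m)

# Assumed alternative: stop at the first ">" whether or not "/" precedes it.
loose = html.scan(/<img\b[^>]*>/i)

puts strict.length  # the old-style tag is swallowed into one long match
puts loose.length   # each tag is found separately
puts loose
```

The alternative still breaks if an attribute value contains a literal ">", so it is only suitable for simple, well-behaved markup.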

Ajay_Vijey · 15 November 2006 18:20

Hugh Sasse wrote:

> I'd agree with your choice of regexp. I think we need to see more of
> the surrounding code to fix this.

rubyscript
--------------
datei_new = IO.read("index.htm")
datei_regexp = datei_new.scan(/(<img.*>)/m)

puts datei_regexp

index.htm
------------

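<html>
<head><title>test</title></head>
<body>
<table>
<tr>
<td bordercolor="#FFFFFF">

<table border="0" id="table2" bgcolor="#FFFFFF" width="100%">
<tr>

<td align="left" valign="top" width="25%"><a href="../personal/po.htm">
<img src="images/animationpundo_schwarz_kl.gif"
alt="Personal und Organisation" border="0" width="84"
height="64"></a></td>

<td align="left" valign="top" width="25%"><font face="Arial"><a
href="../personal/po.htm"></font></td>

</tr>
</table>

</td>
</tr>
</table>
</body>
</html>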


Hemant_Kumar · 15 November 2006 19:07

If I were to do this, I would use Hpricot.

On 11/16/06, Paul Lutus <nospam@nosite.zzz> wrote:

> Ajay Vijey wrote:
>
> > Hallo @ all,
> >
> > I have to replace in a File the image tags with an other!
>
> As another poster has pointed out, you aren't showing enough code for an
> analysis, and, while you are replacing tags, please reformat your IMG tags
> thus:
>
> <img src="..."/>
>
> Note the self-closing form. This won't bother older browsers, and it will
> allow you to meet the newer (X)HTML standards as well.
>
> Here is sample program that extracts all the IMG tags from a Web page (of
> both the old and new varieties):
>
> ----------------------------------------
> #!/usr/bin/ruby -w
>
> data = File.read("sample.html")
>
> extract = data.scan(%r{<img.*?/>}m)
>
> puts extract.join("\n")
> ----------------------------------------
>
> This outputs from my sample page:
>
> <img src="../images/leftarrow.png" border="0" alt="" />
> <img src="../images/rightarrow.png" border="0" alt="" />
> <img src="rock_ptarmigan_chick_small.jpg" width="300" height="289" alt=""/>
> <img src="pws_naked_island003_cropped_small.jpg" width="300" height="232"
> alt=""/>
> <img src="pws_naked_island011_cropped_small.jpg" width="300" height="225"
> alt=""/>
> <img src="pws_naked_island007_cropped_small.jpg" width="300" height="236"
> alt=""/>
> <img src="pws_naked_island012_small.jpg" width="300" height="200" alt=""/>
> <img src="pws_naked_island013_cropped_small.jpg" width="300" height="236"
> alt=""/>
> <img src="../images/leftarrow.png" border="0" alt="" />
> <img src="../images/rightarrow.png" border="0" alt="" />
>
> --
> Paul Lutus
> http://www.arachnoid.com

--
There was only one Road; that it was like a great river: its springs
were at every doorstep, and every path was its tributary.

Ken_Bloom · 15 November 2006 20:35

Ajay Vijey wrote:

> Hugh Sasse wrote:
> > I'd agree with your choice of regexp. I think we need to see more of
> > the surrounding code to fix this.
>
> rubyscript
> --------------
> datei_new = IO.read("index.htm")
> datei_regexp = datei_new.scan(/(<img.*>)/m)
>
> puts datei_regexp
>
> index.htm
> ------------
>
> <html>
> <head><title>test</title></head>
> <body>
> <table>
> <tr>
> <td bordercolor="#FFFFFF">
>
> <table border="0" id="table2" bgcolor="#FFFFFF" width="100%">
> <tr>
>
> <td align="left" valign="top" width="25%"><a href="../personal/po.htm">
> <img src="images/animationpundo_schwarz_kl.gif"
> alt="Personal und Organisation" border="0" width="84"
> height="64"></a></td>
>
> <td align="left" valign="top" width="25%"><font face="Arial"><a
> href="../personal/po.htm"></font></td>
>
> </tr>
> </table>
>
> </td>
> </tr>
> </table>
> </body>
> </html>

Works for me with datei_new.scan(/(<img.*?>)/m) (the .*? performs a
non-greedy match so it stops with the smallest match it can make,
rather than the longest)
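
To make the greedy versus non-greedy difference concrete, here is a minimal sketch. The HTML fragment is a made-up stand-in for index.htm, not the real file.

```ruby
# Hypothetical fragment: one IMG tag (split across two lines) followed by other markup.
html = %Q{<a href="po.htm"><img src="a.gif"\nalt="A"></a></td>\n<td>text</td>}

greedy = html.scan(/<img.*>/m)   # .* runs on to the LAST ">" in the string
lazy   = html.scan(/<img.*?>/m)  # .*? stops at the FIRST ">" after "<img"

puts greedy.first
puts "--"
puts lazy.first
```

The /m flag matters in both cases: without it, "." does not match a newline, so a tag split across lines would not be matched by either pattern.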

The parentheses you have around the text of the regexp are unnecessary;
they cause each result to be wrapped in its own one-element array. You
should use /<img.*?>/m
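
A small sketch of that nesting, using a made-up two-tag string: when the pattern contains a capture group, String#scan returns one array per match holding the group contents; without groups it returns a flat array of the matched strings.

```ruby
# Hypothetical input with two simple IMG tags.
html = %Q{<img src="a.gif"> <img src="b.gif">}

with_group    = html.scan(/(<img.*?>)/m)  # array of one-element arrays
without_group = html.scan(/<img.*?>/m)    # flat array of strings

p with_group
p without_group
```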

--Ken Bloom
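
Since the original goal was to replace the tags rather than just list them, the same non-greedy pattern can be handed to String#gsub. This is only a sketch: the input string and the replacement tag are invented for illustration, and in the thread's setting the input would come from IO.read("index.htm").

```ruby
# Hypothetical input; the real data would be read from the file.
html = %Q{<td><a href="po.htm">\n<img src="images/old.gif"\nalt="Old" border="0"></a></td>}

# Hypothetical replacement tag.
new_tag = '<img src="images/new.png" alt="New" />'

# Block form, so nothing in the replacement is interpreted as a backreference.
result = html.gsub(/<img.*?>/m) { new_tag }

puts result
```

To write the change back, the result could be saved with File.write or an opened File, ideally to a new file name so the original stays intact while testing.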
