Regexp help

Newb_Newb · 21 August 2008 10:44

I need to extract img tags from an HTML page using regular expressions:
<\s*img [^\>]*src\s*=\s*(["\'])(.*?)\1
Would this code be OK?

Can anyone suggest another regexp for extracting img tags?
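
For reference, here is a minimal sketch of how the pattern above behaves with String#scan. The sample HTML, the /i flag, and the IMG_TAG variant are assumptions added for illustration; they are not part of the original post.

```ruby
# Made-up sample markup, for illustration only.
html = <<~HTML
  <p>Intro</p>
  <img src="a.png" alt="A">
  <IMG class="x" src='b.jpg'>
  <p>No image here</p>
HTML

# The pattern from the post, plus /i so that <IMG ...> matches too.
IMG_SRC = /<\s*img [^\>]*src\s*=\s*(["\'])(.*?)\1/i

# With capture groups, scan returns one [quote, url] pair per match.
srcs = html.scan(IMG_SRC).map { |_quote, url| url }

# Assumed variant: run up to the closing ">" to capture the whole tag.
IMG_TAG = /<\s*img\b[^>]*>/i
tags = html.scan(IMG_TAG)

puts srcs.inspect
puts tags.inspect
```

The posted pattern stops at the closing quote of src, so it yields the URL rather than the complete tag. The whole-tag variant will break on a ">" inside an attribute value, which is one reason a parser is usually the safer choice.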

--
Posted via http://www.ruby-forum.com/.

Lex_Williams · 21 August 2008 10:50

Instead of using a regular expression, you could consider an HTML parser
and/or an XPath search to retrieve the images. Check out Hpricot.

Thomas_Wieczorek · 21 August 2008 10:56

Yeah, it is quite easy with Hpricot:

require 'open-uri'
require 'hpricot'

site = Hpricot(open("http://code.google.com/edu/submissions/SedgewickWayne/index.html"))
site.search("//img") #=> returns an array of all images

Newb_Newb · 21 August 2008 11:37

Yes, I used it like this:

doc = Hpricot.parse(item.description)
imgs = doc.search("//img")
@src_array = imgs.collect { |img| img.attributes["src"] }

But this gives only the image URLs, and I need to get the full
<img src=" "> tag.
Any help?

Jan_Pilz · 21 August 2008 11:41

Then do

@src_array = imgs.collect { |img| "<img src =\"#{img.attributes["src"]}\">" }

?

--
Otto Software Partner GmbH

Jan Pilz (e-mail: Jan.Pilz@osp-dd.de)

Tel. 0351/49723202, Fax: 0351/49723119
01067 Dresden, Freiberger Straße 35 - AG Dresden, HRB 2475
Geschäftsführer: Burkhard Arrenberg, Heinz A. Bade, Jens Gruhl

Lex_Williams · 21 August 2008 12:00

I'm not really sure about Hpricot, but with the html/tree parser, when you
call a node's to_s method, you get its full HTML. So you should try
calling .to_s on the array's elements and see if it's what you need.

Newb_Newb · 22 August 2008 04:08

Jan Pilz wrote:

Then do

@src_array = imgs.collect { |img| "<img src =\"#{img.attributes["src"]}\">" }

?

Yes, it works.
Is it possible to use @src_array with String#sub!(pattern, replacement)?
That is:

@src_array.sub(/[@src_array]/," ")

@src_array contains all the img tags, and I need to replace them with an
empty string. Will the code above work for that?
Can you help me get there?
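
One possible way to do this is sketched below. It makes several assumptions: src_array stands in for @src_array, description stands in for item.description, and both values are made up for illustration. Array has no sub method, so the substitution is applied to the string holding the HTML, with a pattern built from the array.

```ruby
# Made-up input standing in for item.description.
description = 'Hello <img src="a.png"> world <img src="b.png">!'

# Stand-in for @src_array: the tag strings collected earlier.
src_array = ['<img src="a.png">', '<img src="b.png">']

# Regexp.union escapes each string, so the tags are matched literally.
pattern = Regexp.union(src_array)

# gsub replaces every occurrence and returns a new string;
# sub would replace only the first, and gsub! would modify in place.
cleaned = description.gsub(pattern, "")

puts cleaned
```

Literal matching only removes a tag if its text appears in the HTML exactly as stored in the array. Tags rebuilt from the src attribute alone may not match the original markup, so collecting the original tag text (for example via to_s, as suggested above) or matching with a regular expression may be more reliable.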
