Ruby regex on html file

eggie5 · 25 September 2007 23:40

I'm trying to write a rake task to extract all the script tags out of
my html file and save them to an array. How can I do this?

Below is a snippet from my file management.rhtml. I would like to get
the paths to the script files from all the script tags inside the HTML comment tags.

Expected results are:

/javascripts/prototype.js,
management/javascripts/management.js,
/javascripts/scriptaculous.js,
/javascripts/effects.js,
/javascripts/controls.js

Snippet:

Ari_Brown · 26 September 2007 00:32

<sigh>
I have no shame....

For something as large and (maybe) complex as this, you might want to try generating your regexp through TextualRegexp.

gem install TextualRegexp

Good luck
ari

---------------------------------------------------------------|
~Ari
"I don't suffer from insanity. I enjoy every minute of it" --1337est man alive

Unbewusst_Sein2 · 26 September 2007 01:40

I'm trying to write a rake task to extract all the script tags out of
my html file and save them to an array. How can I do this?

Would this work for you?

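#! /usr/bin/env ruby

html = '
    <script type="text/javascript" src="/javascripts/prototype.js"></script>
    <script type="text/javascript" src="management/javascripts/management.js"></script>
    <script src="/javascripts/scriptaculous.js" type="text/javascript"></script>
    <script src="/javascripts/effects.js" type="text/javascript"></script>
    <script src="/javascripts/controls.js" type="text/javascript"></script>
'
js = []
# Walk the text line by line and keep only the src value of each script tag.
html.each_line {|l|
  # The first gsub captures from src=" up to the last closing quote on the line;
  # the second trims a trailing type="..." attribute when src comes first.
  js << l.chomp.gsub(/.* src="(.*[^ ])"[ >].*/, '\1').gsub(/(.*)" type=.*/, '\1') if /<script / === l
}
p js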

gives :
RubyMate r6354 running Ruby r1.8.6 (/opt/local/bin/ruby)

extract_js.rb

["/javascripts/prototype.js", "management/javascripts/management.js",
"/javascripts/scriptaculous.js", "/javascripts/effects.js",
"/javascripts/controls.js"]

on Mac OS X 10.4.10

I didn't find a solution with only one gsub...
I'm sure one exists :[

···

eggie5 <eggie5@gmail.com> wrote:
--
Une Bévue

W_James · 26 September 2007 02:40

I'm trying to write a rake task to extract all the script tags out of
my html file and save them to an array. How can I do this?

Below is a snippet from my file management.rhtml. I would like to get
the paths to the script files from all the script tags inside the HTML comment tags.

Expected results are:

/javascripts/prototype.js,
management/javascripts/management.js,
/javascripts/scriptaculous.js,
/javascripts/effects.js,
/javascripts/controls.js

Snippet:

    <script type="text/javascript" src="/javascripts/prototype.js"></script>
    <script type="text/javascript" src="management/javascripts/management.js"></script>
    <script src="/javascripts/scriptaculous.js" type="text/javascript"></script>
    <script src="/javascripts/effects.js" type="text/javascript"></script>
    <script src="/javascripts/controls.js" type="text/javascript"></script>

puts DATA.read.scan( /<script\s+[^>]*src="(.*?)"/m ).flatten

__END__

···

On Sep 25, 6:36 pm, eggie5 <egg...@gmail.com> wrote:

Daniel_Sheppard · 26 September 2007 04:41

I'm trying to write a rake task to extract all the script tags out of
my html file and save them to an array. How can I do this?

Your subject says regex, but your request says Hpricot:

require 'hpricot'
doc = Hpricot(input)
scripts = (doc/'script').map {|x| x['src']}.compact

eggie5 · 26 September 2007 02:10

Thank you so much for your effort. This is much more succinct than
what I came up with!

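File.open("app/views/layouts/management.rhtml", "r") do |infile|
  # Read the whole layout into one string.
  file_text = ""
  while (line = infile.gets)
    file_text << line
  end

  # Isolate the block that holds the script tags.
  script_block = file_text.match("[\\S\\s]*?").to_s

  # Collect every quoted path ending in .js (the dot is escaped so it matches literally).
  script_refs = script_block.scan(/[^"]+\.js/)

  # Print each path relative to the public directory.
  script_refs.each do |ref|
    base_path = "public/"
    puts "#{base_path}#{ref}"
  end
end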
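
For anyone reading along, here is a minimal sketch of the same idea: cut out the region between two HTML comment markers first, then scan only that region for src attributes. The marker strings, the sample markup, and the helper name script_paths are assumptions made up for illustration; the real markers in management.rhtml are not shown in this thread.

```ruby
# Hypothetical comment markers; substitute whatever the layout really uses.
START_MARK = "<!-- scripts -->"
END_MARK   = "<!-- /scripts -->"

# Return the src of every script tag found between the two markers.
def script_paths(text, start_mark = START_MARK, end_mark = END_MARK)
  # /m lets "." cross newlines; the lazy (.*?) stops at the first end marker.
  region = /#{Regexp.escape(start_mark)}(.*?)#{Regexp.escape(end_mark)}/m
  block = text[region, 1]
  return [] if block.nil?

  block.scan(/<script\s+[^>]*src="(.*?)"/m).flatten
end

# Made-up sample input; the first tag sits outside the markers and is skipped.
sample = <<~HTML
  <script src="/javascripts/ignored.js" type="text/javascript"></script>
  <!-- scripts -->
  <script type="text/javascript" src="/javascripts/prototype.js"></script>
  <script src="/javascripts/effects.js" type="text/javascript"></script>
  <!-- /scripts -->
HTML

paths = script_paths(sample)

# File.join avoids a doubled slash when a path already starts with "/".
paths.each { |ref| puts File.join("public", ref) }
```

In a rake task the sample string would be replaced by File.read on the layout file. Escaping the markers with Regexp.escape keeps any special characters in them from being read as regex syntax.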

Unbewusst_Sein2 · 26 September 2007 04:00

William James <w_a_x_man@yahoo.com> wrote:

> puts DATA.read.scan( /<script\s+[^>]*src="(.*?)"/m ).flatten

I don't understand the "?" in "(.*?)" here. What does it mean after the "*"?

--
Une Bévue

eggie5 · 26 September 2007 05:20

Ahh, that looks beautiful right there! But will hpricot work on
a .rhtml file?

···

On Sep 25, 9:41 pm, "Daniel Sheppard" <dani...@pronto.com.au> wrote:

> I'm trying to write a rake task to extract all the script tags out of
> my html file and save them to an array. How can I do this?

Your subject says regex, but your request says Hpricot:

require 'hpricot'
doc = Hpricot(input)
scripts = (doc/'script').map {|x| x['src']}.compact

Unbewusst_Sein2 · 26 September 2007 04:00

I found a way to do it with only one gsub:

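#! /usr/bin/env ruby

html = '
    <script type="text/javascript" src="/javascripts/prototype.js"></script>
    <script type="text/javascript" src="management/javascripts/management.js"></script>
    <script src="/javascripts/scriptaculous.js" type="text/javascript"></script>
    <script src="/javascripts/effects.js" type="text/javascript"></script>
    <script src="/javascripts/controls.js" type="text/javascript"></script>
'
js = []
# One gsub per line: [^ "]* stops the capture at the first quote or space after src=".
html.each_line {|l|
  js << l.chomp.gsub(/^\s+<script\s+[^>]*src="([^ "]*).*/, '\1') if /<script / === l
}
p js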

gives :

["/javascripts/prototype.js", "management/javascripts/management.js",
"/javascripts/scriptaculous.js", "/javascripts/effects.js",
"/javascripts/controls.js"]

best,

···

eggie5 <eggie5@gmail.com> wrote:

Thank you so must for your effort. This is much more succinct than
what I came up with!

--
Une Bévue

Konrad_Meyer · 26 September 2007 04:15

Quoth Une Bévue:

···

William James <w_a_x_man@yahoo.com> wrote:

> puts DATA.read.scan( /<script\s+[^>]*src="(.*?)"/m ).flatten
i don't understand your "?" here --------------^

what is his meaning after * ???
--
Une Bévue

Non-greedy match. It finds as few characters as possible, which in this
case means the capture stops at the first closing quote instead of running
on to a later one.
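
A small sketch of the difference, using one tag from the snippet where src comes before type. The variable names are just for illustration.

```ruby
tag = '<script src="/javascripts/effects.js" type="text/javascript"></script>'

# Greedy: .* grabs as much as it can, so the capture runs to the LAST quote.
greedy = tag[/src="(.*)"/, 1]

# Non-greedy: .*? grabs as little as it can, so the capture ends at the FIRST quote.
lazy = tag[/src="(.*?)"/, 1]

# A negated character class gets the same result without a lazy quantifier.
negated = tag[/src="([^"]*)"/, 1]

puts greedy   # /javascripts/effects.js" type="text/javascript
puts lazy     # /javascripts/effects.js
puts negated  # /javascripts/effects.js
```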

HTH,
--
Konrad Meyer <konrad@tylerc.org> http://konrad.sobertillnoon.com/

Daniel_Sheppard · 26 September 2007 05:36

> Your subject says regex, but your request says Hpricot:
>
> require 'hpricot'
> doc = Hpricot(input)
> scripts = (doc/'script').map {|x| x['src']}.compact

Ahh, that looks beautiful right there! But will hpricot work on
a .rhtml file?

Probably - Hpricot should treat all the rhtml guff as if you're just
really really bad at writing html and treat the rhtml bits as just raw.

Hpricot('<%= <script src="monkey"> %>').at('script')['src']
=> "monkey"

The rhtml will get in the way of Hpricot seeing your tree correctly, so
finding script tags only within the head section or something like that
might not work, but for simple finds it should be fine.

Dan.

Unbewusst_Sein2 · 26 September 2007 05:55

OK, fine, thanks a lot for reminding me...

···

Konrad Meyer <konrad@tylerc.org> wrote:

Non-greedy match. Find as few characters as possible to match, which in this
case means don't match quote characters.

--
Une Bévue
