Hi,
I know that what I'm about to ask has a simple solution, but as I'm new
to Ruby I have not learnt much about regular expressions in Ruby.
Can anybody tell me how to extract all the contents inside the '<html>'
and '</html>' tags, and also how to extract the text between the '<a>'
and '</a>' tags, using regular expressions? I know it can be done with
the 'scan' method, but I don't know what the matching patterns should
be. Can anybody please help me?
Regards
Arun
s = "<a>hello world</a>"
new_s = s.gsub(/<.*?>/, "")
puts new_s
--output:--
hello world
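# DATA is the text after the __END__ marker in the script file
# (this assumes the HTML document is placed there).
html = DATA.read
regex = Regexp.new("<html>(.*)</html>", Regexp::MULTILINE)
# String#[] with a regex and 1 returns the first capture group.
puts html[regex, 1]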
/<html>(.*)<\/html>/m
This regexp will capture anything between an opening html tag and a closing
one. The /m option specifies multiline mode: "." will match any character,
including a newline.
For our content, it will capture:
<body>
<p>
Want a Ruby regular expression editor? Check out <a href="http://www.rubular.com/">Rubular</a>.
</p>
</body>
/<a.*>(.*)<\/a>/
This regexp will capture the text between an opening anchor element and a
closing one. The first ".*" is there to deal with href and any other
attribute. You might wanna throw the /m option in there too.
For our content, it will capture:
Rubular
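Since the question mentioned 'scan', here is a minimal sketch of how the anchor pattern behaves with it. The sample string with two links is made up for illustration, and the second pattern (with [^>]* and a non-greedy .*?) is a suggested variant, not the one posted above. With more than one link on a line, the greedy ".*" only gives you the last one:

```ruby
# Hypothetical sample input with two links on one line.
html = '<p><a href="http://www.rubular.com/">Rubular</a> and ' \
       '<a href="http://www.ruby-lang.org/">Ruby</a></p>'

# Greedy: the first .* runs as far right as it can while still allowing
# a match, so it swallows the first link and only the last text is captured.
greedy = html.scan(/<a.*>(.*)<\/a>/).flatten

# Non-greedy variant: [^>]* stays inside the opening tag, \b keeps <a from
# matching tags like <abbr>, and .*? stops at the nearest </a>.
lazy = html.scan(/<a\b[^>]*>(.*?)<\/a>/m).flatten

p greedy  # prints ["Ruby"]
p lazy    # prints ["Rubular", "Ruby"]
```

scan returns an array of capture-group arrays, which is why flatten is used to get a plain list of strings.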
On Mon, Mar 23, 2009 at 9:49 AM, Arun Kumar <arunkumar@innovaturelabs.com> wrote:
On Mon, Mar 23, 2009 at 11:18 AM, Arun Kumar <arunkumar@innovaturelabs.com> wrote:
I know that using mechanize or hpricot is a far better option in this
case. But I'm just asking out of curiosity, to learn about regexps.
Dare I say, a man should use regexps if only to satisfy his curiosity.
...oh, yeah. Normally, a . matches any character except a newline. The
regex .* matches any character 0 or more times--but to get it to match
newlines as well, you have to specify Regexp::MULTILINE.
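To see the difference, here is a small sketch run against a made-up three-line document (the string is only an illustration, not the original poster's data). It also checks that the /m literal and Regexp::MULTILINE give the same result:

```ruby
# Hypothetical document spanning several lines.
html = "<html>\n<body>hi</body>\n</html>"

# Without multiline mode, . cannot cross the newlines, so nothing matches.
without_m = html[/<html>(.*)<\/html>/, 1]

# With /m, . also matches newlines, so the whole body is captured.
with_m = html[/<html>(.*)<\/html>/m, 1]

# Regexp::MULTILINE is the same option spelled out for Regexp.new.
same = html[Regexp.new("<html>(.*)</html>", Regexp::MULTILINE), 1]

p without_m       # prints nil
p with_m          # prints "\n<body>hi</body>\n"
p with_m == same  # prints true
```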