I have HTML text. I need to convert it to plain text without
HTML tags.
--
Posted via http://www.ruby-forum.com/.
keal wrote:
i have html-text. i have to convert this text to simple text without
html-tags.
Path of least resistance:
lynx -dump www.myurl
or use links2, or w3m -dump www.myurl
Or, for a high-falutin solution:
http://groups.google.com/group/comp.lang.ruby/browse_frm/thread/e0fb1207f1814c77/37cd5e35a1ffb8d7?q=strip+HTML+tags&rnum=7#37cd5e35a1ffb8d7
keal wrote:
i have html-text. i have to convert this text to simple text without
html-tags.
This is a very low-cost variant; I'd guess the lynx approach is much more
effective and complete:
ruby -pe '$_.gsub!(%r{</?.*?>}, "")' index.html
Kind regards
robert
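The one-liner above works line by line, so a tag that is split across two lines slips through, and entities such as &amp; are left as they are. A minimal sketch of the same regex idea applied to a whole string follows. The names strip_tags and ENTITIES are made up for illustration, the entity table is deliberately tiny, and this is still not a real HTML parser:

```ruby
# Hypothetical helper, not from the thread: a rough regex-based tag stripper.
# A small, deliberately incomplete entity table (an assumption for this sketch).
ENTITIES = {
  '&amp;'  => '&',
  '&lt;'   => '<',
  '&gt;'   => '>',
  '&quot;' => '"',
  '&#39;'  => "'",
  '&nbsp;' => ' '
}.freeze

def strip_tags(html)
  # Drop <script> and <style> elements together with their contents.
  text = html.gsub(%r{<(script|style)\b.*?</\1>}mi, '')
  # Drop HTML comments.
  text = text.gsub(/<!--.*?-->/m, '')
  # Drop remaining tags; [^>] also matches newlines, so multi-line tags go too.
  text = text.gsub(/<[^>]*>/, '')
  # Decode the handful of entities listed above in a single pass.
  text = text.gsub(Regexp.union(ENTITIES.keys), ENTITIES)
  # Collapse runs of spaces/tabs and long runs of blank lines.
  text.gsub(/[ \t]+/, ' ').gsub(/\n{3,}/, "\n\n").strip
end
```

Usage would be something like puts strip_tags(File.read('index.html')). Anything with a literal > inside an attribute value will still confuse it, which is why a real browser or parser remains the more complete option.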
i have html-text. i have to convert this text to simple text without
html-tags.
It's tricky; there's more to it than you'd think. The best way is probably to let Lynx, or another browser, do it for you, e.g.:
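# Shell out to lynx and return the page rendered as plain text.
# The URL is interpolated into a shell command, so only pass trusted URLs.
def plain(url)
  `lynx -dump "#{url}"`
end
# Named `text` rather than `p`, which would shadow Kernel#p.
text = plain('http://www.google.com/')
puts text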
Outputs:
[1]Personalised Home | [2]Sign in
[3]A picture of the Braille letters spelling out "Google." Happy Birthday
Louis Braille!
Web [4]Images [5]Groups [6]News [7]Froogle [8]more »
... [snip] ...
Of course you'll need lynx installed for that to work, but other text-mode browsers can do the same job. Try a Google search.
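Since the backtick version passes the URL through a shell, here is a rough sketch of the same idea that skips the shell and lets you swap lynx for w3m. The names DUMPERS and dump_argv are made up for illustration; the only command lines assumed are lynx -dump and w3m -dump, both mentioned in this thread:

```ruby
# Hypothetical helpers, not from the thread.
# Command prefixes for the text-mode browsers mentioned above.
DUMPERS = {
  'lynx' => ['lynx', '-dump'],
  'w3m'  => ['w3m', '-dump']
}.freeze

# Build the argument vector for the chosen browser.
def dump_argv(url, tool = 'lynx')
  base = DUMPERS.fetch(tool) { raise ArgumentError, "unknown tool: #{tool}" }
  base + [url]
end

# Run the browser directly (array form of IO.popen, so no shell is involved)
# and return whatever it prints. Raises Errno::ENOENT if the tool is missing.
def plain(url, tool = 'lynx')
  IO.popen(dump_argv(url, tool), &:read)
end
```

With that in place, puts plain('http://www.google.com/', 'w3m') should behave like the lynx version, provided w3m is installed.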
Cheers,
On Wed, 04 Jan 2006 10:30:03 -0000, keal <keal21@mail.ru> wrote:
--
Ross Bamford - rosco@roscopeco.remove.co.uk