Need script: convert html-text to text

I have HTML text and need to convert it to plain text without the
HTML tags.

--
Posted via http://www.ruby-forum.com/.

keal wrote:

i have html-text. i have to convert this text to simple text without
html-tags.

Path of least resistance:

lynx -dump www.myurl
(or use links2, or w3m -dump www.myurl)

Or the high-falutin solution:
http://groups.google.com/group/comp.lang.ruby/browse_frm/thread/e0fb1207f1814c77/37cd5e35a1ffb8d7?q=strip+HTML+tags&rnum=7#37cd5e35a1ffb8d7

keal wrote:

i have html-text. i have to convert this text to simple text without
html-tags.

This is a very low-cost variant; I guess the lynx approach is much more
effective and complete. Note that it works line by line, so a tag that
spans a line break is left in place:

ruby -pe '$_.gsub!(%r{</?.*?>}, "")' index.html
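
If the one-liner is too crude, the same regex idea can be applied to the whole document at once. The following is only a sketch using the standard library: the helper name strip_tags is made up for illustration, and the list of block-level tags that become newlines is an assumption. It is still regex-based rather than a real parser, so it will trip over a ">" inside an attribute value, and CGI.unescapeHTML only decodes the basic entities (&amp;, &lt;, &gt;, &quot;, &#39; and numeric references), not named ones such as &nbsp;.

```ruby
require 'cgi'

# Hypothetical helper: strip markup from a complete HTML string.
def strip_tags(html)
  text = html.dup
  # Drop script and style elements together with their contents.
  text.gsub!(%r{<(script|style)\b[^>]*>.*?</\1\s*>}mi, '')
  # Drop HTML comments, including multi-line ones.
  text.gsub!(/<!--.*?-->/m, '')
  # Assumed set of block boundaries that should become line breaks.
  text.gsub!(%r{<br\s*/?>|</(?:p|div|h[1-6]|li|tr)\s*>}i, "\n")
  # Remove every remaining tag; [^>]* also matches newlines,
  # so tags that span several lines are handled.
  text.gsub!(/<[^>]*>/, '')
  # Decode the basic character entities.
  text = CGI.unescapeHTML(text)
  # Collapse runs of spaces and blank lines, then trim the ends.
  text.gsub(/[ \t]+/, ' ').gsub(/ ?\n ?/, "\n").gsub(/\n{3,}/, "\n\n").strip
end

sample = "<p>Hello <b\nclass=\"x\">world</b> &amp; friends</p>" \
         "<script>var a = 1 < 2;</script>"
puts strip_tags(sample)   # prints: Hello world & friends
```

For a file, something like puts strip_tags(File.read('index.html')) should do. For anything beyond simple pages, a browser dump or a proper HTML parser remains the safer choice.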

Kind regards

    robert

i have html-text. i have to convert this text to simple text without
html-tags.

It's tricky; there's more to it than you'd think. The best way is probably to let Lynx, or another text-mode browser, do it for you, e.g.:

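  # Render a URL as plain text by running lynx (which must be installed).
  def plain(url)
    # The array form bypasses the shell, so the URL cannot inject commands.
    IO.popen(['lynx', '-dump', url], &:read)
  end

  text = plain('http://www.google.com/')
  puts text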

Outputs:

                      [1]Personalised Home | [2]Sign in

   [3]A picture of the Braille letters spelling out "Google." Happy Birthday
                               Louis Braille!

     Web [4]Images [5]Groups [6]News [7]Froogle [8]more »

... [snip] ...

Of course you'll need lynx installed for that to work, but other text-mode browsers such as w3m or links have a similar -dump option.
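
If you can't count on lynx being there, one option is to try several text-mode browsers in turn. This is just a sketch: the names DUMPERS, installed?, dump_command and plain_with_fallback are invented here, and the order of preference (lynx, then w3m, then links) is an arbitrary assumption. All three accept -dump followed by a URL.

```ruby
# Dump invocations for three text-mode browsers, in assumed order of preference.
DUMPERS = {
  'lynx'  => %w[lynx -dump],
  'w3m'   => %w[w3m -dump],
  'links' => %w[links -dump]
}.freeze

# True if an executable file with this name is found in a PATH directory.
def installed?(name)
  ENV.fetch('PATH', '').split(File::PATH_SEPARATOR).any? do |dir|
    path = File.join(dir, name)
    File.file?(path) && File.executable?(path)
  end
end

# Returns the command array for the first preferred browser that is
# available, or nil if none is. The list of available browsers can be
# passed in explicitly, which keeps this function easy to test.
def dump_command(url, available = DUMPERS.keys.select { |n| installed?(n) })
  name = DUMPERS.keys.find { |n| available.include?(n) }
  name && DUMPERS[name] + [url]
end

# Hypothetical replacement for a lynx-only helper.
def plain_with_fallback(url)
  cmd = dump_command(url)
  raise 'no text-mode browser found (tried lynx, w3m, links)' if cmd.nil?
  # The array form runs the program directly, without a shell.
  IO.popen(cmd, &:read)
end
```

Usage would be puts plain_with_fallback('http://www.google.com/'). The output layout differs a little between browsers, so don't rely on exact formatting.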

Cheers,

On Wed, 04 Jan 2006 10:30:03 -0000, keal <keal21@mail.ru> wrote:

--
Ross Bamford - rosco@roscopeco.remove.co.uk