HTTP proxy in Ruby?

I'm thinking of implementing an HTTP proxy in Ruby that processes the
retrieved HTML before passing it on. Ideally, I'd like to rely on a
small existing framework or example code that does most of the work for
me. Does anything like that exist?

Michael

--
Michael Schuerig Most people would rather die than think.
mailto:michael@schuerig.de In fact, they do.
http://www.schuerig.de/michael/ --Bertrand Russell

Michael Schuerig wrote:

I'm thinking of implementing an HTTP proxy in Ruby that processes the
retrieved HTML before passing it on. Ideally, I'd like to rely on a
small existing framework or example code that does most of the work for
me. Does anything like that exist?

Michael

There's an httpproxy.rb as part of the webrick library. Will that work for you?

Regards,

Dan

Michael Schuerig wrote:

I'm thinking of implementing an HTTP proxy in Ruby that processes the
retrieved HTML before passing it on. Ideally, I'd like to rely on a
small existing framework or example code that does most of the work for
me. Does anything like that exist?

I've attempted the same thing and found there is very little base to
work from. You can take a look at WEBrick's httpproxy.rb but I found
it hard to determine where I would place my "hooks" to reprocess the
content. I've got a partially functional proxy that I wrote from the
ground up, but it has issues displaying certain pages. If you're
interested I can get the code up somewhere that it can be seen.
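
For what it's worth, the hook in WEBrick's httpproxy.rb appears to be the :ProxyContentHandler option: a callable that receives the request and the already-fetched response just before the response goes back to the client. A minimal sketch, assuming the webrick gem is installed (it is no longer bundled with Ruby 3.0 and later). The names rewrite_body and build_proxy and the port 8080 are made up for illustration:

```ruby
# Pure helper: the actual rewriting, kept separate so it can be tried
# without starting a server. (Hypothetical name, not part of WEBrick.)
def rewrite_body(html)
  html.gsub(/<title>(.*?)<\/title>/im) { "<title>[proxied] #{$1}</title>" }
end

# Builds (but does not start) a proxy whose content handler rewrites HTML.
def build_proxy(port = 8080)
  require 'webrick/httpproxy' # loaded lazily; needs the webrick gem on Ruby 3+

  handler = lambda do |req, res|
    # Only touch HTML, and only when there is a String body to rewrite.
    next unless res['content-type'].to_s.start_with?('text/html')
    next unless res.body.is_a?(String)
    # Skip compressed bodies in this sketch; they would need inflating first.
    next if res['content-encoding']
    res.body = rewrite_body(res.body)
    # The body length changed, so drop the upstream Content-Length.
    res.header.delete('content-length')
  end

  WEBrick::HTTPProxyServer.new(Port: port, ProxyContentHandler: handler)
end

# Usage (blocks until interrupted):
#   proxy = build_proxy
#   trap('INT') { proxy.shutdown }
#   proxy.start
```

Keeping the rewriting in a plain method means it can be checked in irb before any browser is pointed at the proxy.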

On 5/10/05, Michael Schuerig <michael@schuerig.de> wrote:

Michael

--
Michael Schuerig Most people would rather die than think.
mailto:michael@schuerig.de In fact, they do.
http://www.schuerig.de/michael/ --Bertrand Russell

--
===Tanner Burson===
tanner.burson@gmail.com
http://tannerburson.com <---Might even work one day...

Daniel Berger wrote:

Michael Schuerig wrote:

I'm thinking of implementing an HTTP proxy in Ruby that processes the
retrieved HTML before passing it on. Ideally, I'd like to rely on a
small existing framework or example code that does most of the work
for me. Does anything like that exist?

Michael

There's an httpproxy.rb as part of the webrick library. Will that
work for you?

Thanks for pointing this out. It might do what I need, I'll have a
closer look.

Michael

--
Michael Schuerig You can twist perceptions
mailto:michael@schuerig.de Reality won't budge
http://www.schuerig.de/michael/ --Rush, Show Don't Tell

Tanner Burson wrote:

I've attempted the same thing and found there is very little base to
work from. You can take a look at WEBrick's httpproxy.rb but I found
it hard to determine where I would place my "hooks" to reprocess the
content. I've got a partially functional proxy that I wrote from the
ground up, but it has issues displaying certain pages. If you're
interested I can get the code up somewhere that it can be seen.

Hmm... I actually did this last week, and I found some example code on
the web pretty quickly (it was in Japanese, admittedly...). Here's
the simple AdBlock proxy I ran up whilst playing around (it uses the
pierceive adblock list). It returns an empty document for disallowed
addresses, and removes all img tags, just as an example of processing.
It's not meant to be feature-rich or even high-quality code, but it
does most of what you seem to want.

Paul.

#!/usr/bin/env ruby

require 'webrick/httpproxy'  # on Ruby 3.0+ install the webrick gem first
require 'stringio'
require 'zlib'
require 'open-uri'

class AdBlocker
    def initialize
        reload
    end

    def reload
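        bl = []
        File.foreach('adblock.txt') do |line|
            line.strip!
            # Skip blank lines, the [Adblock] header and ! comments
            next if (line.empty? || line =~ /\[Adblock\]/ || line =~ /^!/)
            if (%r!^/.*/$! =~ line)
                # Strip the leading and trailing slash before compiling
                bl << Regexp.new(line[1...-1])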
            else
                bl << line
            end
        end
        @block_list = bl
    end

    def blocked?(uri)
        @block_list.each { |rx|
            if (uri.match(rx))
                return true
            end
        }
        return false
    end
end

module WEBrick
    class RejectingProxyServer < HTTPProxyServer
        def service(req, res)
            if (@config[:ProxyURITest].call(req.unparsed_uri))
                super(req, res)
            else
                blank(req, res)
            end
        end

        def blank(req, res)
            res.header['content-type'] = 'text/plain'
            res.header.delete('content-encoding')
            res.body = ''
        end
    end
end

class ProxyServer
    #
    # Handler that is called by the proxy to process each page
    #
    def handler(req, res)
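        #p res.header
        # Nothing to process if the response has no string body
        return unless res.body.is_a?(String)
        # Inflate content if it's gzipped
        if ('gzip' == res.header['content-encoding'])
            res.header.delete('content-encoding')
            res.body = Zlib::GzipReader.new(StringIO.new(res.body)).read
        end
        # Replace every img tag with a text placeholder
        res.body.gsub!(%r!<img[^>]*>!im, '[image]')
        # The body length may have changed, so drop the upstream
        # Content-Length rather than send a stale value
        res.header.delete('content-length')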
    end

    def uri_allowed(uri)
        b = @adblocker.blocked?(uri)
        #puts("--> URI #{b ? 'blocked' : 'allowed'}: #{uri}")
        return !b
    end

    def initialize
        @server = WEBrick::RejectingProxyServer.new(
            :BindAddress => '0.0.0.0',
            :Port => 8181,
            :ProxyVia => false,
        # :ProxyURI => URI.parse('http://localhost:8118/'),
            :ProxyContentHandler => method(:handler),
            :ProxyURITest => method(:uri_allowed)
        )
        @adblocker = AdBlocker.new
    end

    def start
        @server.start
    end

    def stop
        @server.shutdown
    end
end

#
# Create and start the server
#
ps = ProxyServer.new
%w[INT HUP].each { |signal| trap(signal) { ps.stop } }
ps.start
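
The blocklist logic can also be exercised on its own, without a proxy or an adblock.txt file. A rough sketch, assuming the list format is what the reload method implies: an [Adblock] header, ! comments, /regex/ entries, and everything else a plain substring. BlockList and the sample entries are invented for the example. Plain entries are matched with include? here rather than String#match, which would compile them as regular expressions:

```ruby
class BlockList
  def initialize(text)
    lines = text.each_line.map(&:strip).reject do |line|
      line.empty? || line.start_with?('!') || line.include?('[Adblock]')
    end
    @patterns = lines.map do |line|
      # /.../ entries are regular expressions; strip both slashes.
      line =~ %r{\A/.*/\z} ? Regexp.new(line[1...-1]) : line
    end
  end

  # True if any regex entry matches or any plain entry is a substring.
  def blocked?(uri)
    @patterns.any? do |pattern|
      pattern.is_a?(Regexp) ? pattern.match?(uri) : uri.include?(pattern)
    end
  end
end

# Made-up sample list; the single-quoted heredoc keeps backslashes literal.
SAMPLE_LIST = <<~'LIST'
  [Adblock]
  ! comment lines start with a bang
  /banner\d+\.gif/
  ads.example.com
LIST

list = BlockList.new(SAMPLE_LIST)
puts list.blocked?('http://ads.example.com/x.js')     # => true
puts list.blocked?('http://example.org/banner12.gif') # => true
puts list.blocked?('http://example.org/index.html')   # => false
```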

Paul Battley wrote:

Hmm... I actually did this last week, and I found some example code on
the web pretty quickly (it was in Japanese, admittedly...). Here's
the simple AdBlock proxy I ran up whilst playing around (it uses the
pierceive adblock list). It returns an empty document for disallowed
addresses, and removes all img tags, just as an example of processing.
It's not meant to be feature-rich or even high-quality code, but it
does most of what you seem to want.

Thanks; this is super handy.

James

Paul Battley wrote:

Hmm... I actually did this last week, and I found some example code on
the web pretty quickly (it was in Japanese, admittedly...). Here's
the simple AdBlock proxy I ran up whilst playing around (it uses the
pierceive adblock list). It returns an empty document for disallowed
addresses, and removes all img tags, just as an example of processing.
It's not meant to be feature-rich or even high-quality code, but it
does most of what you seem to want.

Thanks, that's great. My Japanese is severely lacking unfortunately.
Your code appears to be very close to what I'm intending to do. I don't
want to remove stuff from pages, rather I want to insert. Specifically,
I want to insert Greasemonkey (-> http://greasemonkey.mozdev.org/)
scripts in the hope of using them with browsers other than
Mozilla/Firefox.
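
Inserting instead of removing only changes the body of the content handler. A minimal sketch of the injection step, assuming the user script is served from some URL the browser can reach. inject_script and the example URL are hypothetical, and nothing here emulates the GM_* functions that Greasemonkey scripts may rely on:

```ruby
# Inserts a <script> tag just before </body>, or appends it when the
# page has no closing body tag. Returns a new string.
def inject_script(html, src)
  tag = %(<script type="text/javascript" src="#{src}"></script>)
  if html =~ %r{</body\s*>}i
    html.sub(%r{</body\s*>}i) { |close| tag + close }
  else
    html + tag
  end
end

page = '<html><body><p>Hello</p></body></html>'
puts inject_script(page, 'http://example.com/userscript.js')
# prints:
# <html><body><p>Hello</p><script type="text/javascript" src="http://example.com/userscript.js"></script></body></html>
```

Inside a :ProxyContentHandler this would amount to res.body = inject_script(res.body, ...) for text/html responses, after inflating gzipped content as the handler above does.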

Michael

--
Michael Schuerig Nothing is as brilliantly adaptive
mailto:michael@schuerig.de as selective stupidity.
http://www.schuerig.de/michael/ --A.O. Rorty, The Deceptive Self