HTML filtering in weblog/BBS software

Hi all,

I am writing some sort of BBS in Ruby (on Rails). I downloaded and
included RedCloth for template rendering (in 5 lines of code and 15
lines of test - wow!). It's cool, but it allows any HTML to be included.

Now, I don't want to let some kiddie include a <script> tag that
would make an innocent BBS thread pop up 50 new browser windows - no
matter how cool it might seem.

I wonder if there is any existing code to sanitize user input by
replacing dangerous HTML tags (like the aforementioned <script>)
that I could use with RedCloth to alleviate this risk.

Ditto for plain text inputs (user names, subjects and other such).

Alex

I'm doing some work on Ruwiki, currently in CVS, that covers this. It
used to filter too aggressively, but I have just fixed that.

    # Find HTML tags
  SIMPLE_TAG_RE = %r{<[^<>]+?>} # Ensure that only the tag is grabbed.
  HTML_TAG_RE = %r{\A< # Tag must be at start of match.
                        (/)? # Closing tag?
                        ([\w:]+) # Tag name
                        (?:\s+ # Space
                         ([^>]*?) # Attributes, lazy so a trailing slash is left over
                         (/)? # Singleton tag?
                        )? # The above three are optional
                       >}x
  ATTRIBUTES_RE = %r{([\w:]+)(=(?:\w+|"[^"]+?"|'[^']+?'))?}x
  ALLOWED_ATTR = %w(style title type lang dir class id cite datetime abbr) +
                  %w(colspan rowspan compact start media)
  ALLOWED_HTML = %w(abbr acronym address b big blockquote br caption cite) +
                  %w(code col colgroup dd del dfn dir div dl dt em h1 h2 h3) +
                  %w(h4 h5 h6 hr i ins kbd li menu ol p pre q s samp) +
                  %w(small span strike strong style sub sup table tbody) +
                  %w(td tfoot th thead tr tt u ul var)

    # Clean the content of unsupported HTML and attributes. This includes
    # XML namespaced HTML. Sorry, but there's too much possibility for
    # abuse.
  def clean(content)
    content.gsub(SIMPLE_TAG_RE) do |tag|
      tagset = HTML_TAG_RE.match(tag)

      if tagset.nil?
        # Not a recognisable tag: escape it so it is shown as text.
        tag = Ruwiki.clean_entities(tag)
      else
        closer, name, attributes, single = tagset.captures

        if ALLOWED_HTML.include?(name.downcase)
          unless closer or attributes.nil?
            attributes = attributes.scan(ATTRIBUTES_RE).map do |set|
              # Keep allowed attributes; anything else becomes nil and
              # is dropped by compact.
              set.join if ALLOWED_ATTR.include?(set[0].downcase)
            end.compact.join(" ")
            tag = "<#{closer}#{name} #{attributes}#{single}>"
          else
            # Closing tags and attribute-less tags pass through bare.
            tag = "<#{closer}#{name}>"
          end
        else
          # Tag is not on the allowed list: escape it.
          tag = Ruwiki.clean_entities(tag)
        end
      end

      tag
    end
  end

Ruwiki.clean_entities replaces every & with &amp;, every < with &lt;,
and every > with &gt;.
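
For anyone who wants to try the snippet outside Ruwiki, the escaping step might look roughly like the sketch below. This is only a stand-in written from the description above, not the actual Ruwiki source; the free-standing method name clean_entities is an assumption for illustration.

```ruby
# Hypothetical stand-in for Ruwiki.clean_entities; the real method
# lives in the Ruwiki sources and may differ.
def clean_entities(text)
  # Escape "&" first so the "&" in "&lt;" and "&gt;" is not escaped again.
  text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

puts clean_entities("<script>alert(1)</script> & more")
```

The order of the three substitutions matters: escaping & last would mangle the entities produced by the first two.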

-austin

On Thu, 14 Oct 2004 21:22:43 +0900, Alexey Verkhovsky <alex@verk.info> wrote:
--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca

Alexey Verkhovsky wrote:

>Hi all,

Moin!

>I am writing some sort of BBS in Ruby (on Rails). I downloaded and
>included RedCloth for template rendering (in 5 lines of code and 15
>lines of test - wow!). It's cool, but allows to include any HTML.

There are two options for not allowing user-specified HTML and style sheets. (Even style sheets can contain JavaScript.) Just use RedCloth like this:

RedCloth.new("h1. A <b>bold</b> man", [:filter_html, :filter_styles]).to_html
# => "<h1>A &lt;b&gt;bold&lt;/b&gt; man</h1>"

BlueCloth and RDoc have similar options AFAIK.
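
For the plain-text fields from the question (user names, subjects), escaping the value on output is probably enough. Here is a minimal sketch using ERB::Util.html_escape from Ruby's standard library; the helper name escape_field and the sample value are made up for illustration.

```ruby
require "erb"

# Hypothetical helper: escape a plain-text field before it is written
# into a page, so any markup in it is displayed rather than interpreted.
def escape_field(value)
  ERB::Util.html_escape(value.to_s)
end

puts escape_field("<b>Alex</b> & friends")
```

ERB::Util also provides the short alias h for html_escape, which is convenient inside ERB templates.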

Regards,
Florian Gross

IIRC RDoc doesn't allow raw HTML by design.

On Thu, Oct 14, 2004 at 11:19:47PM +0900, Florian Gross wrote:

>BlueCloth and RDoc have similar options AFAIK.

--
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com