> Hi all,
> I am writing a sort of BBS in Ruby (on Rails). I downloaded and
> included RedCloth for template rendering (in 5 lines of code and 15
> lines of test - wow!). It's cool, but it allows any HTML to be included.
> Now, I don't want to let some kiddie include a <script> tag that
> would make an innocent BBS thread pop up 50 new browser windows, no
> matter how cool that might seem.
> I wonder if there is any existing code to sanitize user input by
> replacing dangerous HTML tags (like the aforementioned <script>)
> that I could use with RedCloth to reduce this risk.
> Ditto for plain-text inputs (user names, subjects and the like).
Some work I'm doing on Ruwiki, currently in CVS, covers this. It
covers it too well at the moment, but it does cover it. (I just fixed
this.)
# Find HTML tags
SIMPLE_TAG_RE = %r{<[^<>]+?>} # Ensure that only the tag is grabbed.
HTML_TAG_RE = %r{\A<          # Tag must be at start of match.
                 (/)?         # Closing tag?
                 ([\w:]+)     # Tag name
                 (?:\s+       # Space
                  ([^>]*?)    # Attributes (lazy, so a trailing / is not swallowed)
                 )?           # Attributes are optional
                 \s*(/)?      # Singleton tag, as in <br/> or <br />?
                 >}x
ATTRIBUTES_RE = %r{([\w:]+)(=(?:\w+|"[^"]+?"|'[^']+?'))?}x
ALLOWED_ATTR = %w(style title type lang dir class id cite datetime abbr) +
               %w(colspan rowspan compact start media)
# Beware: the style element and attribute pass CSS through unchecked, and
# some browsers will run script from CSS (e.g. IE's expression()).
ALLOWED_HTML = %w(abbr acronym address b big blockquote br caption cite) +
               %w(code col colgroup dd del dfn dir div dl dt em h1 h2 h3) +
               %w(h4 h5 h6 hr i ins kbd li menu ol p pre q s samp) +
               %w(small span strike strong style sub sup table tbody) +
               %w(td tfoot th thead tr tt u ul var)
# Clean the content of unsupported HTML and attributes. This includes
# XML-namespaced HTML. Sorry, but there's too much possibility for
# abuse.
def clean(content)
  content.gsub(SIMPLE_TAG_RE) do |tag|
    tagset = HTML_TAG_RE.match(tag)
    if tagset.nil?
      # Not a well-formed tag: escape it so it displays as text.
      Ruwiki.clean_entities(tag)
    else
      closer, name, attributes, single = tagset.captures
      if !ALLOWED_HTML.include?(name.downcase)
        # Element not on the allow list: escape the whole tag.
        Ruwiki.clean_entities(tag)
      elsif closer
        # Closing tags never carry attributes.
        "</#{name}>"
      elsif attributes.nil?
        "<#{name}#{single}>"
      else
        # Keep only allowed attributes, joined by single spaces
        # (no stray blanks where rejected attributes used to be).
        kept = attributes.scan(ATTRIBUTES_RE).
                 select { |attr_name, _value| ALLOWED_ATTR.include?(attr_name.downcase) }.
                 map { |set| set.join }.join(" ")
        kept = " #{kept}" unless kept.empty?
        "<#{name}#{kept}#{single}>"
      end
    end
  end
end
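To see what the attribute step keeps and what it drops, here is a standalone sketch of just that step. It uses the same pattern as ATTRIBUTES_RE with a shortened allow list. filter_attributes, ATTR_RE and KEEP_ATTRS are hypothetical names for illustration and are not part of Ruwiki.

```ruby
# Standalone sketch of the attribute-filtering step from clean above.
# ATTR_RE is the same pattern as ATTRIBUTES_RE; KEEP_ATTRS is a shortened
# copy of ALLOWED_ATTR. Both names are hypothetical.
ATTR_RE    = %r{([\w:]+)(=(?:\w+|"[^"]+?"|'[^']+?'))?}x
KEEP_ATTRS = %w(style title class id)

# Scan name/value pairs, keep the allowed names (case-insensitively),
# and rejoin the survivors with single spaces.
def filter_attributes(attributes)
  attributes.scan(ATTR_RE).
    select { |name, _value| KEEP_ATTRS.include?(name.downcase) }.
    map(&:join).
    join(" ")
end

puts filter_attributes(%q{class="post" onmouseover="alert(1)" title='hi'})
# prints: class="post" title='hi'
```

Event handlers like onmouseover disappear only because they are not on the list. That is the advantage of an allow list over a block list: anything unanticipated is dropped by default.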
Ruwiki.clean_entities converts all instances of & => &amp;, < => &lt;,
and > => &gt;.
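Ruwiki.clean_entities itself isn't shown here. Assuming it does nothing beyond the three replacements just described, a rough sketch follows; RuwikiSketch is a hypothetical name. The same kind of escaping covers the plain-text fields asked about (user names, subjects), which should never contain markup. Ruby's standard library already provides CGI.escapeHTML for that, and it also escapes quote characters.

```ruby
require 'cgi'

# Hypothetical stand-in for Ruwiki.clean_entities, written from the
# description above. & is replaced first so the ampersands introduced by
# &lt; and &gt; are not escaped a second time.
module RuwikiSketch
  def self.clean_entities(text)
    text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
  end
end

puts RuwikiSketch.clean_entities(%q{<script>alert("hi & bye")</script>})
# prints: &lt;script&gt;alert("hi &amp; bye")&lt;/script&gt;

# Plain-text fields: escape the whole string; nothing survives as markup.
puts CGI.escapeHTML(%q{Tom & "Jerry" <b>})
# prints: Tom &amp; &quot;Jerry&quot; &lt;b&gt;
```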
-austin
On Thu, 14 Oct 2004 21:22:43 +0900, Alexey Verkhovsky <alex@verk.info> wrote:
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca