I'm using the Ruby tidy gem to clean some user-input HTML. It works
splendidly on my Mac development machine, but segfaults on a CentOS
Linux box.
I've traced through the code, and the crash occurs in the to_s method
in Tidybuf.rb. There, struct.bp returns a non-nil value (nil is what
would indicate a zero-size buffer), but struct.size is some huge
number that varies from run to run.
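A minimal sketch of the kind of sanity check that could flag the bogus
size before it is used to read from the pointer. FakeTidyBuf,
checked_size and the 10 MB limit are all made up for illustration;
they are not part of the tidy gem, and the real struct is only assumed
to expose bp and size as described above.

```ruby
# Hypothetical stand-in for the buffer struct; only bp and size matter here.
class FakeTidyBuf
  attr_reader :bp, :size

  def initialize(bp, size)
    @bp = bp
    @size = size
  end
end

# Arbitrary upper bound for cleaned HTML output (an assumption, tune as needed).
MAX_PLAUSIBLE_SIZE = 10 * 1024 * 1024

# Returns the number of bytes that look safe to read from buf.bp.
# Assumes a nil bp means an empty buffer; raises if size looks like garbage.
def checked_size(buf, limit = MAX_PLAUSIBLE_SIZE)
  return 0 if buf.bp.nil?

  size = buf.size
  if size < 0 || size > limit
    raise ArgumentError, "implausible buffer size: #{size}"
  end
  size
end
```

Calling something like checked_size before the read would turn the
crash into a Ruby exception that reports the bad value, which at least
makes the failure easier to log and compare between machines.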
I've googled a ton, and there are a lot of people who have hit
segfaults using Ruby and tidy. Some of the issues seem to have been a
namespace conflict between Graphics/ImageMagick and Tidy, but we've
fixed that (by renaming tidy's GetToken function and recompiling), and
are still hitting a segfault.
More detail:
Using a fresh Rails 1.2.5 app, I've stepped through the parts of
Tidyobj.rb's clean method in the console, like so:
  Tidy.open do |t|
    puts "*** BAD SAMPLE"
    t.clean "<html>I am bad HTML!</html>"
    puts t.errors
    puts t.diagnostics
  end

  Tidy.open do |t|
    puts "*** GOOD SAMPLE"
    t.clean '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">' +
            '<html><head><title>foo</title></head><body><p>bar</p></body></html>'
    puts t.errors
    puts t.diagnostics
  end