Hi,
Every now and then I run into errors related to UTF-8 encoding, and each time I fail to find (or guess) the right combination of words to get Ruby 1.9.2 to play nicely with some string it doesn't like.
Right now I want to open a log file and read it, but some script kiddie has decided to connect using some crazy non-ASCII bytes, and this line in my script
File.readlines(logfile, :encoding => "UTF-8" )
now spits out the error:
ArgumentError - invalid byte sequence in UTF-8
when encountering lines like this:
83.44.178.124 - - [19/Jul/2011:19:15:00 +0100] ?.???S\x08\x02?N~],>~Q?~@6\x15`ҷ?~Vg?'dR\x1C??\x08?F\x06w?~H?~F?\x08P~V?\x0Bf\x22?\x17~M^??{??j\x1E??p?~AU~\\
"400" 166 "-" "-" "-"
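A minimal sketch for confirming what is going on. It assumes the bad lines simply contain stray bytes like the ones above; the helper `invalid_lines` is hypothetical and the sample lines are made up. In Ruby 1.9 and later, reading with `:encoding => "UTF-8"` generally just labels each string as UTF-8 without checking it. `String#valid_encoding?` reports whether the bytes are actually well formed, and a regex match is one of the operations that raises the ArgumentError on a malformed string.

```ruby
# Hypothetical helper: returns [index, line] pairs for every line whose
# bytes are not valid in the line's encoding (UTF-8 here).
def invalid_lines(lines)
  lines.each_with_index.reject { |line, _| line.valid_encoding? }.map { |line, i| [i, line] }
end

# Made-up sample lines; a lone 0xFF byte never appears in well-formed UTF-8.
good = "83.44.178.124 - - [19/Jul/2011:19:15:00 +0100] \"GET / HTTP/1.1\" 200\n"
bad  = "83.44.178.124 - - [19/Jul/2011:19:15:00 +0100] \xFF\x08\n".dup.force_encoding(Encoding::UTF_8)

p invalid_lines([good, bad]).map(&:first)   # prints [1]

begin
  bad =~ /\A(\S+)/   # the regex engine checks the bytes before matching
rescue ArgumentError => e
  puts e.message     # prints "invalid byte sequence in UTF-8"
end
```

Running something like this over the lines returned by `File.readlines` shows exactly which entries are malformed before any parsing happens.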
I'd really like to know how to fix this without dropping 1.9. Does anyone know the magic words that will get this log file read? These are my best efforts so far:
File.readlines(logfile, :encoding => "UTF-8" ).map{|e| e.force_encoding('UTF-8')}
File.readlines(logfile, :encoding => "UTF-8" ).map{|e| e.encode('UTF-8', undef: :replace, replace: "??")}
File.readlines(logfile, :encoding => "UTF-8" ).map{|e| e.encode('iso-8859-1', undef: :replace, replace: "??")}
All of them fail on this file, although they do read a log file that contains only valid UTF-8. Any help is much appreciated.
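A sketch of one commonly suggested workaround, not necessarily the only or best one. It assumes you want to keep every valid character and replace only the bad bytes with `?`; the helper names `clean_utf8`, `scrub_via_utf16` and `read_log_lines` are hypothetical. Two details may explain why the attempts above don't help: `force_encoding` only relabels a string without checking it, and the offending bytes are *invalid* UTF-8 rather than *undefined* characters, so `undef: :replace` doesn't cover them while `invalid: :replace` does. On top of that, Ruby 1.9 documents an encode from an encoding to the same encoding as a no-op, so `encode('UTF-8', ...)` on a UTF-8 string never inspects the bytes. `String#scrub` only exists from Ruby 2.1 on, so on 1.9 the sketch falls back to a round trip through UTF-16LE.

```ruby
# Fallback for Ruby 1.9: a UTF-8 -> UTF-8 encode is skipped, so convert the
# bytes (read as UTF-8) to UTF-16LE, which forces every byte to be examined,
# then convert back.
def scrub_via_utf16(str, replacement = '?')
  str.encode(Encoding::UTF_16LE, Encoding::UTF_8, invalid: :replace, replace: replacement)
     .encode(Encoding::UTF_8)
end

# Returns a valid UTF-8 copy of +str+: valid characters are kept as-is and
# each invalid byte sequence becomes +replacement+.
def clean_utf8(str, replacement = '?')
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?
  if utf8.respond_to?(:scrub)
    utf8.scrub(replacement)            # String#scrub exists from Ruby 2.1
  else
    scrub_via_utf16(utf8, replacement)
  end
end

# Read raw bytes ('rb' = binary mode, no decoding), then clean each line.
def read_log_lines(path)
  File.open(path, 'rb') { |f| f.readlines }.map { |line| clean_utf8(line) }
end
```

With this, `read_log_lines(logfile)` would return the same lines as before for clean files, while a line like the one above comes back with `?` in place of each malformed sequence, so later regex matching no longer raises.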
Regards,
Iain