I've just finished an extensive reworking of the standard CSV library in Ruby 1.9 (formerly FasterCSV). CSV's parser and generator are now m17n-aware, which means they should work naturally with your data in any Encoding Ruby 1.9 supports, apart from the "dummy" ones (those for which Encoding#dummy? returns true).
Everything is documented, so it should be pretty easy to figure out how to use the new system. Generally, you just set the Encoding of your IO or String objects correctly and CSV does the rest:
require "csv"
# reading example
CSV.foreach(…, :encoding => "…") do |row|
  # row will be parsed but not transcoded here
end
# writing example
CSV.open(…, "wb:…") do |csv|
  csv << data
  # data will be quoted and separated with characters
  # in the proper encoding
end
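The reading and writing pattern above can be fleshed out into a runnable sketch. This is an illustration under assumptions, not code from the CSV distribution: ISO-8859-1, the file name, and the sample rows are arbitrary choices, and it assumes the csv library is available (newer Rubies ship it as a bundled gem rather than as part of the standard library). It writes a file in ISO-8859-1, reads it back once as-is and once with "external:internal" transcoding to UTF-8, and parses a String, where CSV follows the String's own Encoding.

```ruby
require "csv"
require "tmpdir"

# Sketch only: ISO-8859-1, "cities.csv", and the rows are illustrative choices.
latin1 = Encoding::ISO_8859_1
written_bytes = nil
raw_field_encodings = []
transcoded_rows = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "cities.csv")

  # Writing: "wb:ISO-8859-1" gives the File an ISO-8859-1 external encoding,
  # which CSV uses for its separators and quotes. The fields are encoded to
  # ISO-8859-1 up front so every piece of each row agrees.
  CSV.open(path, "wb:ISO-8859-1") do |csv|
    csv << ["name", "city"]
    csv << ["José".encode(latin1), "Zürich".encode(latin1)]
  end
  written_bytes = File.binread(path)

  # Reading without transcoding: fields keep the file's encoding.
  CSV.foreach(path, :encoding => "ISO-8859-1") do |row|
    raw_field_encodings.concat(row.map(&:encoding))
  end

  # Reading with transcoding: "external:internal" asks the IO to convert
  # the data to UTF-8 as it is read, before CSV parses it.
  transcoded_rows = CSV.read(path, :encoding => "ISO-8859-1:UTF-8")
end

# Parsing a String: CSV works in the String's own Encoding.
string_rows = CSV.parse("name,city\nJosé,Zürich\n".encode(latin1))

puts "bytes written:       #{written_bytes.bytesize}"
puts "raw read encodings:  #{raw_field_encodings.uniq.join(', ')}"
puts "transcoded last row: #{transcoded_rows.last.inspect}"
puts "String parse enc:    #{string_rows.flatten.map(&:encoding).uniq.join(', ')}"
```

Encoding the fields before adding them keeps the sketch independent of whether a given CSV version transcodes mixed-encoding rows on write; passing Strings that already match the output encoding is the case the writing comment above describes.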
Encodings default to Encoding.default_external if not provided.
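A small sketch of that default, under assumptions: the file name and rows are made up, and it relies on Encoding.default_internal being nil (the usual case). When it is set, Ruby's IO transcodes to it on read, so the sketch checks for `default_internal || default_external`. Neither the mode string nor an :encoding option names an Encoding here, so the File, and therefore CSV, falls back to Ruby's defaults.

```ruby
require "csv"
require "tmpdir"

# Sketch only: "status.csv" and its rows are hypothetical.
# With default_internal nil (the normal setting), fields should be tagged
# with Encoding.default_external.
expected = Encoding.default_internal || Encoding.default_external
field_encodings = []

Dir.mktmpdir do |dir|
  path = File.join(dir, "status.csv")

  # ASCII-only data, so it is valid in any ASCII-compatible default encoding.
  CSV.open(path, "w") do |csv|
    csv << ["id", "status"]
    csv << ["1", "ok"]
  end

  # No :encoding given: the File is opened with Ruby's default encodings.
  CSV.foreach(path) { |row| field_encodings.concat(row.map(&:encoding)) }
end

puts "default_external: #{Encoding.default_external}"
puts "field encodings:  #{field_encodings.uniq.join(', ')}"
```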
I had to change quite a bit of code to support this. I tried to test well, but it's possible I introduced some new bugs. Please let me know if you find any issues.
I suspect this is one of the first fully m17n-compatible implementations, so I hope it can serve as a guide to others wanting to provide similar support in their libraries. I know I learned a ton just figuring out how to do this. Feel free to ask me questions about multi-encoding support; I'll certainly try to answer them if I can.
Finally, here's some fun news to look forward to: even with the m17n support, CSV on Ruby 1.9 is over three times faster than FasterCSV on Ruby 1.8, thanks to the speed of the new VM and the switch to the Oniguruma regular expression engine. Three cheers to the core team for giving us a much faster Ruby!
James Edward Gray II
Awesome James!
FasterCSV is under very heavy utilization over here and we're always glad you
made such a fine library.
enjoy,
-jeremy
On Mon, Sep 22, 2008 at 02:06:42AM +0900, James Gray wrote:
--
Jeremy Hinegardner jeremy@hinegardner.org