Reading Text files with extended ASCII characters

Hello,

I have reviewed the mailing list ruby-talk and I found some interesting
stuff, but I haven’t seen the answer to reading (and creating) text
files with extended ASCII characters.

For example when I try to read a text file with the word:
Dön

I get:
D\366n

If I save this and read it in again I get:
D\366n

Oddly, many normal punctuation characters are also escaped; these
include (at least):
" #

I assume I am reading the text file incorrectly or not converting it to
something because when I read a line of a file I also get:
"what-ever-it-says\n"

I am reading and writing the file with the following commands:

File.delete("fileout.html") if FileTest.exist?("fileout.html")
File.open('filein.html', "r") do |f_in|
  f_in.each_line do |line|
    reformated = line.dump
    reformated = line.sub(/"(.*)\n"/, '\1')
    reformated = line.gsub(/\\([#"])/, '\1')
    File.open("fileout.html", "a") { |f_out| f_out.print reformated }
  end
end

Is there a convenient way to keep the extended ASCII characters (or convert
them to &ouml;), and is there a convenient way to convert this &ouml;
into an “ö” and save it into a text file?

I look forward to guidance.

Thanks,

Bill


William H. Tihen, Technology Director
The American School in Switzerland (TASIS)
tihen.william@tasis.ch
http://www.tasis.ch/

For example when I try to read a text file with the word:
Dön

seems to work :

/tmp > ruby -e "f = File.open 'foo', 'w'; f.puts 'Dön'"

/tmp > cat foo
Dön

/tmp > ruby -e "puts (IO.readlines 'foo')"
Dön

/tmp > ruby -e "lines = IO.readlines 'foo'; f = File.open 'foo', 'w'; f.puts lines"

/tmp > cat foo
Dön

are you in dos? perhaps trying IO#binmode will help then…
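The binary-mode suggestion can be sketched as follows. This is a minimal illustration for a current Ruby, not code from the thread: the helper name `round_trip` and the scratch file name are assumptions, and the "b" flag in the mode string does the same job as calling `binmode` on an already open file.

```ruby
require "tmpdir"

# Write the given bytes in binary mode, read them back the same way,
# and return what was read.
def round_trip(bytes)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "foo")                  # scratch file, name is arbitrary
    File.open(path, "wb") { |f| f.write(bytes) }  # "b": no newline or encoding translation
    File.open(path, "rb") { |f| f.read }
  end
end

original = "D\xF6n\r\n".b                 # "Dön" in ISO-8859-1 plus a DOS line ending
puts round_trip(original) == original     # => true
```

If the bytes survive this round trip, the file operations themselves are not what is changing the text.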

when you say

I get:
D\366n

If I save this and read it in again I get:
D\366n

i assume you mean in irb? if so, this is simply because irb shows strings via
String#inspect, which displays ö (and other extended chars) as their octal
escaped equivalents - the byte value in the file is still the same, which you
can see using cat or opening the file in a text editor.
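A small sketch of the difference between a string's bytes and its escaped display form, written for a current Ruby (which prints the escape as \xF6 where the 2003 interpreter printed \366). The variable names are illustrative. String#dump, which the original script calls, produces this same kind of quoted, escaped text, so writing its result to a file stores the escapes rather than the character.

```ruby
word = "D\xF6n".b          # "Dön" as three ISO-8859-1 bytes, kept as raw binary

puts word.dump             # quoted, escaped form meant for debugging
p word.bytes               # => [68, 246, 110]; the data itself is untouched

line = "say \"hi\"\n"
puts line.dump             # the quotes and the newline come back as escape sequences
```

Printing or writing `word` itself, rather than `word.dump`, sends the original three bytes.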

-a


On Wed, 19 Feb 2003, William Tihen wrote:

====================================

Ara Howard
NOAA Forecast Systems Laboratory
Information and Technology Services
Data Systems Group
R/FST 325 Broadway
Boulder, CO 80305-3328
Email: ahoward@fsl.noaa.gov
Phone: 303-497-7238
Fax: 303-497-7259
====================================

Thanks for the help Ara, I guess I just didn’t understand the file
operations. Now I will look to solve saving &ouml;, &#264; or 0xF6 as an
actual “ö” into a text file.

I can’t seem to get unpack to convert “=264” into “ö”.

Any ideas?


On Tue, 2003-02-18 at 19:08, ahoward wrote:


how about

/tmp > ruby -e '255.times {|n| printf "%c%s", n, (n == 39 ? "\n" : " ")}'
` a b c e

o p q r s t u v w x y
( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K
L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o
p q r s t u v w x y z { | } ~ €  ‚ ƒ
¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ­ ® ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹
º » ¼ ½ ¾ ¿ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò
Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë
ì í î ï ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ /tmp > 1;2c

(this may hose your terminal but you get the idea)

-a

ps. running

/tmp > ruby -e '255.times {|n| printf "%d : %c\n", n, n}' | less

and searching for ‘ö’ you’ll see that it’s

246 : ö

not 264.

hope this helps.
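Building on the 246 lookup above, here is a rough sketch of turning a numeric HTML entity into the single byte with that value. The helper `decode_numeric_entities` is hypothetical (it is not from the thread), it assumes single-byte ISO-8859-1 output, and it leaves codes above 255 (such as 264) untouched. Named entities like &ouml; would need a lookup table, which is not shown.

```ruby
# Hypothetical helper: replace "&#NNN;" with the byte NNN when it fits in one byte.
def decode_numeric_entities(text)
  text.b.gsub(/&#(\d+);/) do |entity|
    code = $1.to_i
    code <= 255 ? code.chr : entity   # Integer#chr gives a one-byte string here
  end
end

decoded = decode_numeric_entities("D&#246;n")
p decoded.bytes   # => [68, 246, 110]
```

Writing `decoded` to a file opened in binary mode stores the byte 0xF6, which an ISO-8859-1 viewer shows as ö.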


On Wed, 19 Feb 2003, William H.Tihen wrote:

Quoting tihen.william@tasis.ch, on Wed, Feb 19, 2003 at 08:42:31AM +0900:

I can’t seem to get unpack to convert “=264” into “ö”.

Is that using the quoted-printable format? If so, quoted-printable uses
hex values, so “=264” is “&4” because 26hex is “&”…

Sam
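The quoted-printable point can be checked with the "M" directive of String#unpack, which decodes that format. A short sketch, with illustrative variable names; the sample input "D=F6n" is an assumption showing the hex spelling of ö (0xF6 is 246 decimal).

```ruby
# "=26" is read as hex 26 ("&"), so the trailing "4" is just a literal digit.
p "=264".unpack("M").first          # => "&4"

# The hex form of 246 is F6, so this is how ö is spelled in quoted-printable.
decoded = "D=F6n".unpack("M").first
p decoded.bytes                     # => [68, 246, 110]
```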

Thanks again for the tip; everything works well now. I might have
to clean up the code, but it all works.

Bill
