Reading from and writing to a Unicode encoded file

7stud2 · 18 May 2012 07:48

Hi,

I made a script that reads from a Unicode-encoded file and also writes
something back. The problem is that the text written back is turned into
gibberish.

Is there any way of solving this other than manually changing the encoding
of the file to UTF-8?

thank you
regards,
seba

--
Posted via http://www.ruby-forum.com/.
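In Windows terms (Notepad's "Unicode" save option), a "Unicode" file is normally UTF-16LE with a byte order mark (BOM), not UTF-8. A minimal sketch for checking what a file actually contains, assuming it starts with a BOM. `sniff_bom` is a hypothetical helper name, and files without a BOM are reported as unknown (nil) because the bytes alone cannot reliably identify the encoding.

```ruby
# Guess a file's Unicode encoding from its byte order mark (BOM).
# Returns nil when there is no recognised BOM.
# Note: a UTF-32LE BOM (FF FE 00 00) also starts with FF FE and would be
# reported as UTF-16LE by this simple check.
def sniff_bom(path)
  head = File.binread(path, 4) || "" # raw bytes (ASCII-8BIT); nil for an empty file
  if head.start_with?("\xEF\xBB\xBF".b)
    "UTF-8"
  elsif head.start_with?("\xFF\xFE".b)
    "UTF-16LE" # what Notepad calls "Unicode"
  elsif head.start_with?("\xFE\xFF".b)
    "UTF-16BE"
  end
end

# Hypothetical usage: ruby sniff.rb data.txt
puts(sniff_bom(ARGV[0]) || "no BOM found") if ARGV[0]
```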

7stud2 · 18 May 2012 08:19

Sebastjan H. wrote in post #1061256:

> Hi,
>
> I made a script that reads from a Unicode-encoded file and also writes
> something back. The problem is that the text written back is turned into
> gibberish.
>
> Is there any way of solving this other than manually changing the encoding
> of the file to UTF-8?
>
> thank you
> regards,
> seba

I refer to this post: Array of strings - finding letter combinations - Ruby - Ruby-Forum

regards,
seba

7stud2 · 18 May 2012 09:22

Hi Sebastjan H,

You can use Ruby's Iconv standard library (method name: conv), which
helps you convert the encoding of a string or file.

Please refer:
http://ruby-doc.org/stdlib-1.9.2/libdoc/iconv/rdoc/Iconv.html#method-c-conv

Regards,
Vimal Raj

7stud2 · 18 May 2012 10:14

Hello,
tested on Windows:

open("data.txt", "rb:UTF-16LE") {|fin|
  open("odata.txt", "wb:UTF-8") { |fout|
    fout.write(fin.read())
  }
}

bye
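The snippet above relies on two things: the "b" flag, which Ruby requires when the external encoding is ASCII-incompatible (as UTF-16LE is), and IO#write transcoding the UTF-16LE string into the output file's UTF-8 encoding. One side effect worth checking: the BOM is read as an ordinary U+FEFF character, so it likely ends up in the output as a UTF-8 BOM (EF BB BF). A minimal sketch that converts explicitly with String#encode and drops the BOM, assuming the input really is UTF-16LE; `utf16le_file_to_utf8` is a hypothetical name.

```ruby
# Convert a UTF-16LE file (Notepad's "Unicode") into a UTF-8 file without a BOM.
def utf16le_file_to_utf8(src, dst)
  # "rb:UTF-16LE": binary mode (no newline translation), bytes tagged as UTF-16LE
  text = File.open(src, "rb:UTF-16LE", &:read)
  # Transcode explicitly instead of relying on IO#write to do it
  text = text.encode("UTF-8")
  # The BOM survives decoding as the character U+FEFF; drop it
  text = text.sub(/\A\uFEFF/, "")
  File.open(dst, "wb:UTF-8") { |f| f.write(text) }
end

# Hypothetical usage, with the file names from the snippet above:
# utf16le_file_to_utf8("data.txt", "odata.txt")
```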

Quintus · 18 May 2012 10:51

On 18.05.2012 11:22, Vimal Selvam wrote:

> Hi Sebastjan H,
>
> You can use Ruby's Iconv standard library (method name: conv), which
> helps you convert the encoding of a string or file.

Iconv is deprecated and will be removed. Ruby has built-in encoding
facilities, namely String#encode.

> Please refer:
> Class: Iconv (Ruby 1.9.2)
>
> Regards,
> Vimal Raj

Vale,
Marvin
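For reference, the Iconv call Vimal mentioned would look like `Iconv.conv("UTF-8", "UTF-16LE", raw)`. Iconv was removed from the standard library in Ruby 2.0 (it survives as a separate gem), and String#encode with an explicit source encoding covers the same ground. A small sketch; `to_utf8` is a hypothetical helper, and the default source encoding UTF-16LE is an assumption based on what Windows calls "Unicode".

```ruby
# Transcode a raw byte string from `from` to UTF-8 using only core Ruby.
# Equivalent in spirit to the old Iconv.conv("UTF-8", from, raw).
def to_utf8(raw, from = "UTF-16LE")
  raw.encode("UTF-8", from) # second argument: how to interpret the input bytes
end

raw = "\x0D\x01\x61\x01".b   # "čš" as UTF-16LE bytes (U+010D, U+0161)
puts to_utf8(raw)            # prints čš

# Going back the other way, e.g. for a program that only reads UTF-16LE:
back = "čš".encode("UTF-16LE")
p back.bytes                 # [13, 1, 97, 1]
```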

7stud2 · 18 May 2012 10:55

Regis d'Aubarede wrote in post #1061272:

> Hello,
> tested on Windows:
>
> open("data.txt", "rb:UTF-16LE") {|fin|
>   open("odata.txt", "wb:UTF-8") { |fout|
>     fout.write(fin.read())
>   }
> }

Regards,

thx, it works for me too. However, I wanted to include it in the script
referred to above, so I tried this modification based on your model:

--------------------------------------
file = ARGV[0]

File.open(file, "Unicode") {|fin|
  File.open(file, "wb:UTF-8") { |fout|
    fout.write(fin.read())
  }
}
--------------------------------------

and I get an error.

If possible, I want to run this file conversion before my other code,
but the file should keep the same name and its content should stay
unchanged. I know that my version above would overwrite it.

Something like this:
1. Convert the file.
2. Reopen the file.
3. Read the content.
4. Run some code on the content.
5. Write something back to the file.

No. 1 is giving me the headache, the rest is in place:)
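A minimal sketch of those five steps, assuming the input is UTF-16LE and that step 5 appends plain text. `convert_to_utf8!` and `process_file` are hypothetical names, and the block passed to `process_file` stands in for the real logic. The key point for step 1 is to read the whole file into memory before reopening the same path for writing.

```ruby
# Step 1: convert the file in place. Everything is read into memory first,
# because opening the same path with "w" would truncate it immediately.
def convert_to_utf8!(path)
  text = File.open(path, "rb:UTF-16LE", &:read).encode("UTF-8")
  text = text.sub(/\A\uFEFF/, "") # drop the byte order mark
  File.open(path, "wb:UTF-8") { |f| f.write(text) }
end

# Steps 2-5: reopen, read, run some code, write something back.
def process_file(path)
  convert_to_utf8!(path)
  content = File.read(path, encoding: "UTF-8")     # steps 2 and 3
  extra   = yield(content)                         # step 4 (caller-supplied)
  File.open(path, "a:UTF-8") { |f| f.puts(extra) } # step 5: append
end

# Hypothetical usage: append the line count to the file named on the command line.
process_file(ARGV[0]) { |text| "lines: #{text.lines.count}" } if ARGV[0]
```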

regards,
seba

7stud2 · 18 May 2012 11:06

Sebastjan H. wrote in post #1061276:

> File.open(file, "Unicode") {|fin|
>   File.open(file, "wb:UTF-8") { |fout|
>     fout.write(fin.read())
>   }
> }
>
> regards,
> seba

data=nil
open("data.txt", "rb:UTF-16LE") {|fin| data=fin.read() }
open("data.txt", "wb:UTF-8") { |fout| fout.write(data) } if data
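Reading everything into `data` before the second `open` is what makes this work. A file opened with "w" is truncated at once, even while another handle on it is still open for reading, which is one reason the nested-block version above cannot work in place (the other is that "Unicode" is not a valid mode string, so File.open raises an ArgumentError). It also means that if anything raises after the "w" open, the file is left empty or partially written. A small demonstration, assuming a POSIX-like system; `read_after_truncate` is a hypothetical name.

```ruby
# Demonstrates why the read must finish before the file is reopened for writing.
def read_after_truncate(path)
  File.open(path, "rb") do |fin|
    File.open(path, "wb") do |_fout|
      # Opening with "w" has already truncated the file to zero bytes,
      # so there is nothing left for `fin` to read.
      return fin.read
    end
  end
end
```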

7stud2 · 18 May 2012 11:31

Regis d'Aubarede wrote in post #1061277:

> Sebastjan H. wrote in post #1061276:
>
> > File.open(file, "Unicode") {|fin|
> >   File.open(file, "wb:UTF-8") { |fout|
> >     fout.write(fin.read())
> >   }
> > }
> >
> > regards,
> > seba
>
> data=nil
> open("data.txt", "rb:UTF-16LE") {|fin| data=fin.read() }
> open("data.txt", "wb:UTF-8") { |fout| fout.write(data) } if data

Thank you very much, works like a charm. I've replaced the actual
filename with a variable, so I can use ARGV.

kind regards,
seba

7stud2 · 21 May 2012 10:41

Hi,

something goes wrong with this block of code when I use Shoes to package
an app:

data=nil
open("data.txt", "rb:UTF-16LE") {|fin| data=fin.read() }
open("data.txt", "wb:UTF-8") { |fout| fout.write(data) } if data

I've attached the entire script. If I comment out this block, the
gibberish appears in the file again. However, if I let this block
run, the file's content is deleted and the alert is not displayed.

Could someone take a look at where I went wrong? I am still learning Shoes
too :)

kind regards,
Seba

Attachments:
http://www.ruby-forum.com/attachment/7411/dup_app.rb

7stud2 · 21 May 2012 11:55

Sebastjan H. wrote in post #1061483:

> Hi,
>
> something goes wrong with this block of code when I use Shoes to package
> an app:
>
> data=nil
> open("data.txt", "rb:UTF-16LE") {|fin| data=fin.read() }
> open("data.txt", "wb:UTF-8") { |fout| fout.write(data) } if data
>
> I've attached the entire script. If I comment out this block, the
> gibberish appears in the file again. However, if I let this block
> run, the file's content is deleted and the alert is not displayed.
>
> Could someone take a look at where I went wrong? I am still learning Shoes
> too :)
>
> kind regards,
> Seba

One more note: the same code as attached (without the Shoes elements)
runs fine as a plain script from the command line.

ashbb · 21 May 2012 12:28

Hi Sebastjan,

What platform and Shoes are you using?

I downloaded your code (dup_app.rb) and changed two lines to the following.
Then it worked with Shoes 3 (0.r1514) on my Windows 7.

open(file, "r") {|fin| data=fin.read() }
open(file, "w:UTF-8") { |fout| fout.write(data) } if data

ashbb
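With a plain "r" and no encoding, Ruby tags what it reads with `Encoding.default_external`. On Windows with the Ruby versions of that time this was derived from the locale, typically a legacy code page rather than UTF-16LE, so the UTF-16 bytes are treated as one character per byte. Writing such a string to a "w:UTF-8" file then transcodes each byte separately, and can raise Encoding::UndefinedConversionError for bytes the code page leaves undefined. A small sketch comparing the two labels; `encoding_labels` is a hypothetical helper name.

```ruby
# Compare how the same bytes are labelled with and without an explicit encoding.
def encoding_labels(path)
  plain    = File.open(path, "r", &:read)           # labelled Encoding.default_external
  explicit = File.open(path, "rb:UTF-16LE", &:read) # labelled UTF-16LE
  {
    plain:    [plain.encoding, plain.valid_encoding?],
    explicit: [explicit.encoding, explicit.valid_encoding?]
  }
end

p Encoding.default_external # what "r" assumes on this machine
```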

7stud2 · 21 May 2012 12:45

ashbb shoeser wrote in post #1061501:

> Hi Sebastjan,
>
> What platform and Shoes are you using?
>
> I downloaded your code (dup_app.rb) and changed two lines to the following.
> Then it worked with Shoes 3 (0.r1514) on my Windows 7.
>
> open(file, "r") {|fin| data=fin.read() }
> open(file, "w:UTF-8") { |fout| fout.write(data) } if data
>
> ashbb

Hi ashbb,

I'm using Shoes 3 on Win7. I've just tried the above two lines and the
result is still the same:
- the content of the chosen file is removed
- the encoding is changed to ANSI
- the alert box is not displayed

seba

ashbb · 21 May 2012 14:31

Hi Sebastjan,

Ah, sorry. Try out the following again:

      data=nil
      #open(file, "r") {|fin| data=fin.read() }
      data = IO.read(file).force_encoding("UTF-8")
      open(file, "w:UTF-8") { |fout| fout.write(data) } if data

ashbb

7stud2 · 21 May 2012 14:57

ashbb shoeser wrote in post #1061510:

> Hi Sebastjan,
>
> Ah, sorry. Try out the following again:
>
>       data=nil
>       #open(file, "r") {|fin| data=fin.read() }
>       data = IO.read(file).force_encoding("UTF-8")
>       open(file, "w:UTF-8") { |fout| fout.write(data) } if data
>
> ashbb

hi,

It's still not working. Now the process runs through, but the stuff that
is written in the file is mixed with the legacy content, and the new
content is again gibberish.

I'm afraid I can't pinpoint the issue, being a complete beginner. It
works fine without Shoes.

Does the code work for you?

regards,
seba

ashbb · 21 May 2012 21:39

Hi Sebastjan,

> Now the process runs through

Good!

> the stuff that is written in the file is mixed with
> the legacy content

Me too.
But your code reopens the file in 'a+' mode,
so I think this is normal behavior.

> the new content is again gibberish.

Ah,... what does that mean?

The file I got has the following mixed in:

Here are the unused characters:
"&a", "&b", "&c", "&d", "&e", .....

Do you mean that this is gibberish?

Sorry, I don't quite understand what you want to do.

ashbb
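For reference, the difference between the two write modes: "w" truncates the file when it is opened, while "a" and "a+" keep the existing content and always write at the end. Appending is fine as long as the new text uses the same encoding as the file; appending UTF-8 text to a UTF-16LE file leaves a file containing two encodings, which no single decoding displays correctly. A small sketch; `mode_demo` is a hypothetical name.

```ruby
require "tmpdir"

# Returns the file contents after an "a+" write and after a "w" write.
def mode_demo(dir)
  path = File.join(dir, "demo.txt")
  File.write(path, "old\n")

  File.open(path, "a+") { |f| f.write("new\n") } # keeps "old", writes at the end
  appended = File.read(path)

  File.open(path, "w") { |f| f.write("new\n") }  # truncates first
  truncated = File.read(path)

  [appended, truncated]
end

Dir.mktmpdir { |dir| p mode_demo(dir) } # ["old\nnew\n", "new\n"]
```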

7stud2 · 22 May 2012 08:15

ashbb shoeser wrote in post #1061553:

> Hi Sebastjan,
>
> > Now the process runs through
>
> Good!
>
> > the stuff that is written in the file is mixed with
> > the legacy content
>
> Me too.
> But your code reopens the file in 'a+' mode,
> so I think this is normal behavior.
>
> > the new content is again gibberish.
>
> Ah,... what does that mean?

Actually the new content is not even written to the file, and the file
is still encoded as Unicode, so some special characters in my language (č
and š) are not displayed correctly. For example, "č" is printed out like
栀攀挀甀爀爀攀渀琀昀漀爀洀甀氀愀⸀ऀ匀欀爀

> The file I got has the following mixed in:
>
> Here are the unused characters:
> "&a", "&b", "&c", "&d", "&e", .....

If this is at the end of your file, then this is correct. I don't get
any of the added content written anywhere in the file.

> Do you mean that this is gibberish?

Gibberish: 甀爀爀攀渀琀昀漀爀洀甀氀愀⸀ऀ匀欀爀
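The characters in that sample appear to be ASCII letters that ended up in the high byte of a UTF-16 code unit (for example U+6800 instead of "h", U+0068). That is what ASCII text stored as UTF-16 looks like when it is decoded with the wrong byte order, or starting one byte off, for instance after bytes in another encoding were inserted into the file. A small sketch that reproduces the effect; `misread` is a hypothetical name.

```ruby
# Encode text as UTF-16 big-endian, then wrongly decode those bytes as
# little-endian: every ASCII letter becomes a CJK-looking character.
# (Correct UTF-16LE data read one byte off produces the same pattern.)
def misread(text)
  text.encode("UTF-16BE").force_encoding("UTF-16LE").encode("UTF-8")
end

p misread("hello").codepoints.map { |c| format("U+%04X", c) }
# ["U+6800", "U+6500", "U+6C00", "U+6C00", "U+6F00"]
```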

> Sorry, I don't quite understand what you want to do.

1. Input file: two-column, tab-delimited and Unicode-encoded
2. Replace the first column with ""
3. Run the rest of the code (finding duplicates, used and unused
characters)
4. Write the unused characters to the input file
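Since the other program only reads "Unicode", another option is to leave the file in UTF-16LE: decode it in memory, work on ordinary UTF-8 strings, and encode anything written back as UTF-16LE as well, so the file never contains two encodings. A minimal sketch, assuming Windows line endings and a BOM at the start of the file; `read_utf16le_tsv` and `append_utf16le` are hypothetical names, and steps 2 and 3 are left out.

```ruby
# Read a tab-delimited UTF-16LE file into rows of UTF-8 strings.
def read_utf16le_tsv(path)
  text = File.binread(path).force_encoding("UTF-16LE").encode("UTF-8")
  text = text.sub(/\A\uFEFF/, "") # drop the BOM
  text.split(/\r?\n/).map { |line| line.split("\t", -1) }
end

# Append lines in the file's own encoding (UTF-16LE, CRLF, no second BOM).
def append_utf16le(path, lines)
  payload = lines.map { |line| line + "\r\n" }.join.encode("UTF-16LE")
  File.open(path, "ab") { |f| f.write(payload.b) } # "b": write the bytes as they are
end

# Hypothetical usage for step 4:
# append_utf16le(ARGV[0], ["Here are the unused characters:", '"&a", "&b"'])
```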

I've attached the code which is compiled as *shy app again.

I know the main issue is that my input file is Unicode encoded, but I
get it from another program that supports only Unicode.

Thank you for your patience:)

Two more notes:
- the *shy app is about 420 MB in size. Is that normal?
- the *shy app takes quite some time to load. Is that normal?

regards,
seba

Attachments:
http://www.ruby-forum.com/attachment/7415/dup_app.rb

ashbb · 22 May 2012 13:15

Hi Sebastjan,

> I know the main issue is that my input file is Unicode encoded

Oh, I see.
Why not use nkf?

Try out the following:

require 'nkf'
Shoes.app do
  extend NKF
  file = ask_open_file
  data = IO.read file
  para nkf('-W16w', data)
end

In my case, with Shoes 3 (0.r1514), I can see the special characters of
your language (č and š) in the Shoes window.

> Two more notes:
> - the *shy app is about 420 MB in size. Is that normal?
> - the *shy app takes quite some time to load. Is that normal?

Umm,... I'm not sure,... but I don't think they are normal...

ashbb

7stud2 · 22 May 2012 18:06

ashbb shoeser wrote in post #1061656:

> Hi Sebastjan,
>
> > I know the main issue is that my input file is Unicode encoded
>
> Oh, I see.
> Why not use nkf?
>
> Try out the following:
>
> require 'nkf'
> Shoes.app do
>   extend NKF
>   file = ask_open_file
>   data = IO.read file
>   para nkf('-W16w', data)
> end

I've tried incorporating this into my script, but I guess my knowledge
isn't sufficient:)

Furthermore, I've tried replicating the whole thing on Ubuntu, and the
app is not as large and it loads extremely fast. However, the code still
doesn't run properly with Shoes. And the files are automatically
converted to UTF-8 as soon as they are stored on Ubuntu, so I can't
really replicate anything :(

kind regards,
seba

ashbb · 22 May 2012 22:18

Hi Sebastjan,

Umm,....
Could you move this to the Shoes mailing list (http://librelist.com/browser/shoes/)?
Other Shoes users can help you there.

ashbb
