Argument error --- How to solve?

I'm just starting out with Ruby and I keep getting this error:

program.rb:6:in `gsub': broken UTF-8 string (ArgumentError)

when I run this short code:

#coding:utf-8

temp=""
txtfile=File.open("8-3_tiedosto.txt","r");txtfile.each{|row|temp=temp+row};txtfile.close

temp = temp.gsub("Å", '')
puts temp

The original text in the file contains characters that I do not want to
include in my final result, which should only contain ASCII 65..90 and
97..122 (A-Z and a-z). So which arguments should I give to gsub?

Sorry if this is a silly question :-)


--
Posted via http://www.ruby-forum.com/.

Your first argument to gsub appears to be the byte 197 (Å in ISO-8859-1,
not ASCII, which ends at 127), and on its own that byte is not valid UTF-8.
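To make the byte-197 point concrete, here is a small sketch comparing how Å is stored in UTF-8 and in ISO-8859-1 (Latin-1), assuming the script was saved in ISO-8859-1 or a similar single-byte encoding. The variable names are illustrative only:

```ruby
# "Å" (U+00C5) takes two bytes in UTF-8 but one byte, 197, in ISO-8859-1.
utf8_bytes   = "\u00C5".bytes.to_a
latin1_bytes = "\u00C5".encode("ISO-8859-1").bytes.to_a

# A lone byte 197 labelled as UTF-8 is not a complete character,
# which is why Ruby treats such a string as broken.
lone = [197].pack("C").force_encoding("UTF-8")

p utf8_bytes           # [195, 133]
p latin1_bytes         # [197]
p lone.valid_encoding? # false
```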


(Replying to Ar Ik <arsi.ikonen@gmail.com>, Wed, Nov 9, 2011 at 12:35 PM.)

--
Carina

It's not possible to edit my previous post, so an additional comment: my
environment does not allow me to change the encoding declaration on the first line...
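One possible workaround, assuming the script is being saved in a single-byte encoding such as ISO-8859-1 while the data file is valid UTF-8: keep the script pure ASCII by writing Å as the escape \u00C5. Ruby 1.9 and later turn such a literal into a UTF-8 string, so the bytes the editor uses for the rest of the file stop mattering. The constant name and sample text are hypothetical:

```ruby
# The pattern is spelled as a Unicode escape, so this script contains only
# ASCII bytes and the encoding the editor saves it in no longer matters.
A_RING = "\u00C5" # "Å" (hypothetical constant name)

# Stand-in for the file contents; assumed to be valid UTF-8.
text = "\u00C5land and Turku"

cleaned = text.gsub(A_RING, "")
puts cleaned # prints: land and Turku
```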


Try this and see if it helps with your substitution, without getting
too involved in Ruby's encoding mechanism:

#coding:utf-8

## Do NOT delete the above utf-8 line, which
## you already have in your original copy

temp=""
txtfile=File.open("8-3_tiedosto.txt","r")
txtfile.each{|row|temp=temp+row}
txtfile.close

tmp = temp.gsub(/[^A-Z0-9[:punct:]\s]+/ix, '')

puts tmp

PS: I kept digits, all kinds of punctuation marks and whitespace in there,
just in case your original file contains them, even though they are
certainly not within your original range of ASCII 65..90 and 97..122.
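If the result really should contain only the range from the original question (65..90 and 97..122), a narrower character class is enough. A minimal sketch, assuming the text is valid UTF-8; the method name and sample string are hypothetical:

```ruby
# Remove every character that is not an ASCII letter A-Z (65..90) or
# a-z (97..122). Digits, punctuation, spaces and newlines go too;
# add \n inside the brackets to keep line breaks.
def letters_only(text)
  text.gsub(/[^A-Za-z]+/, "")
end

sample = "Hyv\u00E4\u00E4 p\u00E4iv\u00E4\u00E4, \u00C5sa! 123"
puts letters_only(sample) # prints: Hyvpivsa
```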


C. Zona wrote in post #1031196:

Your first argument to gsub appears to be ASCII 197.

Yes, you're correct, but I still do not know how to fix my code... As the
source text contains characters outside 65..90 and 97..122, how can I
remove or replace them?


If you insert this line at the beginning of the script, what does it print?

p("".encoding)

Btw, you can simplify reading by doing

temp = File.read("8-3_tiedosto.txt", encoding: 'UTF-8')

assuming your file is encoded in UTF-8.
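A self-contained sketch of that File.read call, using a temporary file as a hypothetical stand-in for 8-3_tiedosto.txt:

```ruby
require "tempfile"

# Hypothetical stand-in for 8-3_tiedosto.txt, holding UTF-8 text.
file = Tempfile.new("tiedosto")
file.binmode
file.write("\u00C5bo\nTurku\n")
file.close

# One call opens, reads and closes the file, and the resulting string
# is tagged as UTF-8 regardless of the platform's default encoding.
temp = File.read(file.path, encoding: "UTF-8")
p temp.encoding        # #<Encoding:UTF-8>
p temp.valid_encoding? # true
file.unlink
```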

You might have to play with
Encoding.default_external=
Encoding.default_internal=
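As a rough sketch of what the first of those settings does, assuming the data is UTF-8 but the platform default is something else (for example a Windows code page): setting Encoding.default_external changes how File.read tags what it reads when no encoding: option is given. The temp file is a hypothetical stand-in:

```ruby
require "tempfile"

# Strings read from files without an explicit encoding are now tagged UTF-8.
Encoding.default_external = Encoding::UTF_8

file = Tempfile.new("sample")
file.binmode
file.write("\u00C5bo")
file.close

text = File.read(file.path) # no encoding: option needed
p text.encoding             # #<Encoding:UTF-8>
file.unlink
```

Setting Encoding.default_internal as well would additionally make Ruby transcode what it reads from the external encoding into that internal one.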

For more please see

Kind regards

robert


(Replying to Ar Ik <arsi.ikonen@gmail.com>, Thu, Nov 10, 2011 at 9:03 AM.)

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

-----Original message-----

From: Nik Z. [mailto:esperantoca@gmail.com]
Sent: Thursday, 10 November 2011 23:55
To: ruby-talk ML
Subject: Re: Argument error --- How to solve?


(Replying to Ar Ik, post #1031208.)

Strings in Ruby 1.9 are complicated beasts. I had a go at understanding
them:

So it really depends on what you're trying to do. If you want to
manipulate this file as a series of bytes, and match particular bytes,
then open it in binary mode ('rb'), and pass only binary strings to
gsub.

  temp.gsub!("xxx".force_encoding("BINARY"), "")
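A minimal sketch of that binary approach, assuming the data is ISO-8859-1 (so Å is the single byte 197). The temporary file and constant name are hypothetical, and String#delete is shown as an alternative to gsub for keeping only letters:

```ruby
require "tempfile"

# Byte 197 is "Å" in ISO-8859-1; [197].pack("C") builds it as a BINARY string.
BYTE_197 = [197].pack("C")

# Hypothetical input file containing that byte, which is not valid UTF-8.
file = Tempfile.new("latin1")
file.binmode
file.write(BYTE_197 + "bo ja Turku\n")
file.close

# File.binread reads in "rb" mode, so the result is a BINARY (ASCII-8BIT)
# string: every byte is acceptable and nothing can be "broken".
temp = File.binread(file.path)
file.unlink

# Pattern and subject are both binary, so gsub! compares raw bytes.
temp.gsub!(BYTE_197, "")

# Or keep only the bytes 65..90 and 97..122 (plus newlines).
letters = temp.delete("^A-Za-z\n")

p temp    # "bo ja Turku\n"
p letters # "bojaTurku\n"
```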

The trouble with opening the file as UTF-8 and doing regexp matches
with UTF-8 characters is that your program will raise an ArgumentError
(invalid byte sequence) as soon as it is fed invalid UTF-8 data. So that
approach is not good for "data cleaning" exercises.
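A sketch of that failure and one way to clean the data before matching. It relies on String#scrub, which only exists from Ruby 2.1 onwards (newer than the 1.9 discussed in this thread); the sample string is made up:

```ruby
# Simulated file contents: UTF-8 text with one stray ISO-8859-1 byte (197).
raw = ("ok " + [197].pack("C") + " ok").force_encoding("UTF-8")

regexp_failed = false
begin
  raw.gsub(/ok/, "OK")
rescue ArgumentError => e
  regexp_failed = true
  puts "gsub failed: #{e.message}"
end

# String#scrub (Ruby 2.1+) replaces invalid byte sequences; "" drops them.
clean = raw.valid_encoding? ? raw : raw.scrub("")
puts clean.gsub(/ok/, "OK") # prints: OK  OK
```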

But strangely, Ruby 1.9 is quite happy to deal with invalid strings in
some contexts. For example, if you do

   temp.size.times do |i|
     puts temp[i]
   end

then it will work even if the i'th character is invalid. Go figure.
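A small sketch of that asymmetry on a current Ruby; the sample string is made up:

```ruby
# A UTF-8 string containing the invalid byte 197 between "ab" and "cd".
s = ("ab" + [197].pack("C") + "cd").force_encoding("UTF-8")

p s.valid_encoding? # false

# Indexing character by character works, invalid byte included.
chars = []
s.size.times { |i| chars << s[i] }
p chars

# Regexp matching, by contrast, refuses the broken string.
regexp_failed = false
begin
  s.gsub(/c/, "")
rescue ArgumentError
  regexp_failed = true
end
p regexp_failed # true
```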


-----Original message-----

From: Luca (Email) [mailto:luca.pagano@email.it]
Sent: Thursday, 29 December 2011 07:58
To: ruby-talk ML
Subject: Fwd: Argument error --- How to solve?
