String scanning woes :(

Hello all,

I am trying to get a file into an array using .scan and I can't seem
to get anything to work properly.

I am reading in a file of email addresses (1 per line) and it all
seems to come in as 1 long string somehow. I am trying to use scan
to break it up into an array of emails so that I can do some uniq
checks and validation with other arrays. But I just don't seem to get
it right.

My code right now is as follows:

emails = File.open("/users/lem/desktop/test/POCs_DNB.txt","r").readlines.map! {|x| x.chomp} # Read in the list of emails
email.scan(/\S+/) # To match on spaces (I assume). I thought I would be matching on new lines
puts email # To verify

When I did an inspect on the email variable, the addresses appeared as
follows:

"foo1@bar.edu\foo2@bar.com\foo3@bar.gov......"

This is my absolute first time working with .scan and regular
expressions so I have a little bit of a learning curve with this one.

Any help is greatly appreciated.

You could try this to make it a bit easier:

File.open("/users/lem/desktop/test/POCs_DNB.txt", "r").each_line do |line|
  line.chomp!
  # now you have a single line (sans newline) from your file
end

# no need to close the file either :smiley:

···

On Feb 4, 7:50 pm, Vell <lovell.mcilw...@gmail.com> wrote:

> Hello all,
>
> I am trying to get a file into an array using .scan and I can't seem
> to get anything to work properly.


To rule out mistakes, go one step at a time.

Here is a first modification and run of your posted code:

botp@pc4all:~$ cat test.txt
foo1@bar.edu
foo2@bar.com
foo3@bar.gov

botp@pc4all:~$ cat test.rb
p File.readlines("test.txt").map{|x| x.chomp}

botp@pc4all:~$ ruby test.rb
["foo1@bar.edu", "foo2@bar.com", "foo3@bar.gov"]

That is just one way; Ruby offers many others.

kind regards -botp

···

On Feb 5, 2008 8:54 AM, Vell <lovell.mcilwain@gmail.com> wrote:

> I am trying to get a file into an array using .scan and I can't seem
> to get anything to work properly.

Lovell Mcilwain wrote:

> Hello all,
>
> I am trying to get a file into an array using .scan and I can't seem
> to get anything to work properly.
>
> I am reading in a file of email addresses (1 per line) and it all
> seems to come in as 1 long string somehow. I am trying to use scan
> to break it up into an array of emails so that I can do some uniq
> checks and validation with other arrays. But I just don't seem to get
> it right.
>
> My code right now is as follows:
>
> emails = File.open("/users/lem/desktop/test/POCs_DNB.txt","r").readlines.map! {|x| x.chomp}
>
> email.scan(/\S+/)

scan() returns an array. You don't assign the array to any variable, so
it is discarded.
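
To illustrate the point, here is a minimal sketch using made-up addresses (the names `text` and `emails` are placeholders for illustration, not the original program). String#scan returns a new array and leaves its receiver untouched, so the result has to be assigned to be of any use:

```ruby
# Hypothetical sample data: three addresses separated by newlines.
text = "foo1@bar.edu\nfoo2@bar.com\nfoo3@bar.gov\n"

text.scan(/\S+/)            # result is computed, then thrown away
emails = text.scan(/\S+/)   # result is kept in a variable

p emails       # => ["foo1@bar.edu", "foo2@bar.com", "foo3@bar.gov"]
p text.class   # => String (scan did not modify the receiver)
```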

> When I did an inspect on the email variable, the addresses appeared as
> follows:
>
> "foo1@bar.edu\foo2@bar.com\foo3@bar.gov......"

Nowhere in the code you posted does a variable named email exist.

> This is my absolute first time working with .scan and regular
> expressions so I have a little bit of a learning curve with this one.
>
> Any help is greatly appreciated.

If you expect to get relevant help, you should post a short example
program that demonstrates your problem, i.e. an example program that
anyone can run and get the same results you do.

···

--
Posted via http://www.ruby-forum.com/.

First you say "emails"; then you say "email".
This is not code that will run.

Didn't you copy and paste? Don't tell us that you
retyped the code because you were eager for the chance
to introduce errors.

p IO.readlines( "data" ).map{|x| x.strip }
p IO.read( "data" ).split
p IO.read( "data" ).scan(/\S+/)
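
For comparison, a minimal sketch of how these three one-liners behave on a small sample file. The Tempfile and the `by_*` names are assumptions made up for illustration, and `File.readlines`/`File.read` stand in for the `IO.` spellings above:

```ruby
require "tempfile"

# Hypothetical sample file standing in for "data".
file = Tempfile.new("emails")
file.write("foo1@bar.edu\nfoo2@bar.com\nfoo3@bar.gov\n")
file.close

by_line  = File.readlines(file.path).map { |x| x.strip }  # one entry per line
by_split = File.read(file.path).split                      # split on whitespace
by_scan  = File.read(file.path).scan(/\S+/)                # runs of non-whitespace

p by_line == by_split && by_split == by_scan   # => true for this input

file.unlink
```

The results only diverge on messier input: a blank line yields an empty string with the readlines version, while split and scan skip it.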

···

On Feb 4, 6:50 pm, Vell <lovell.mcilw...@gmail.com> wrote:

> emails = File.open("/users/lem/desktop/test/POCs_DNB.txt","r").readlines.map! {|x| x.chomp}
> email.scan(/\S+/)

> Lovell Mcilwain wrote:
> > Hello all,
> >
> > I am trying to get a file into an array using .scan and I can't seem
> > to get anything to work properly.
> >
> > I am reading in a file of email addresses (1 per line) and it all
> > seems to come in as 1 long string somehow. I am trying to use scan
> > to break it up into an array of emails so that I can do some uniq
> > checks and validation with other arrays. But I just don't seem to get
> > it right.
> >
> > My code right now is as follows:
> >
> > emails = File.open("/users/lem/desktop/test/POCs_DNB.txt","r").readlines.map! {|x| x.chomp}
> >
> > email.scan(/\S+/)
>
> scan() returns an array. You don't assign the array to any variable, so
> it is discarded.
>
> > When I did an inspect on the email variable, the addresses appeared as
> > follows:
> >
> > "f...@bar.edu\f...@bar.com\f...@bar.gov......"
>
> Nowhere in the code you posted does a variable named email exist.

The very first line of my code is what I thought to be a variable...

> > This is my absolute first time working with .scan and regular
> > expressions so I have a little bit of a learning curve with this one.
> >
> > Any help is greatly appreciated.
>
> If you expect to get relevant help, you should post a short example
> program that demonstrates your problem, i.e. an example program that
> anyone can run and get the same results you do.

The code I posted is exactly what I ran, aside from giving you
hundreds of lines of email. The example is exactly what I ran to
get the results I posted.

···

On Feb 4, 9:07 pm, 7stud -- <bbxx789_0...@yahoo.com> wrote:


> > emails = File.open("/users/lem/desktop/test/POCs_DNB.txt","r").readlines.map! {|x| x.chomp}
> > email.scan(/\S+/)
>
> First you say "emails"; then you say "email".
> This is not code that will run.
>
> Didn't you copy and paste? Don't tell us that you
> retyped the code because you were eager for the chance
> to introduce errors.

I'm a beginner, lighten up James.

···

On Feb 5, 6:40 am, William James <w_a_x_...@yahoo.com> wrote:


Lovell Mcilwain wrote:

> The code I posted is exactly what I ran, aside from giving you
> hundreds of lines of email. The example is exactly what I ran to
> get the results I posted.

emails = File.open("data.txt").readlines.map! {|x| x.chomp}
email.scan(/\S+/)

--output:--
r1test.rb:2: undefined local variable or method `email' for main:Object (NameError)


Thou shalt use the block form of File.open to ensure proper cleanup!
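
A minimal sketch of what that looks like, assuming a throwaway sample file (the Tempfile and the `sample`, `handle` and `emails` names are made up for illustration; the real path would go in their place). With a block, File.open closes the file when the block ends and returns the block's value:

```ruby
require "tempfile"

# Hypothetical sample file; substitute the real path here.
sample = Tempfile.new("emails")
sample.write("foo1@bar.edu\nfoo2@bar.com\n")
sample.close

handle = nil
emails = File.open(sample.path, "r") do |f|
  handle = f                              # kept only to inspect it afterwards
  f.readlines.map { |line| line.chomp }   # the block's value becomes the result
end

p emails           # => ["foo1@bar.edu", "foo2@bar.com"]
p handle.closed?   # => true (closed automatically when the block ended)

sample.unlink
```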

Apart from that there is another way:

require 'set'
addresses = Set.new

File.foreach "data.txt" do |line|
  line.chomp!
  line.downcase!

  puts "Duplicate: #{line}" unless addresses.add? line
end

Cheers

robert

···

2008/2/5, 7stud -- <bbxx789_05ss@yahoo.com>:

> Lovell Mcilwain wrote:
> >
> > The code I posted is exactly what I ran, aside from giving you
> > hundreds of lines of email. The example is exactly what I ran to
> > get the results I posted.
> >
>
> emails = File.open("data.txt").readlines.map! {|x| x.chomp}
> email.scan(/\S+/)

--
use.inject do |as, often| as.you_can - without end

h = {}
File.foreach("data"){|e|
  e = e.strip.upcase
  puts "Duplicate: #{ e }" if h.include? e
  h[ e ] = true
}

···

On Feb 5, 3:52 am, Robert Klemme <shortcut...@googlemail.com> wrote:

> require 'set'
> addresses = Set.new
>
> File.foreach "data.txt" do |line|
>   line.chomp!
>   line.downcase!
>
>   puts "Duplicate: #{line}" unless addresses.add? line
> end

Thanks guys for all the helpful hints.

···

On Feb 5, 7:20 am, William James <w_a_x_...@yahoo.com> wrote:

> On Feb 5, 3:52 am, Robert Klemme <shortcut...@googlemail.com> wrote:
>
> > require 'set'
> > addresses = Set.new
> >
> > File.foreach "data.txt" do |line|
> >   line.chomp!
> >   line.downcase!
> >
> >   puts "Duplicate: #{line}" unless addresses.add? line
> > end
>
> h = {}
> File.foreach("data"){|e|
>   e = e.strip.upcase
>   puts "Duplicate: #{ e }" if h.include? e
>   h[ e ] = true
> }