String frustration

Tim_Kynerd · 11 February 2003 21:40

Hi everyone,

Am having MAJOR trouble handling strings.

I have a file into which I’ve put some text. I want to read this text into
some data structures and, later, print it out.

All the related text is on a single line, but one of the text items
contains embedded newlines, which I’ve entered into the file as “\n”. The
portions of each line (record) are tab-separated to make them easier to
read.

No problems reading the file and doing everything else the way I want to,
and I end up with the string I intend to print out (that contains the
embedded newlines). BUT: no matter what I do, I can’t get either puts or
print to substitute newlines for the “\n” sequences. Both methods just
print them out as is. Not good.

I’ve tried several ways of quoting ("#{variable}", %Q(#{variable}), etc.)
with zero success. (“variable” in both cases is a String object.)

Any suggestions?

–
Tim Kynerd Sundbyberg (småstan i storstan), Sweden tim@tram.nu
Sunrise in Stockholm today: 7:42
Sunset in Stockholm today: 16:21
My rail transit photos at http://www.kynerd.nu

Tim_Kynerd · 11 February 2003 21:40

Sorry for the follow-up to myself, but: Using the debugger makes it clear
that gets is escaping my backslashes, resulting in the sequence "\\n" in
my data. This is NOT desirable.

Is there any way to control (read: quash) this behavior?


Daniel_Carrera · 11 February 2003 21:51

No problems reading the file and doing everything else the way I want to,
and I end up with the string I intend to print out (that contains the
embedded newlines). BUT: no matter what I do, I can’t get either puts or
print to substitute newlines for the “\n” sequences. Both methods just
print them out as is. Not good.

Tell me if I understand the problem correctly. You have a string of the
form:

str = 'line one\nline two\nline three' # Notice the single quotes.

And you want to replace the two characters '\n' with the actual newline "\n".
Did I understand correctly?

If so, would a regex do what you want?

$ irb

str = 'line one\nline two\nline three'
=> "line one\\nline two\\nline three"
str.gsub!(/\\n/, "\n")
=> "line one\nline two\nline three"
puts str
line one
line two
line three
=> nil

–
Daniel Carrera
Graduate Teaching Assistant. Math Dept.
University of Maryland. (301) 405-5137

Bill_Kelly · 11 February 2003 23:18

Hi,

BUT: no matter what I do, I can’t get either puts or
print to substitute newlines for the “\n” sequences. Both methods just
print them out as is. Not good.

How 'bout: print str.gsub(/\\n/, "\n")

HTH,

Bill

From: “Tim Kynerd” tim@tram.nu

Harry_Ohlsen1 · 12 February 2003 05:16

A number of people have given you solutions to the
problem, so I won’t rehash that.

However, in case you have misunderstood what was
going on, I thought I'd point out that the two
characters '\' and 'n' next to each other in your
text file would never be interpreted as a newline,
but always as two separate characters.

They would be interpreted as a newline if they
appeared in a double-quoted string in your source
code (as in the replacement text in the gsub()
someone suggested).

My apologies if you already understood that!

Harry Ohlsen
QIQ Solutions
E-Mail: harryo@qiqsolutions.com
Web: http://www.qiqsolutions.com/
Phone: +61 2 9209 4171

Mark_Slagell · 11 February 2003 22:12

Tim Kynerd wrote:

[snip]

…
that gets is escaping my backslashes, resulting in the sequence "\\n" in
my data. This is NOT desirable.

Is there any way to control (read: quash) this behavior?

When debugging, strings are shown as if they had been entered in double
quotes, so '\n' appears as "\\n". That's two characters ("\\" is a
backslash), both of which are in your input file. It looks like gets is
reading exactly what you gave it.

Mark
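
The point about the debugger display can be checked directly. This is a minimal sketch, assuming the data arrives exactly as a single-quoted literal would hold it; the method names raw_field and expand_newlines are made up for illustration.

```ruby
# Stand-in for what gets returns for a line containing a literal
# backslash followed by "n" (single quotes keep both characters).
def raw_field
  'line one\nline two'
end

# Replace each two-character backslash-n sequence with one real newline.
def expand_newlines(text)
  text.gsub('\n', "\n")
end

raw = raw_field
puts raw.length          # backslash and "n" each count as one character
puts raw.inspect         # the debugger-style view doubles the backslash
puts raw.include?("\n")  # no real newline in the data yet

cooked = expand_newlines(raw)
puts cooked.length       # one character shorter per replaced pair
puts cooked
```

The doubled backslash only exists in the inspect view; the data itself is unchanged until the gsub runs.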

Tim_Kynerd · 12 February 2003 06:21

Works like a charm. In my frustration, I missed the gsub method in the
pickaxe book (although I had tried String#sub with no success). Thanks!


Tim_Kynerd · 12 February 2003 09:42

No apologies necessary. I’m not 100% clear on this.

Although I think I understand what’s going on here, it leaves me wondering
what would be a better way to do what I need to do, if there is a better
way.

The input file contains three tab-separated fields. The second field needs
to be able to contain an arbitrary number of newlines (which will be
realized as actual newlines, so to speak, when this field is later printed
out).

The only alternative I can see (to including them as “\n” in the second
field) is to actually put the newlines into the file as newlines, so to
speak, and specify a different string as the record separator (say,
an uncommon character followed by a newline). However, this obviously
makes the file rather difficult to create in a text editor, and to read.

Then, since there happen to be exactly three fields in each line, there’s
the idea of tab-separating the “pieces” of the second field, then reading
everything in, taking the first and last fields as the first and third,
and interpreting everything in between as the second field. This would
work, but I don’t like it because it depends crucially on the fact that
there are exactly three fields per line (and that may not remain true as I
continue to play with this program I’m writing).

Any other ideas?
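
The record-separator alternative described above might look roughly like the following. It is only a sketch: the terminator (ASCII record separator plus newline) and the name read_records are assumptions, and StringIO stands in for the real file.

```ruby
require 'stringio'

# Hypothetical record terminator: an uncommon character plus a newline.
RECORD_END = "\x1E\n"

# Read records terminated by RECORD_END. Fields inside a record are
# tab-separated and may contain real newlines.
def read_records(io)
  io.each_line(RECORD_END).map do |record|
    record.chomp(RECORD_END).split("\t")
  end
end

data = StringIO.new("1\tfirst line\nsecond line\tlast\x1E\n2\tsingle\tend\x1E\n")
read_records(data).each { |fields| p fields }
```

As noted, the cost is that the file becomes harder to type and read in a text editor.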


Hugh_Sasse · 12 February 2003 10:37

The input file contains three tab-separated fields. The second field needs
to be able to contain an arbitrary number of newlines (which will be
realized as actual newlines, so to speak, when this field is later printed
out).

    [...]

Then, since there happen to be exactly three fields in each line, there’s
the idea of tab-separating the “pieces” of the second field, then reading

irb(main):044:0> puts(sprintf("%s", "x\ny"))
x
y
nil
irb(main):045:0>

so:

readlines.each {|line|
  fields = line.split(/\t/)
  fields.each {|field|
    puts(sprintf("%s", field))
  }
}

Mauricio_Fernndez · 12 February 2003 12:38

No apologies necessary. I’m not 100% clear on this.

Although I think I understand what’s going on here, it leaves me wondering
what would be a better way to do what I need to do, if there is a better
way.

The input file contains three tab-separated fields. The second field needs
to be able to contain an arbitrary number of newlines (which will be
realized as actual newlines, so to speak, when this field is later printed
out).

You want multi-line regex mode!

a = "1\tsecond field\nwith newlines\nbla\nbla\tthird\n"
=> "1\tsecond field\nwith newlines\nbla\nbla\tthird\n"
puts a
1 second field
with newlines
bla
bla third
=> nil
re = %r{([^\t]+)\t([^\t]+)\t([^\t]+)\n}m
=> /([^\t]+)\t([^\t]+)\t([^\t]+)\n/m
a =~ re
=> 0
p $1, $2, $3
"1"
"second field\nwith newlines\nbla\nbla"
"third"
=> nil

–
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

Linux: Where Don’t We Want To Go Today?
– Submitted by Pancrazio De Mauro, paraphrasing some well-known sales talk

Warren_Brown3 · 12 February 2003 16:29

Tim,

Although I think I understand what’s going on here,
it leaves me wondering what would be a better way
to do what I need to do, if there is a better way.

The input file contains three tab-separated fields.
The second field needs to be able to contain an
arbitrary number of newlines (which will be
realized as actual newlines, so to speak, when this
field is later printed out).

The basic problem here is that you have a two-level data structure that
you want to represent. The first level is three fields separated by the
string "\t", and the second level is zero or more sub-fields separated by
'\n' in the second field. Your decision to use '\n' as a separator led you
to a small confusion when you tried to display the second field. As others
have pointed out, this can be achieved by simply replacing the two-character
string '\n' with the single-character string "\n" (note the difference in
quotes). Also note that the string '\n' is really just a sub-field
delimiter, and could actually be any string (or character) other than your
field separator ("\t") that is guaranteed not to be part of a sub-field.

I would think that it would be useful for you to have this two-level
data structure represented in your code. For instance:

irb(main):001:0> line = "field 1\tfield 2.1\nfield 2.2\nfield 2.3\tfield 3"
=> "field 1\tfield 2.1\nfield 2.2\nfield 2.3\tfield 3"

irb(main):002:0> fields = line.split(/\t/)
=> ["field 1", "field 2.1\nfield 2.2\nfield 2.3", "field 3"]

irb(main):003:0> fields[1] = fields[1].split(/\n/)
=> ["field 2.1", "field 2.2", "field 2.3"]

irb(main):004:0> fields
=> ["field 1", ["field 2.1", "field 2.2", "field 2.3"], "field 3"]

You now have access to the sub-fields (e.g. fields[1].length,
fields[1][0], etc.) and can easily join them back together for display:

irb(main):005:0> puts fields[1].join("\n")
field 2.1
field 2.2
field 2.3
=> nil

Hope this helps!

- Warren Brown
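
The irb session above uses double-quoted literals, so its "\n" is already a real newline. For a line read from the file, where the delimiter is the two characters backslash and n, the same two-level idea might be sketched like this. The name parse_record is hypothetical, and the sketch assumes every record has at least two fields.

```ruby
# Parse one tab-separated record whose second field uses the
# two-character sequence backslash-n as a sub-field delimiter.
def parse_record(line)
  fields = line.chomp.split("\t")
  fields[1] = fields[1].split('\n')  # single quotes: literal backslash + n
  fields
end

# Built from pieces so the quoting of each part is explicit.
line = "field 1\t" + 'field 2.1\nfield 2.2\nfield 2.3' + "\tfield 3\n"
fields = parse_record(line)
p fields
puts fields[1].join("\n")  # joined with real newlines for display
```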

Eric_Hodel1 · 12 February 2003 18:30

YAML - http://raa.ruby-lang.org/list.rhtml?name=yaml
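
A rough sketch of how the three-field records could look in YAML, using the yaml library bundled with current Ruby versions. The key names (id, body, note) and the helper load_records are made up for illustration; a literal block scalar ("|") keeps the embedded newlines as real newlines.

```ruby
require 'yaml'

# Hypothetical layout: one mapping per record, multi-line text in "body".
SAMPLE_YAML = <<~TEXT
  - id: "1"
    body: |
      first line
      second line
    note: third
TEXT

# Parse the document into an array of hashes.
def load_records(text)
  YAML.safe_load(text)
end

load_records(SAMPLE_YAML).each do |rec|
  puts rec["body"]  # prints across two lines, no gsub needed
end
```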

–
Eric Hodel - drbrain@segment7.net - http://segment7.net
All messages signed with fingerprint:
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

Tim_Kynerd · 12 February 2003 16:23

No offense, but if I wanted THAT, I’d be using Perl.

Thanks anyway.

Tim_Kynerd · 12 February 2003 16:23

So using readlines rather than gets is the key. OK, that makes things
easier. Thanks!

Tim_Kynerd · 12 February 2003 18:24

Thanks, Warren. I thought about doing this as well.

However, while it works, conceptually this isn’t what I want to do. The
second field IS a single field; it’s just that when I later print it out,
it needs to spread over multiple lines. Embedding the newlines in the
original value seems to me to be the simplest way to achieve this, but it
turns out that hoops must be jumped through in order to get those newlines
to do their job.

Tim_Kynerd · 13 February 2003 13:08

Will check it out, thanks! (I went to the Web site for YAML for Ruby and
it's quite funny. Plus YAML looks useful.)

Hugh_Sasse · 12 February 2003 16:41

So using readlines rather than gets is the key. OK, that makes things
easier. Thanks!

My point was really the %s, just with a bit of context, but if that
helps…

    Hugh
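
As far as I can tell, neither readlines nor %s changes the data: the irb example prints two lines because its literal is double-quoted, so it already holds a real newline. A small check, with StringIO standing in for the file and via_gets and via_readlines as made-up helper names:

```ruby
require 'stringio'

# One record as it sits in the file: a literal backslash-n in field two.
FILE_TEXT = "1\t" + 'x\ny' + "\t3\n"

def via_gets
  StringIO.new(FILE_TEXT).gets
end

def via_readlines
  StringIO.new(FILE_TEXT).readlines.first
end

puts via_gets == via_readlines      # both readers return the same string
field = via_readlines.split("\t")[1]
puts sprintf("%s", field) == field  # %s copies the string unchanged
puts field.gsub('\n', "\n")         # the substitution does the real work
```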

Ara.T.Howard1 · 12 February 2003 18:44

regexes are always ugly - but powerful. how can they not be, they are
complete computer programs in one line!

perl can't do this

a = "a 1\tb 0\nb 1\nb 2\nb 3\tc 0\nd 0\te 0\ne 1\ne 2\ne 3\tf 0\n"

re =
  %r/
    # field one is the minimal field which ends in a tab
    # it cannot contain newlines

    ([^\n]*?) \t

    # field two is the subsequent minimal field which ends in a tab
    # it can contain newlines

    (.*?) \t

    # field three is the subsequent minimal field which ends in a newline
    # it cannot contain newlines

    ([^\n]*?) \n
  /omx

fields = a.scan re # the part perl can't do
p fields

[["a 1", "b 0\nb 1\nb 2\nb 3", "c 0"], ["d 0", "e 0\ne 1\ne 2\ne 3", "f 0"]]

scan is cooool because it does the whole thing at once.

-a

–

====================================

Ara Howard
NOAA Forecast Systems Laboratory
Information and Technology Services
Data Systems Group
R/FST 325 Broadway
Boulder, CO 80305-3328
Email: ahoward@fsl.noaa.gov
Phone: 303-497-7238
Fax: 303-497-7259
====================================

Mauricio_Fernndez · 12 February 2003 19:34

re = %r{([^\t]+)\t([^\t]+)\t([^\t]+)\n}m
=> /([^\t]+)\t([^\t]+)\t([^\t]+)\n/m
a =~ re
=> 0
p $1, $2, $3
"1"
"second field\nwith newlines\nbla\nbla"
"third"
=> nil

No offense, but if I wanted THAT, I’d be using Perl.

No offence??? You just called me Perl-lover, man!
[never really used Perl]

Thanks anyway.

If you want to get it more "the Ruby way", use the object-oriented regex
interface or use String#scan.

But AFAIK, matz doesn’t consider regexes to be an evil perlish feature to
be deprecated. I’ll go on using them.
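
One way the object-oriented interface could look, as a sketch only: Regexp#match returns a MatchData object, so the $1, $2, $3 globals are not needed. The named groups require a Ruby newer than the one current in this thread, and the names (first, second, third, split_record) are made up for illustration.

```ruby
# [^\t] also matches newlines, so field two may span several lines.
RECORD = /\A(?<first>[^\t]+)\t(?<second>[^\t]+)\t(?<third>[^\t]+)\n\z/

# Return the three fields as an array, or nil if the text does not match.
def split_record(text)
  m = RECORD.match(text)
  m && [m[:first], m[:second], m[:third]]
end

a = "1\tsecond field\nwith newlines\nbla\nbla\tthird\n"
p split_record(a)
```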


Tim_Kynerd · 12 February 2003 18:24

It seemed to. Oh well, anyway, it’s working now. Thanks again.

