Hi there, I am looking at some old, confusing Ruby code that works but
is really ugly. I'm hoping someone here can help me find a more
idiomatic Ruby way of rewriting it. Rather than post the ugly code,
I'll describe what it is trying to do.
The code reads each line of an array, looks for a closing double quote
(") and, when several lines are part of the same string, puts them
into a single element of another array.
input: data_array = [ " \"Only one line.\"", " \"line 1 ", " line 2.\" " ]
so data_array.size is 3
----
puts data_array
"Only one line."
"line 1
line 2."
----
I need to bring linked lines together like this:
new_array = [ " \"Only one line.\"", " \"line 1 line 2.\" "]
Do you need to handle nested strings?
Here is a crude solution that doesn't. I tried it on Ruby 1.9.2:
# The messy input
mess = [ " \"Only one line.\"", " \"line 1 ", " line 2.\"" ]
# Concatenate the mess into a single string
single = mess.join
# Find every double-quoted run. Without a capture group, scan returns
# the matched strings directly, so no flatten is needed
lines = single.scan(/"[^"]*"/)
# Print the lines in inspect form
p lines
I don't think any regex, being a finite automaton, could handle nested
strings properly.
Have you considered writing a small lexer and parser? You may be
better off doing that; it can handle weirder strings too.
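To give an idea of what a small lexer might look like, here is a rough sketch using StringScanner from the standard library. The helper name quoted_strings and the backslash-escape rule are my own assumptions for illustration; nothing in this thread says the real data uses escaped quotes.

```ruby
require 'strscan'

# Hypothetical helper: collect double-quoted strings from a text and
# ignore whatever sits between them.
# Assumption: a backslash escapes the next character inside a string.
def quoted_strings(text)
  scanner = StringScanner.new(text)
  strings = []
  until scanner.eos?
    if scanner.scan(/"(?:\\.|[^"\\])*"/)
      strings << scanner.matched   # a complete quoted string
    else
      scanner.getch                # skip one character outside any string
    end
  end
  strings
end

p quoted_strings(%q{ "a \"b\" c" x "d" })
```

An unterminated quote is simply skipped here; a real lexer would probably want to report it instead.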
I'm not sure. I don't think so. I think the content is scrubbed
beforehand to replace all embedded " with '.
I'm not sure about the 'join' solution for 2 reasons:
1) This array may be thousands of lines long and I wonder what the
performance will be like.
2) I'm thinking about getting rid of the initial array and reading
the data straight from the input files. In that case, I would have to
read in each line anyway.
Having said that, I will give this a try and see how it works with the
test data I have. Thanks!
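For comparison, a line-by-line version that never joins the whole input could be sketched like this. It assumes, as described above, that embedded " characters have already been replaced, so an odd number of quotes on a line means a string opens or closes there. The method name merge_quoted_lines is made up for this example.

```ruby
# Hypothetical helper: merge consecutive lines until the double quotes
# balance. Works on anything that responds to each and yields strings.
def merge_quoted_lines(lines)
  merged = []
  buffer = nil
  in_string = false
  lines.each do |line|
    buffer = buffer ? buffer + line : line
    # An odd number of quotes on this line toggles the open/closed state.
    in_string = !in_string if line.count('"').odd?
    next if in_string
    merged << buffer
    buffer = nil
  end
  merged << buffer if buffer   # keep an unterminated tail rather than drop it
  merged
end

data_array = [ " \"Only one line.\"", " \"line 1 ", " line 2.\" " ]
p merge_quoted_lines(data_array)
```

The pieces are concatenated exactly as they come, so whitespace at the line boundaries is preserved. Since the method only calls each, it should also accept the enumerator returned by File.foreach, though lines read from a file would still carry their trailing newlines.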
You are both right. Johnny is using the formal language definition of
regular expression while 7stud is using the term to refer to particular
programming language constructs. Most modern regexp libraries allow
for patterns that go well beyond the formal language concept of
regular expressions.
Still, I'm not sure how you would define nested strings using a single
quoting character:
"stuff"otherstuff"morestuff"
Is that an example of a nested string or just two strings with otherstuff
in between?
Gary Wright
On May 26, 2011, at 6:33 PM, 7stud -- wrote:
Johnny M. wrote in post #1001292:
I don't think any regex, being a finite automaton, could handle nested
strings properly.
There are regexes for nested parentheses--they use recursive regexes.
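As an illustration of the recursive-regex idea, Ruby 1.9's regexp engine supports subexpression calls written \g<name>, which let a named group refer to itself. The sketch below checks for balanced parentheses; the constant BALANCED and the helper balanced? are names invented for this example.

```ruby
# The named group "paren" matches one balanced parenthesised group.
# \g<paren> re-enters the same group, which is what makes it recursive.
BALANCED = /\A(?<paren>\((?:[^()]|\g<paren>)*\))\z/

# Hypothetical helper: true when the whole text is one balanced group.
def balanced?(text)
  !BALANCED.match(text).nil?
end

p balanced?("(a(b)c)")
p balanced?("(a(b)c")
```

This relies on distinct opening and closing characters. With a single quoting character there is no way to tell an opening " from a closing one, which is the ambiguity raised above.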