I am currently porting an application from Python to Ruby as a training exercise and to learn Ruby. In this application I parse text files delivered by news agencies. These more or less follow a specification developed by the IPTC consortium. But back to the question.
At the moment I build the input for the Regexp by concatenating strings, but I wondered whether I could use a HERE document instead, as it would make things a lot clearer without all these single and double quotes around the strings. But sadly, inside a HERE document the \n at the end of each line is used by the regexp. Is it possible to write a HERE document, or something like it, with \n inside, where the \n in the source are skipped afterwards?
Or maybe there is an even better way to do it. I am even thinking about writing a bunch of methods to parse all the stuff without a regex.
The cleanest solution is to make a regular expression that can work
regardless of the presence of newlines. You probably want multiline
mode. I'd need to see your specific example.
Or you could strip out the newlines with mystring.gsub("\n","") (plain sub would remove only the first one).
regards,
Ed
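To illustrate the newline-stripping suggestion, here is a minimal sketch. The pattern and the names PATTERN_SOURCE and CODE_RX are made up for the example, not taken from the original application. It writes the pattern as a HERE document, removes every newline, and only then compiles it.

```ruby
# Hypothetical pattern for illustration: a letter code, a number, whitespace.
# The quoted delimiter ('PATTERN') keeps backslashes literal in the heredoc.
PATTERN_SOURCE = <<'PATTERN'
^([a-zA-Z]{3,4})
(\d{3,4})
\s
PATTERN

# String#delete removes every newline; String#sub would stop after the first.
CODE_RX = Regexp.new(PATTERN_SOURCE.delete("\n"))

match = CODE_RX.match("abc1234 rest")
puts match[1]
puts match[2]
```

The drawback of this approach is that any newline the pattern is really meant to match has to be written as an escape (\n) rather than typed literally.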
On Wed, Dec 07, 2005 at 07:47:34AM +0900, Oliver Andrich wrote:
But sadly, inside a HERE document the \n at the end of each line is
used by the regexp. Is it possible to write a HERE document, or
something like it, with \n inside, where the \n in the source are
skipped afterwards?
This little "baby" does the job. As Ruby doesn't have named groups in regexps, I have to add comment groups (?#...) to document the individual groups, which would clutter the thing even more. Now Ruby has these nice HERE documents, %r{...} and so on. I would be happy if I could achieve something like this:
msg_rx = %r{
^\x01?
(?# comment for the line)
([a-zA-Z]{3,4})(?P<msgnum>\\d{3,4})\s
(\\d)\s
(?# comment for the line)
([a-zA-Z]{1,3})\s
(?# comment for the line)
(\\d{1,4})\s
(.*)\r\n*
\x02
(?# comment for the line)
(?:(.*)=\\s*\r\n)?
(?# comment for the line)
(.*)
\x03.*
(?# comment for the line)
(\\d{2})(\\d{2})(\\d{2})\s
(?# comment for the line)
([a-zA-Z]{3})\s
(?# comment for the line)
(\\d{2})
}
This looks a lot cleaner to me, but sadly the "\n" at the end of the lines are in my way. I could strip them, but it would be nicer if it just "happened".
Hopefully, this makes my question a little clearer.
Use the /x switch; it works with %r literals too:
>> rgx=%r[
foo #foo
bar #bar
]x
=> /
foo #foo
bar #bar
/x
>> m=rgx.match "foobar"
=> #<MatchData:0x29c9490>
>> m[0]
=> "foobar"
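One consequence of the /x switch is worth a small sketch: because unescaped whitespace in the pattern is ignored, a space that should actually be matched has to be written explicitly. The name WORDS_RX and the sample strings below are only illustrative.

```ruby
# In extended mode, unescaped whitespace and "# ..." comments are ignored.
WORDS_RX = %r{
  foo   # first word
  [ ]   # a literal space, written as a character class so /x keeps it
  bar   # second word
}x

puts !WORDS_RX.match("foo bar").nil?   # the space is required, so this matches
puts !WORDS_RX.match("foobar").nil?    # no space in the input, so no match
```

Writing \s or an escaped space would work as well; the character class is just one way to make the intent visible.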
Use extended mode:
msg_rx = %r{
^\x01?
# comment for the line
([a-zA-Z]{3,4}) (\d{3,4}) \s   # second group is the message number
(\d) \s
}x
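For readers on Ruby 1.9 or later, which added named groups written as (?<name>...), the same idea can be sketched with names instead of comments. The field names (service, msgnum, priority), the constant HEADER_RX, and the sample line are assumptions made for this illustration, not taken from the IPTC specification or the original code.

```ruby
# Sketch only: field names and the sample input are illustrative assumptions.
HEADER_RX = %r{
  ^\x01?                       # optional leading control character
  (?<service>  [a-zA-Z]{3,4})  # letters identifying the service
  (?<msgnum>   \d{3,4}) \s     # message number, then whitespace
  (?<priority> \d) \s          # single digit, then whitespace
}x

m = HEADER_RX.match("\x01abc0123 4 rest")
puts m[:service]
puts m[:msgnum]
puts m[:priority]
```

All groups are named here on purpose: once a Ruby pattern contains a named group, plain parentheses no longer capture, so mixing the two styles would silently drop the numbered groups.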