I’d like to split a message up into unquoted parts and quoted parts
(contiguous lines that start with a /^\s*>/ are considered “quoted
text”). If there are multiple quoted and unquoted parts, I need
them in order in an array.
The code below does what I need, but it seems awfully long and
convoluted for what it does. I think I may be approaching it wrong.
Any ideas on how to make it better?
def split_message(msg)  # method name assumed; the original def line is missing
  msg_parts = []
  last_type = nil
  quoted_data = ""
  unquoted_data = ""
  msg.split("\n").each do |line|
    next if line.length == 0
    if line =~ /^\s*>/
      if last_type == "unquoted" || last_type == nil
        # this is a new quoted section
        if unquoted_data.length > 0
          words = unquoted_data.split
          # words.length is always truthy in Ruby (even 0), so test empty? instead
          msg_parts << ["unquoted", words] unless words.empty?
          unquoted_data = ""
        end
      end
      last_type = "quoted"
      quoted_data += "#{line}\n"
    else # this is unquoted text
      if last_type == "quoted" || last_type == nil
        # this is a new unquoted section
        if quoted_data.length > 0
          words = quoted_data.split
          msg_parts << ["quoted", words] unless words.empty?
          quoted_data = ""
        end
      end
      last_type = "unquoted"
      unquoted_data += "#{line}\n"
    end
  end
  # get the last one
  if last_type == "unquoted"
    if unquoted_data.length > 0
      words = unquoted_data.split
      msg_parts << ["unquoted", words] unless words.empty?
    end
  else
    if quoted_data.length > 0
      words = quoted_data.split
      msg_parts << ["quoted", words] unless words.empty?
    end
  end
  msg_parts
end
Posting the version we’ve been talking about on #ruby-lang. For those
not reading that discussion: one quirky thing about this version is
that if you replace the “if” in the middle of the method with
“unless”, you get exactly the same results (at least with the
admittedly ragtag bunch of test cases I ran it on). This happens
because whenever that test is false, midquote gets flipped (in the
else clause), so the if/unless and else clauses affect each other.
Anyway, here it is:
def make_parts(msg)
  midquote = false
  buf = []
  msg.each_line do |line|   # was msg.each; String#each is gone in Ruby 1.9+
    spline = line.strip.split(/\s+/)
    next if spline.empty?
    quote = /^\s*>/.match(line)
    tag = if not quote then "un" end.to_s + "quoted"
    if (midquote ^ quote)
      buf << [ tag,[] ] if buf.empty?
      buf[-1][1].concat spline
    else
      buf.concat [ [tag,spline] ]
      midquote = !midquote
    end
  end
  buf
end
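A quick way to check the if/unless quirk described above is to run both versions side by side on a few messages. This harness is a sketch, not part of the original post: make_parts_variant and its invert flag are hypothetical names, the tag is computed with a ternary for brevity, and msg.each is replaced by each_line so it runs on current Ruby. With invert set, the middle test behaves as if “if” had been replaced by “unless”.

```ruby
# Hypothetical harness (not from the thread): make_parts with the middle
# test either as posted ("if") or inverted ("unless"), chosen by `invert`.
def make_parts_variant(msg, invert)
  midquote = false
  buf = []
  msg.each_line do |line|
    spline = line.strip.split(/\s+/)
    next if spline.empty?
    quote = /^\s*>/.match(line)          # MatchData or nil
    tag = quote ? "quoted" : "unquoted"
    test = midquote ^ quote              # true/false XOR truthiness of quote
    test = !test if invert               # emulate replacing "if" with "unless"
    if test
      buf << [tag, []] if buf.empty?
      buf[-1][1].concat spline
    else
      buf << [tag, spline]
      midquote = !midquote
    end
  end
  buf
end

# Made-up sample messages: starting quoted, starting unquoted, all quoted.
samples = [
  "> a\nb\n> c\n",
  "a\n> b\n> c\n\nd\n",
  "> only quoted\n> lines\n",
]
samples.each do |msg|
  same = make_parts_variant(msg, false) == make_parts_variant(msg, true)
  puts "#{msg.inspect}: #{same ? 'same' : 'different'}"
end
```

Each sample prints `same`: the two versions only differ in whether midquote means “last part was quoted” or “last part was unquoted”, and the flip in the else clause keeps that meaning consistent either way.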
Interesting logic. You could make the code a little clearer, at
least for me:
def make_parts3(msg)
  buf = []
  last_tag = nil
  msg.each_line do |line|   # was msg.each; String#each is gone in Ruby 1.9+
    spline = line.strip.split(/\s+/)
    next if spline.empty?
    tag = /^\s*>/.match(line) ? "quoted" : "unquoted"
    if (tag != last_tag)
      buf << [ tag,spline ]
      last_tag = tag
    else
      buf[-1][1].concat spline
    end
  end
  buf
end
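The last_tag bookkeeping in make_parts3 amounts to grouping consecutive lines that share a key, which is what Enumerable#chunk does in newer Rubies (1.9.2 and later). The sketch below swaps in chunk to get the same grouping as make_parts3, including keeping the ">" markers; make_parts_chunk and the sample message are hypothetical, and String#split with no argument is used as shorthand for strip.split(/\s+/).

```ruby
# Sketch (not from the thread): make_parts3's grouping via Enumerable#chunk.
# make_parts_chunk is a hypothetical name; needs Ruby 1.9.2+.
def make_parts_chunk(msg)
  msg.each_line
     .map { |line| [line, line.split] }             # raw line plus its words
     .reject { |_line, words| words.empty? }        # skip blank lines, like make_parts3
     .chunk { |line, _words| line =~ /^\s*>/ ? "quoted" : "unquoted" }
     .map { |tag, pairs| [tag, pairs.flat_map { |_line, words| words }] }
end

msg = "Hi Joe,\n> you said\n> this\n\nI agree.\n"
p make_parts_chunk(msg)
# => [["unquoted", ["Hi", "Joe,"]], ["quoted", [">", "you", "said", ">", "this"]], ["unquoted", ["I", "agree."]]]
```

The trade-off is an intermediate array of [line, words] pairs in exchange for dropping the mutable last_tag and the buf[-1] indexing.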
All implementations so far leave the quote markers in the output. I don’t
know if that’s the desired behaviour. If you want to get rid of them,
you could do something like:
def make_parts4(msg)
  buf = []
  last_tag = nil
  msg.each_line do |line|   # was msg.each; String#each is gone in Ruby 1.9+
    if /^\s*>/.match(line)
      tag = "quoted"
      line = $'             # keep only the text after the quote marker
    else
      tag = "unquoted"
    end
    spline = line.strip.split(/\s+/)
    next if spline.empty?
    if (tag != last_tag)
      buf << [ tag,spline ]
      last_tag = tag
    else
      buf[-1][1].concat spline
    end
  end
  buf
end
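One detail of make_parts4 worth seeing in action: $' is the text after the match, and /^\s*>/ only consumes the first ">", so a nested quote such as ">> earlier" keeps one ">" among its words. A small sketch to show this, using make_parts4 as posted with msg.each changed to each_line; the sample message is made up.

```ruby
# make_parts4 as posted, with msg.each changed to msg.each_line.
def make_parts4(msg)
  buf = []
  last_tag = nil
  msg.each_line do |line|
    if /^\s*>/.match(line)
      tag = "quoted"
      line = $'                 # text after the FIRST ">" only
    else
      tag = "unquoted"
    end
    spline = line.strip.split(/\s+/)
    next if spline.empty?
    if tag != last_tag
      buf << [tag, spline]
      last_tag = tag
    else
      buf[-1][1].concat spline
    end
  end
  buf
end

# Made-up sample with one level of nested quoting.
p make_parts4("> you said\n>> earlier\nok\n")
# => [["quoted", ["you", "said", ">", "earlier"]], ["unquoted", ["ok"]]]
```

If nested markers should go too, the regexp would need to consume repeated ">" characters; whether that is wanted depends on what the parts are used for.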
My little “un” trick was somewhat whimsical. I do admittedly tend
to avoid using the ternary operator in Ruby, not on deeply
principled grounds, but on the purely aesthetic grounds that to my
eyes it jumps out as a C idiom. (NOTE: I’m not trying to talk people
out of using it, just explaining why I didn’t.)
Yes, much nicer, consolidating the tags/flags.
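On the ternary question: Ruby’s if/else is itself an expression, so the tag can also be picked without either the ternary or the “un” trick. The helper names and QUOTE_RE below are hypothetical; the sketch just shows the three spellings returning the same tag.

```ruby
QUOTE_RE = /^\s*>/

# The ternary, as in make_parts3.
def tag_ternary(line)
  QUOTE_RE.match(line) ? "quoted" : "unquoted"
end

# The "un" trick, as in make_parts: an if with no else yields nil, and nil.to_s is "".
def tag_un_trick(line)
  (if not QUOTE_RE.match(line) then "un" end).to_s + "quoted"
end

# if/else used as an expression: no ternary, no trick.
def tag_if_else(line)
  if QUOTE_RE.match(line) then "quoted" else "unquoted" end
end

["> quoted line\n", "plain line\n", "   > indented quote\n"].each do |line|
  tags = [tag_ternary(line), tag_un_trick(line), tag_if_else(line)]
  puts "#{line.inspect} -> #{tags.uniq.inspect}"   # one distinct tag per line
end
```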
As for leaving the quote markers in the output: I was just going for
duplicating Joe’s output; I’m not sure what the eventual purpose is.