<row><entry><text><emph face="b">The Quick Brown Fox Jumped Over The
Lazy Dog.<\/emph>/
Thanks,
Peter
Dir.chdir("C:/users/pb4072/documents") do |d|
file = File.read("test1.txt")
output = file.gsub(%r{^(<row><entry><text><emph face="b">)(.*)(</emph>)}m) do |match|
"#{$1}#{$2.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{$3}"
end
File.open("test1.txt", "w") { |f| f.write output }
end
Note the use of three capture groups to keep the unchanged initial and final parts as well as the middle part that is altered. %r{\b\w+\b} is a Regexp that matches words: \b is a word boundary and \w is a word character (short for [a-zA-Z0-9_]). The String#capitalize! you used returns nil if no change is made, which is why the block uses String#capitalize instead.
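To make the capitalize vs. capitalize! point concrete, here is a small sketch. The sample strings and variable names are made up for illustration and are not from the original file.

```ruby
# String#capitalize always returns a new string.
# String#capitalize! changes the receiver in place and returns nil
# when there was nothing to change.
word = "Fox"
copy = word.capitalize    # a new string, "Fox"
bang = word.capitalize!   # nil, because "Fox" is already capitalized

# In a gsub block the block's return value replaces the match,
# so a nil from capitalize! turns into an empty string.
broken = "The quick".gsub(/\b\w+\b/) { |w| w.capitalize! }

# With capitalize every word is replaced by its capitalized copy.
titled = "THE QUICK BROWN FOX".gsub(/\b\w+\b/) { |w| w.capitalize }

puts broken.inspect
puts titled
```

In the broken variant, any word that is already capitalized silently disappears, which is easy to miss on input that is all upper case.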
Thanks, Rob. This works beautifully, except that I need that last
</emph> in my output. It's being stripped with your code. I don't see
why, because it's just your $3, isn't it?
--
Posted via http://www.ruby-forum.com/.
You said to Todd:
The original text is just:
<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.<\/emph>/
I assumed that the "<\/emph>/" part was a cut-n-paste of a regexp for the email (which is one reason that I changed from // to %r{} construction of the Regexp, so the / wouldn't have to be escaped). You may have to change the second group to (.*?) [reluctant match rather than greedy match] or adjust the third group to exactly match your input.
Rob,
So, here's my original file:
<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.</emph>
Here's my code, from you:
Dir.chdir("C:/users/pb4072/documents") do |d|
file = File.read("test1.txt")
output = file.gsub(%r{^(<row><entry><text><emph face="b">)(.*)(<\/emph>)}m) do |match|
"#{$1}#{$2.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{$3}"
end
File.open("test1.txt", "w") { |f| f.write output }
end
Here's what I get. It works great, but, I don't understand why the $3
text is simply blown away.
<row><entry><text><emph face="b">The Quick Brown Fox Jumped Over The
Lazy Dog.
because it's reset when you're doing that gsub on $2. the capture
variables only refer to the *last* match. so you have to capture
them into local variables first (can't think of a better way right now).
cheers
jens
--
Jens Wille, Dipl.-Bibl. (FH)
prometheus - Das verteilte digitale Bildarchiv für Forschung & Lehre
Kunsthistorisches Institut der Universität zu Köln
Albertus-Magnus-Platz, D-50923 Köln
Tel.: +49 (0)221 470-6668, E-Mail: jens.wille@uni-koeln.de http://www.prometheus-bildarchiv.de/
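A minimal sketch of the "capture them into local variables first" suggestion, assuming the sample line from earlier in the thread. The names text, pattern, head, body and tail are hypothetical, and the reluctant (.*?) is used for the middle group.

```ruby
# Sample input taken from the thread; in the real script this comes from File.read.
text = %(<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE\nLAZY DOG.</emph>)

pattern = %r{^(<row><entry><text><emph face="b">)(.*?)(</emph>)}m

# Original approach: the inner gsub runs before $3 is read, and it resets
# the capture variables, so $3 is nil by the time it is interpolated.
buggy = text.gsub(pattern) do
  "#{$1}#{$2.gsub(/\b\w+\b/) { |w| w.capitalize }}#{$3}"
end

# Fix: copy the captures into locals before running the inner gsub.
fixed = text.gsub(pattern) do
  head, body, tail = $1, $2, $3
  "#{head}#{body.gsub(/\b\w+\b/) { |w| w.capitalize }}#{tail}"
end

puts buggy
puts fixed
```

$1 survives in the original version only because it is read before the inner gsub runs; anything read afterwards refers to the inner match.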
OK, change this to a regexp:
1. surround with the regexp literal bits
%r{<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.</emph>}m
2. add the grouping ()'s
%r{(<row><entry><text><emph face="b">)(THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.)(</emph>)}m
3. replace text with wildcards .* or .*?
%r{(<row><entry><text><emph face="b">)(.*?)(</emph>)}m
I'm assuming that is not the WHOLE file, since the <row><entry><text> tags are not closed. It is quite likely that .* is slurping a lot more than you think, so that's why I've changed this to .*?, which matches as little as possible while still succeeding.
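The difference between .* and .*? is easiest to see on input with more than one closing tag. This is only an illustrative sketch; the xml string below is invented, not taken from the real file.

```ruby
# Two <emph> elements in the same string (hypothetical sample data).
xml = %(<emph face="b">ONE</emph> and <emph face="b">TWO</emph>)

# Greedy: .* runs to the LAST </emph> it can find.
greedy = xml[%r{<emph face="b">(.*)</emph>}m, 1]

# Reluctant: .*? stops at the FIRST </emph> that lets the match succeed.
lazy = xml[%r{<emph face="b">(.*?)</emph>}m, 1]

puts greedy   # ONE</emph> and <emph face="b">TWO
puts lazy     # ONE
```

With the m option the dot also matches newlines, so on a multi-line file the greedy form can swallow everything up to the final </emph> in the file.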
Sorry, Jens, but, I have no idea what you're referring to here. I
googled oniguruma. I see what it is. I installed it, but, it didn't seem
to install successfully. Do I do a "require oniguruma" at the top of my
script?
sure. but you really don't need it to solve your task at hand.
it's just the new regexp engine for ruby 1.9 and sometimes i like to
do some stuff with it that the default engine of 1.8 can't do
(zero-width look-behind in this case).
you can still simplify your substitution by using the look-ahead
(which 1.8 *does* understand), so you get rid of the third capture:
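The code that followed is not preserved in this copy of the thread. As a rough sketch of the idea only (not necessarily what jens posted), a look-ahead version might look like this, assuming the sample line from earlier in the thread; text and pattern are hypothetical names.

```ruby
# Sample input taken from the thread; in the real script this comes from File.read.
text = %(<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE\nLAZY DOG.</emph>)

# (?=</emph>) is a zero-width look-ahead: it requires </emph> to follow,
# but does not consume it, so the closing tag is never part of the match.
pattern = %r{^(<row><entry><text><emph face="b">)(.*?)(?=</emph>)}m

output = text.gsub(pattern) do
  # $1 is read before the inner gsub runs, so it is still the outer capture.
  "#{$1}#{$2.gsub(/\b\w+\b/) { |w| w.capitalize }}"
end

puts output
```

Because the closing tag is left untouched in the input, there is no third capture to lose when the inner gsub resets the capture variables.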