Capitalizing words

Hi,
I need to capitalize the words in a string I find in XML files.

The string that's in (.*) below is what I need to change. I just want to
capitalize the first letter of each word in the string.

I'm trying this, in a test:

Dir.chdir("C:/users/pb4072/documents")
file = File.read("test1.txt")
file.gsub(/^<row><entry><text><emph face="b">(.*)<\/emph>/) do |match|
  array = $1.split
  array.each do |word|
    word.capitalize!
  end
  newfile = File.open("c:/users/pb4072/documents/test1.txt", "w") { |f| f.print array }
end

And, I'm getting this:

#(.*)<\/emph>theQuickBrownFoxJumpedOverTheLazyDog.

I want this:

<row><entry><text><emph face="b">The Quick Brown Fox Jumped Over The
Lazy Dog.<\/emph>/

Thanks,
Peter

--
Posted via http://www.ruby-forum.com/.

I don't know what the original text looks like in test1.txt, but this
might point you in the right direction...

irb(main):001:0> s = "the quick brown fox"
=> "the quick brown fox"
irb(main):002:0> s.split.map {|w| w.capitalize}.join ' '
=> "The Quick Brown Fox"

Todd

···

On Tue, Apr 8, 2008 at 1:53 PM, Peter Bailey <pbailey@bna.com> wrote:

Hi,
I need to capitalize the words in a string I find in XML files.

The string that's in (.*) below is what I need to change. I just want to
capitalize the first letter of each word in the string.

I'm trying this, in a test:

Dir.chdir("C:/users/pb4072/documents")
file = File.read("test1.txt")
file.gsub(/^<row><entry><text><emph face="b">(.*)<\/emph>/) do |match|
  array = $1.split
  array.each do |word|
  word.capitalize!
  end
newfile = File.open("c:/users/pb4072/documents/test1.txt", "w") { |f|
f.print array }
end

And, I'm getting this:

#(.*)<\/emph>theQuickBrownFoxJumpedOverTheLazyDog.

I want this:

<row><entry><text><emph face="b">The Quick Brown Fox Jumped Over The
Lazy Dog.<\/emph>/

Thanks,
Peter

Dir.chdir("C:/users/pb4072/documents") do |d|
   file = File.read("test1.txt")
   output = file.gsub(%r{^(<row><entry><text><emph face="b">)(.*)(</emph>)}m) do |match|
     "#{$1}#{$2.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{$3}"
   end
   File.open("test1.txt", "w") { |f| f.write output }
end

Note the use of three capture groups to get the unchanged initial and final parts as well as the middle part that is altered. The %r{\b\w+\b} is a Regexp that matches words, \b is a word-boundary and \w is a word-character (short for [a-zA-Z0-9_]). Your use of String#capitalize! returns nil if no change is made.
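
To make the last two points of that note concrete, here is a minimal sketch. The sample string is made up for illustration; it is not taken from the poster's file.

```ruby
# Sample text is an assumption for illustration, not the poster's real file.
text = "THE QUICK brown Fox"

# \b\w+\b matches each run of word characters; the block's return value
# replaces the match. capitalize also downcases the rest of each word.
titled = text.gsub(/\b\w+\b/) { |w| w.capitalize }
puts titled

# capitalize! changes the receiver in place and returns nil when there is
# nothing to change, so its return value is unsafe to use as a replacement.
unchanged = "Fox".dup.capitalize!
changed   = "fox".dup.capitalize!
p unchanged
p changed
```

Because capitalize (without the bang) always returns a string, it is the safer choice inside a gsub block.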

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com

···

On Apr 8, 2008, at 2:53 PM, Peter Bailey wrote:

Thanks, Todd.
The original text is just:
<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.<\/emph>/

Should I just make your "s" equal to $1 from my original gsub?

-Peter

Thanks, Rob. This works beautifully, except that I need that last
</emph> in my output. It's being stripped with your code. I don't see
why, because it's just your $3, isn't it?

You said to Todd:
The original text is just:
<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.<\/emph>/

I assumed that the "<\/emph>/" part was a cut-and-paste of a regexp for the email (which is one reason that I changed from // to the %r{} construction of the Regexp, so the / wouldn't have to be escaped). You may have to change the second group to (.*?) [reluctant match rather than greedy match] or adjust the third group to exactly match your input.
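
As a rough illustration of the greedy versus reluctant difference, here is a small sketch. The two-element sample string is an assumption, not the real input.

```ruby
# Two elements in one string; this sample is an assumption, not the real file.
xml = %(<emph face="b">ONE</emph>\n<emph face="b">TWO</emph>)

# With the m flag, . also matches newlines, so a greedy .* can run across lines.
greedy    = xml[%r{<emph face="b">(.*)</emph>}m, 1]
reluctant = xml[%r{<emph face="b">(.*?)</emph>}m, 1]

p greedy     # runs on to the LAST </emph> in the string
p reluctant  # stops at the FIRST </emph>
```

The greedy capture swallows the first closing tag and the second opening tag, while the reluctant one captures only the first element's text.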

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com

···

On Apr 9, 2008, at 8:24 AM, Peter Bailey wrote:

Rob,
So, here's my original file:
<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.</emph>

Here's my code, from you:
Dir.chdir("C:/users/pb4072/documents") do |d|
  file = File.read("test1.txt")
  output = file.gsub(%r{^(<row><entry><text><emph face="b">)(.*)(<\/emph>)}m) do |match|
    "#{$1}#{$2.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{$3}"
  end
  File.open("test1.txt", "w") { |f| f.write output }
end

Here's what I get. It works great, but, I don't understand why the $3
text is simply blown away.
<row><entry><text><emph face="b">The Quick Brown Fox Jumped Over The
Lazy Dog.

Thanks,
Peter

hi peter!

Peter Bailey [2008-04-09 20:04]:

Dir.chdir("C:/users/pb4072/documents") do |d|
  file = File.read("test1.txt")
  output = file.gsub(%r{^(<row><entry><text><emph face="b">)(.*)(<\/emph>)}m) do |match|
    "#{$1}#{$2.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{$3}"
  end
  File.open("test1.txt", "w") { |f| f.write output }
end

Here's what I get. It works great, but, I don't understand why
the $3 text is simply blown away.

because it's reset when you're doing that gsub on $2. the capture
variables only refer to the *last* match. so you have to capture
them into local variables first (can't think of a better way right now).
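
A minimal sketch of that effect, assuming a one-line made-up input rather than the real file. The first version loses the closing tag, the second keeps it by saving the captures first.

```ruby
# Sample input is an assumption for illustration.
line = %(<emph face="b">LAZY DOG.</emph>)
re   = %r{(<emph face="b">)(.*?)(</emph>)}

# Broken: the inner gsub runs its own matches, so by the time "#{$3}" is
# evaluated the capture variables describe the inner match, which has no group 3.
broken = line.gsub(re) { "#{$1}#{$2.gsub(/\b\w+\b/) { |w| w.capitalize }}#{$3}" }

# Fixed: copy the captures into locals before any other regexp operation runs.
fixed = line.gsub(re) do
  head, content, tail = $1, $2, $3
  "#{head}#{content.gsub(/\b\w+\b/) { |w| w.capitalize }}#{tail}"
end

puts broken  # closing tag is missing
puts fixed   # closing tag is kept
```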

cheers
jens

···

--
Jens Wille, Dipl.-Bibl. (FH)
prometheus - Das verteilte digitale Bildarchiv für Forschung & Lehre
Kunsthistorisches Institut der Universität zu Köln
Albertus-Magnus-Platz, D-50923 Köln
Tel.: +49 (0)221 470-6668, E-Mail: jens.wille@uni-koeln.de
http://www.prometheus-bildarchiv.de/

Rob,
So, here's my original file:
<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.</emph>

OK, change this to a regexp:
1. surround with the regexp literal bits
%r{<row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.</emph>}m

2. add the grouping ()'s
%r{(<row><entry><text><emph face="b">)(THE QUICK BROWN FOX JUMPED OVER THE
LAZY DOG.)(</emph>)}m

3. replace text with wildcards .* or .*?
%r{(<row><entry><text><emph face="b">)(.*?)(</emph>)}m

4. (optional?) add anchor ^
%r{^(<row><entry><text><emph face="b">)(.*?)(</emph>)}m

I'm assuming that is not the WHOLE file, since the <row><entry><text> tags are not closed. It is quite likely that .* is slurping a lot more than you think, so that's why I've changed this to .*?, which matches as little as possible while still letting the overall match succeed.
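
The regexp from step 4 might be applied like this. The two-row sample below is hypothetical (the full test1.txt is never shown in the thread), and the captures are copied into locals so that the inner gsub cannot clobber them.

```ruby
# Hypothetical two-row sample; the real test1.txt is not shown in full.
file = <<~XML
  <row><entry><text><emph face="b">THE QUICK BROWN FOX JUMPED OVER THE
  LAZY DOG.</emph></text></entry></row>
  <row><entry><text><emph face="b">A SECOND ROW.</emph></text></entry></row>
XML

# The step-4 regexp: anchored at line start, reluctant middle group.
re = %r{^(<row><entry><text><emph face="b">)(.*?)(</emph>)}m

output = file.gsub(re) do
  head, content, tail = $1, $2, $3   # save captures before the inner gsub
  head + content.gsub(/\b\w+\b/) { |w| w.capitalize } + tail
end

puts output
```

With the reluctant group each row is matched separately, so the text between the two rows is never touched.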

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com

···

On Apr 9, 2008, at 2:04 PM, Peter Bailey wrote:

Peter Bailey [2008-04-09 20:04]:

output = file.gsub(%r{^(<row><entry><text><emph face="b">)(.*)(<\/emph>)}m) do |match|
  "#{$1}#{$2.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{$3}"
end

oh, and for the fun of it, here's what you can do with oniguruma:

  Oniguruma::ORegexp.new(
    '(?<=^<row><entry><text><emph face="b">).+(?=</emph>)', 'm'
  ).gsub(file) { |md|
    md[0].gsub(%r{\b\w+\b}) { |w| w.capitalize }
  }

(note that i needed to change '.*' to '.+')
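
For readers on Ruby 1.9 or later, where the built-in Regexp supports look-behind, a similar idea can be sketched without the gem. This is an adaptation, not the code above: the ^ anchor is dropped so the look-behind is a plain fixed-width literal, .+ is made reluctant, and the input is a made-up sample.

```ruby
# Assumes Ruby 1.9+; sample input is made up for illustration.
file = %(<row><entry><text><emph face="b">LAZY DOG.</emph>)

# Look-behind and look-ahead assert the tags without consuming them.
re = %r{(?<=<row><entry><text><emph face="b">).+?(?=</emph>)}m

# Only the text between the tags is matched, so the block just returns the
# capitalized text and the surrounding tags are left untouched.
output = file.gsub(re) { |text| text.gsub(/\b\w+\b/) { |w| w.capitalize } }

puts output
```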

cheers
jens

Ah yes! Good catch, Jens.

Peter, you only *need* to capture $3, but it would make sense to get them all:

head, content, tail = $1, $2, $3
"#{head}#{content.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{tail}"

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com

···

On Apr 9, 2008, at 2:13 PM, Jens Wille wrote:

Jens Wille wrote:

Peter Bailey [2008-04-09 20:04]:

output = file.gsub(%r{^(<row><entry><text><emph
face="b">)(.*)(<\/emph>)}m) do |match|
  "#{$1}#{$2.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{$3}"
end

oh, and for the fun of it, here's what you can do with oniguruma:

  Oniguruma::ORegexp.new(
    '(?<=^<row><entry><text><emph face="b">).+(?=</emph>)', 'm'
  ).gsub(file) { |md|
    md[0].gsub(%r{\b\w+\b}) { |w| w.capitalize }
  }

(note that i needed to change '.*' to '.+')

cheers
jens

Sorry, Jens, but, I have no idea what you're referring to here. I
googled oniguruma. I see what it is. I installed it, but, it didn't seem
to install successfully. Do I do a "require oniguruma" at the top of my
script?

Rob Biedenharn [2008-04-09 20:46]:

head, content, tail = $1, $2, $3
"#{head}#{content.gsub(%r{\b\w+\b}){|w|w.capitalize}}#{tail}"

now here's a quick implementation that passes the MatchData object
into the block:

<http://prometheus.khi.uni-koeln.de/svn/scratch/ruby-nuggets/lib/nuggets/string/sub_with_md.rb>

so that code effectively becomes:

  str.gsub_with_md(re) { |md|
    "#{md[1]}#{md[2].gsub(%r{\b\w+\b}){|w|w.capitalize}}#{md[3]}"
  }

;-)
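
The linked file is not reproduced here, so the following is only a guess at how such a helper could look. The method name gsub_with_match_data is hypothetical, and the sketch relies on $~ (the last MatchData) being set before String#gsub calls its block.

```ruby
# Hypothetical helper, NOT the linked nuggets implementation: it relies on
# $~ (the last MatchData) being set before String#gsub calls its block.
def gsub_with_match_data(str, re)
  str.gsub(re) { yield $~ }
end

line = %(<emph face="b">LAZY DOG.</emph>)   # made-up sample input
re   = %r{(<emph face="b">)(.*?)(</emph>)}

result = gsub_with_match_data(line, re) do |md|
  # md is an ordinary object, so the inner gsub cannot disturb md[3].
  "#{md[1]}#{md[2].gsub(/\b\w+\b/) { |w| w.capitalize }}#{md[3]}"
end

puts result
```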

cheers
jens

Peter Bailey [2008-04-10 14:26]:

Do I do a "require oniguruma" at the top of my script?

sure. but you really don't need it to solve your task at hand.

it's just the new regexp engine for ruby 1.9 and sometimes i like to
do some stuff with it that the default engine of 1.8 can't do
(zero-width look-behind in this case).

you can still simplify your substitution by using the look-ahead
(which 1.8 *does* understand), so you get rid of the third capture:

  file.gsub(%r{^(<row><entry><text><emph face="b">)(.*)(?=<\/emph>)}m) {
    "#{$1}#{$2.gsub(%r{\b\w+\b}) { |w| w.capitalize }}"
  }
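
A self-contained way to try the look-ahead variant, as a sketch: the one-line input is an assumption, and the middle group is made reluctant here so that the example also behaves with more than one row.

```ruby
# Made-up sample input; the real file is not shown in full.
file = %(<row><entry><text><emph face="b">LAZY DOG.</emph>\n)

# (?=</emph>) checks that the closing tag follows without consuming it, so
# the tag is never part of the match and stays in the output on its own.
re = %r{^(<row><entry><text><emph face="b">)(.*?)(?=</emph>)}m

# $1 and $2 are both read before the inner gsub runs, so nothing is lost.
output = file.gsub(re) { "#{$1}#{$2.gsub(/\b\w+\b/) { |w| w.capitalize }}" }

puts output
```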

cheers
jens

Thanks. But, again, do I need to do a "require" for oniguruma at the
top?
Cheers,
Peter

Peter Bailey [2008-04-10 16:20]:

Thanks. But, again, do I need to do a "require" for oniguruma at
the top?

if you want to use oniguruma, then yes, you have to require it first.

OK. Thanks!
