Downcase part of a string

Hi,
I want to downcase a string, but leave specific parts of it unchanged.
For example:
msg = "THIS is a Text and (NO Change HERE) HELP"

After downcasing, it should look like "this is a text and (NO Change HERE)
help".

I don't want to downcase the letters in parentheses.
How can I do that? I tried it with regular expressions but couldn't get
it to work.

Thanks for any help

ilhamik:

> hi,
> I want to downcase a string but without specific parts.
> for example:
> msg = "THIS is a Text and (NO Change HERE) HELP"

If the parentheses occur only once:

if msg =~ /\(.*?\)/
  $~.pre_match.downcase + $~[0] + $~.post_match.downcase
end

Kalman
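
A possible variation on the single-match idea, as a sketch only: String#partition (Ruby 1.9 and later, so newer than this thread) accepts a regexp and returns the text before the match, the match itself, and the text after it. Unlike the =~ version, it also copes with a string that has no parentheses at all, because the match and the tail simply come back empty. The method name downcase_around_first_parens is made up for this illustration.

```ruby
# Downcase everything except the first parenthesized group.
# The helper name is hypothetical; String#partition needs Ruby 1.9+.
def downcase_around_first_parens(msg)
  before, kept, after = msg.partition(/\(.*?\)/)
  before.downcase + kept + after.downcase
end

msg = "THIS is a Text and (NO Change HERE) HELP"
puts downcase_around_first_parens(msg)
# prints: this is a text and (NO Change HERE) help
```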

ilhamik wrote:

> hi,
> I want to downcase a string but without specific parts.
> for example:
> msg = "THIS is a Text and (NO Change HERE) HELP"

Hi,

This is kind of old school and I am sure there are nicer rubyish
solutions for it, but at least it works for multiple parentheses as well:

original = msg.scan(/\(.+?\)/)
msg.downcase!
altered = msg.scan(/\(.+?\)/)
original.each_with_index { |stuff, i| msg.sub!(altered[i],stuff) }

···

--
Peter
http://www.rubyrailways.com
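
One thing to watch with the scan/sub! approach: sub! replaces the first occurrence it finds, so two groups that differ only in case, such as (foo) and (FOO), can be restored in the wrong position. A sketch that sidesteps this by never downcasing the protected parts in the first place: String#split with a capturing group keeps the delimiters in the result, so the parenthesized groups land at the odd indices. The method name is invented for this example, and nested parentheses are assumed not to occur.

```ruby
# Split on parenthesized groups; the capture keeps them in the result,
# so even indices are plain text and odd indices are "(...)" groups.
# downcase_outside_parens is a hypothetical name.
def downcase_outside_parens(msg)
  parts = msg.split(/(\(.*?\))/)
  parts.each_with_index.map { |part, i| i.odd? ? part : part.downcase }.join
end

puts downcase_outside_parens("(foo) X (FOO) Y")
# prints: (foo) x (FOO) y
```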

Certainly not pretty with that funky regex, but it works:

msg = "THIS is a Text and (NO Change HERE) HELP (Not here Either)"

msg.gsub!(/([^\(]*(?!\())|(\(.*?\))|(\)[^\)]*\))/) do |m|
  m[0, 1] == "(" ? m : m.downcase  # m[0] == 40 relied on Ruby 1.8, where String#[] returned a character code
end

- Scott


No, they can occur more than once.

Kalman Noel wrote:

If the parentheses occur only once:

if msg =~ /\(.*?\)/
  $~.pre_match.downcase + $~[0] + $~.post_match.downcase
end

Kalman

Thanks Peter, it works fine.


Prettier regexp, paid for with two more steps:

msg = "THIS is a Text and (NO Change HERE) HELP (Not here Either)"

(")"+msg+"(").gsub(/\)(.*?)\(/) {|i| i.downcase}[1..-2]

martin
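
For comparison, a sketch of the same idea with a single alternation: match either a whole parenthesized group or a run of text containing no opening parenthesis, and downcase only the latter. It assumes parentheses are not nested, and it uses start_with?, which postdates this thread (Ruby 1.8.7 and later).

```ruby
msg = "THIS is a Text and (NO Change HERE) HELP (Not here Either)"

# First alternative: a complete "(...)" group, returned untouched.
# Second alternative: any run of characters that are not "(".
result = msg.gsub(/\([^)]*\)|[^(]+/) do |chunk|
  chunk.start_with?("(") ? chunk : chunk.downcase
end

puts result
# prints: this is a text and (NO Change HERE) help (Not here Either)
```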

···

On 10/22/06, Scott <bauer.mail@gmail.com> wrote:

Certainly not pretty with that funky regex, but it works:

msg = "THIS is a Text and (NO Change HERE) HELP (Not here Either)"

msg.gsub!(/([^\(]*(?!\())|(\(.*?\))|(\)[^\)]*\))/) do |m|
  m[0, 1] == "(" ? m : m.downcase  # m[0] == 40 relied on Ruby 1.8, where String#[] returned a character code
end

- Scott


You missed Tim Bray's RubyConf talk. According to him, we should never be using the case changing methods. "Just don't do it!" ;)

James Edward Gray II

···

On Oct 22, 2006, at 7:30 AM, ilhamik wrote:

Thanks Peter, it works fine.

James Edward Gray II wrote:

···

On Oct 22, 2006, at 7:30 AM, ilhamik wrote:

Thanks Peter, it works fine.

You missed Tim Bray's RubyConf talk. According to him, we should never be using the case changing methods. "Just don't do it!" ;)

James Edward Gray II

Why not? What reason did he give?
Cheers

The problem is that proper upcasing and downcasing of characters is
locale-dependent, not just encoding or language-dependent.

As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France. Also, in Turkish,
there are four different cases of 'i', not just two, and which is
correct depends on the jurisdiction.
Determining the locale in a correct way is really, really hard. Tim
Bray says it's basically impossible. Also, all of these rules make
any decent upcase/downcase function ruinously slow.

He shared a story about the original version of XML. At the time, it
was case-insensitive. The very first XML library was running horribly
slow. After profiling, they found that it was spending 90% of its
time in the Java downcase routine. After that, XML was made
case-sensitive.
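
To make the Turkish case concrete, here is a sketch that assumes Ruby 2.4 or later, where upcase and downcase accept a :turkic option (this did not exist when the thread was written). The string itself carries no locale, so the caller has to know which mapping applies, which is the difficulty described above.

```ruby
# Default (locale-independent) mapping versus the Turkic mapping.
# Requires Ruby 2.4+; escapes are used so the dots are unambiguous.
dotless_i  = "\u0131"  # LATIN SMALL LETTER DOTLESS I
dotted_cap = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

puts "I".downcase                        # i
puts "I".downcase(:turkic) == dotless_i  # true
puts "i".upcase                          # I
puts "i".upcase(:turkic) == dotted_cap   # true
```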

···

On 10/22/06, Mike Durham <mdurham@people.net.au> wrote:

James Edward Gray II wrote:
> On Oct 22, 2006, at 7:30 AM, ilhamik wrote:
>
>> Thanks Peter, it works fine.
>
> You missed Tim Bray's RubyConf talk. According to him, we should never
> be using the case changing methods. "Just don't do it!" ;)
>
> James Edward Gray II
>
Why not? What reason did he give?


Thanks Wilson, that explains everything. I'd never thought about problems like that.
Cheers, Mike

Yes, this is basically it.

Tim Bray feels that case changing is more or less impossible in the practical sense. When you get around to downcasing that string a user entered into your web form a month back, are you going to know if that string was encoded in a Turkish locale (critical info if it contains an "i")?

Even if it were possible, Tim suggests that it's a performance killer. See Java, which tries to address as many rules as it possibly can, for proof.

James Edward Gray II

···

On Oct 22, 2006, at 8:16 PM, Wilson Bilkovich wrote:

The problem is that proper upcasing and downcasing of characters is
locale-dependent, not just encoding or language-dependent.

This is way off topic, but I'd like to know where he heard that. It's
the first time for me, and I'm a native french speaker...

Fred

···

On 23 October 2006 at 03:16, Wilson Bilkovich wrote:

As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France.


> The problem is that proper upcasing and downcasing of characters is
> locale-dependent, not just encoding or language-dependent.
>
> As examples, he mentioned that the uppercase version of accented
> characters varies from area to area in France.

No, not depending on jurisdiction in France. In French French, one
would capitalize être as Etre. In Canadian French, one would
capitalize it as Être.

> Also, in Turkish, there are four different cases of 'i', not just two, and which is
> correct depends on the jurisdiction.

Not quite. There are two different 'i' letters: one with a dot, one
without. One is capitalized with a dot and one is capitalized without
the dot.

Also, the German Eszett (ß, as in Schloß) would be capitalized as
SCHLOSS, but downcasing that would be schloss, not necessarily schloß.
(Actually, and the Germans here will correct me on this I'm sure, I
think it would always be Schloss or Schloß because the leading S would
not be lowercased in proper German. Looking at some German webpages
suggests so.)

> Determining the locale in a correct way is really, really hard. Tim
> Bray says it's basically impossible. Also, all of these rules make
> any decent upcase/downcase function ruinously slow.

Not impossible, just fraught with errors and performance issues. One
would not only have to have the locale lookup stuff, but one would
have to do statistical analysis to get better than mostly wrong with
anything but English. :wink:

-austin
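
The Eszett round trip can be checked directly. This is a sketch assuming Ruby 2.4 or later, whose upcase applies Unicode full case mapping; older versions left non-ASCII characters alone.

```ruby
# Upcasing ß expands it to "SS"; downcasing cannot tell that the
# "SS" came from ß, so the round trip loses information.
word  = "Schlo\u00DF"                 # Schloß
upper = word.upcase
puts upper                            # SCHLOSS
puts upper.downcase                   # schloss
puts upper.downcase == word.downcase  # false
```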

···

On 10/22/06, Wilson Bilkovich <wilsonb@gmail.com> wrote:
--
Austin Ziegler * halostatue@gmail.com * http://www.halostatue.ca/
               * austin@halostatue.ca * You are in a maze of twisty little passages, all alike. // halo • statue
               * austin@zieglers.ca

one caveat that tim did not mention, and which is quite applicable to many
small sites, is that you simply don't always have to care. for instance, if
your site is in english only you don't have to care. now, i'm not saying that
is a good idea - but a whole ton of successful business models work that way:
many successful newspapers, for example, publish in english only. the trick
is knowing if that's what you want up front. if that's unacceptable then it
does seem like you're screwed.

-a

···

On Wed, 25 Oct 2006, James Edward Gray II wrote:

Yes, this is basically it.

Tim Bray feels that case changing is more or less impossible in the
practical sense. When you get around to downcasing that string a user
entered into your web form a month back, are you going to know if that
string was encoded in a Turkish locale (critical info if it contains an "i")?

Even if it were possible, Tim suggests that it's a performance killer. See
Java, which tries to address as many rules as it possibly can, for proof.

James Edward Gray II


F. Senault wrote:

···

On 23 October 2006 at 03:16, Wilson Bilkovich wrote:

As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France.

This is way off topic, but I'd like to know where he heard that. It's
the first time for me, and I'm a native french speaker...

That's very interesting. So Tim is mistaken?

Hal

"no".capitalize, Tim is right, but ruby is a "logical" language for
me. Trying to accommodate 6800 languages with various character
types is not logical. Unfortunately, not manufacturing bikinis
because some people can't wear them doesn't seem like the optimal
solution in my opinion. If someone wants to upcase Latin, write a ruby
library and share it.

···

On 10/24/06, Hal Fulton <hal9000@hypermetrics.com> wrote:

F. Senault wrote:
> On 23 October 2006 at 03:16, Wilson Bilkovich wrote:
>
>>As examples, he mentioned that the uppercase version of accented
>>characters varies from area to area in France.
>
> This is way off topic, but I'd like to know where he heard that. It's
> the first time for me, and I'm a native french speaker...

That's very interesting. So Tim is mistaken?

Hal

I've been told that common usage differs in Québec. -Tim

···

On Oct 24, 2006, at 4:57 PM, Hal Fulton wrote:

F. Senault wrote:

On 23 October 2006 at 03:16, Wilson Bilkovich wrote:

As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France.

This is way off topic, but I'd like to know where he heard that. It's
the first time for me, and I'm a native french speaker...

That's very interesting. So Tim is mistaken?

x1 wrote:

"no".capitalize, Tim is right, but ruby is a "logical" language for
me. Trying to accommodate 6800 languages with various character
types is not logical. Unfortunately, not manufacturing bikinis
because some people can't wear them doesn't seem like the optimal
solution in my opinion. If someone wants to upcase Latin, write a ruby
library and share it.

I don't think that addresses what I was asking about, i.e., whether
French accents on capital letters differ across France.

Hal

It's entirely possible I'm mis-remembering that part of Tim's talk.
Anyone else remember exactly what he said? I know he had a slide that
had an accented 'e' character on it.
