> it seems it’s a little bit trickier, because accented characters are
> taken as \b
Really? That’s arguably a bug. What character encoding are you using?
Accented letters should be in \w, not \W, and therefore the
position between one and an adjacent letter should not match \b.
But Ruby regexes may be ASCII-only, and even if not, they’re probably
Latin-1-only. So, for instance, they wouldn’t work on UTF-8 strings.
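To see how this looks in a current Ruby, the snippet below checks a plain letter and two accented letters against \w and against the Unicode letter property \p{L}. It assumes Ruby 1.9 or later with a UTF-8 source file, which postdates this thread. In those versions \w is, by default, still ASCII-only, while \p{L} covers accented letters.

```ruby
# Assumes Ruby 1.9+ with a UTF-8 source file (the default since Ruby 2.0),
# so the string literals below are UTF-8.
ASCII_WORD = /\w/      # by default \w means [a-zA-Z0-9_], ASCII only
ANY_LETTER = /\p{L}/   # Unicode "Letter" property, accented letters included

%w[e é ü].each do |ch|
  word   = !ch.match(ASCII_WORD).nil?
  letter = !ch.match(ANY_LETTER).nil?
  puts "#{ch}: \\w #{word}, \\p{L} #{letter}"
end
```

On such a Ruby it should report \w true and \p{L} true for "e", and \w false with \p{L} true for "é" and "ü".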
> Vosne-romanée
> becomes:
> Vosne-RomanéE
> then instead of \b I would have to exclude a list of chars:
> [à|ä|â|é|è|ê|î|ö|ô|ü|ù]
First, you don’t need the pipes (|'s) there. Inside […] a pipe is just
an ordinary character, so that class would also match a literal “|”.
Pipes are for alternation without the […]; basically, [abc] is short for
(a|b|c). The pipe form is most useful when the alternatives are
not all single characters, for instance (alfa|bravo|charlie).
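A small check of the pipe point, using the class from the quoted message (the variable names are only for illustration):

```ruby
# Inside [...] the pipe is a literal character, not alternation.
with_pipes    = /[à|ä|â|é|è|ê|î|ö|ô|ü|ù]/
without_pipes = /[àäâéèêîöôüù]/
# For single characters, a class and an alternation match the same things.
alternation   = /(?:à|ä|â|é|è|ê|î|ö|ô|ü|ù)/

p("a|b" =~ with_pipes)     # 1: the class also matches the "|"
p("a|b" =~ without_pipes)  # nil
p("Vosne-romanée".scan(without_pipes) == "Vosne-romanée".scan(alternation))  # true
```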
I’m not sure whether the exclude-list or the include-list would
be shorter. You could do (^|[- ']) to match “beginning of string or
dash or space or apostrophe”, but then that character would be included
in the matched text. That means the word would come out as, for instance,
“ d” or “-d” or “'d” instead of “d”, and therefore wouldn’t be found in the
blacklist and wouldn’t capitalize properly (since String#capitalize operates
on the first character, which would be the space, dash, or apostrophe).
The block has to compensate for that. Something like this:
string.gsub!(/(^|[- '])([a-z]+)/) { $1 + $2.capitalize }
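A runnable sketch of that gsub, assuming a hypothetical BLACKLIST of particles to leave lowercase (the actual list from earlier in the thread isn't shown here). It uses non-destructive gsub so the helper always returns a string; gsub! returns nil when nothing changes.

```ruby
# Hypothetical particles to keep lowercase (not the thread's real list).
BLACKLIST = %w[de du la le des]

# $1 is the separator ("" at the start of the string) and $2 is the word,
# so the blacklist lookup and capitalize only ever see the word itself.
def capitalize_name(name)
  name.gsub(/(^|[- '])([a-z]+)/) do
    sep, word = $1, $2
    sep + (BLACKLIST.include?(word) ? word : word.capitalize)
  end
end

puts capitalize_name("chateau de la tour")  # Chateau de la Tour
puts capitalize_name("vosne-romanée")       # Vosne-Romanée
puts capitalize_name("saint-émilion")       # Saint-émilion
```

The last line shows the limitation: [a-z] cannot start a match at “é”, so “émilion” is left untouched.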
Except that [a-z] won’t match accented characters, so it’s more like this:
string.gsub!(/(^|[- '])([a-záàâçéèêíìîóòôúùû]+)/) { $1 + $2.capitalize }
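The accented version wrapped in a helper (the names FRENCH_WORD and capitalize_french are made up here). This assumes Ruby 2.4 or later, where String#capitalize does Unicode case mapping; on older versions it only changed ASCII letters, so “émilion” would have stayed lowercase even after being matched.

```ruby
# The accented include-list from the text, as a constant.
FRENCH_WORD = /(^|[- '])([a-záàâçéèêíìîóòôúùû]+)/

# Keep the separator ($1) and capitalize only the word ($2).
def capitalize_french(name)
  name.gsub(FRENCH_WORD) { $1 + $2.capitalize }
end

puts capitalize_french("vosne-romanée")  # Vosne-Romanée
puts capitalize_french("saint-émilion")  # Saint-Émilion (Ruby 2.4+)
```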
And if the names aren’t limited to French, then even more special characters
creep in . . .
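One way to avoid growing the list by hand, swapped in here rather than taken from the thread, is to match runs of Unicode letters (\p{L}) plus combining marks (\p{M}), assuming Ruby 2.4 or later. Because gsub scans left to right and each run is matched greedily, every match begins at the first letter of a word, so no separator has to be captured and put back. The keep_lower parameter is a hypothetical stand-in for the blacklist.

```ruby
# Runs of letters plus combining marks (e.g. U+0301 COMBINING ACUTE ACCENT).
WORD_RUN = /[\p{L}\p{M}]+/

# Capitalize every word except those listed in keep_lower (hypothetical).
def capitalize_any(name, keep_lower = [])
  name.gsub(WORD_RUN) { |word| keep_lower.include?(word) ? word : word.capitalize }
end

puts capitalize_any("vosne-romanée")           # Vosne-Romanée
puts capitalize_any("château d'yquem", %w[d])  # Château d'Yquem
puts capitalize_any("łódź")                    # Łódź
```

Including \p{M} matters for decomposed text: with \p{L}+ alone, "romane\u0301e" (an "e" followed by a combining acute accent) splits at the accent and the final "e" gets capitalized, which reproduces the “RomanéE” symptom from the original question.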
-Mark
On Tue, Sep 23, 2003 at 07:23:52PM +0200, Yvon Thoraval wrote: