how do I match non-english alphabetical characters? Such as the german
double-s ? (ß)
db
–
A.D. 1844: Samuel Morse invents Morse code. Cryptography export
restrictions prevent the telegraph’s use outside the U.S. and Canada.
Hi,
In message “non-english characters” on 03/12/17, Daniel Bretoi lists@debonair.net writes:
how do I match non-english alphabetical characters? Such as the german
double-s ? (ß)
Which encoding do you wish to use?
matz.
I’m not sure, how can I find out what the germans use? and once I know
that part, how do I use it?
db
Hi,
In message “Re: non-english characters” on 03/12/17, Daniel Bretoi lists@debonair.net writes:
I’m not sure, how can I find out what the germans use? and once I know
that part, how do I use it?
Ask somebody around you to find out. Then, if you’re going to use
Unicode (UTF-8), write your script in UTF-8 and invoke Ruby with the
-Ku option. If you use ISO-8859-* or any other single-byte encoding,
you don’t have to do anything special.
matz.
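A minimal sketch of the UTF-8 route described above. The -Ku switch belongs to the Ruby 1.8 series discussed in this thread; the sketch instead assumes Ruby 1.9 or later, where the source encoding is declared with a magic comment, and a file saved as UTF-8. The names text, german_word and words are only illustrative.

```ruby
# encoding: utf-8
# Assumption: this file is saved as UTF-8 and run on Ruby 1.9 or later.

text = "Die Straße ist groß"

# A literal ß in the pattern matches like any other character.
has_eszett = !(text =~ /ß/).nil?

# An explicit class listing the German letters beyond a-z.
german_word = /[A-Za-zÄÖÜäöüß]+/
words = text.scan(german_word)

puts has_eszett          # true
puts words.join(" | ")   # Die | Straße | ist | groß
```

Listing the letters explicitly keeps the pattern independent of how \w is defined in a given Ruby version.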
Hi!
For German you can use an awful lot of different encodings. Take a
look at the charsets listed at http://dwd.da.ru/charsets/index.html
Most likely ISO 8859-1, ISO 8859-15, or UTF-8 are used, but ISO 8859-2
is also in use. The ISO charsets have umlauts and ß in identical
positions, so the question reduces to UTF-8 vs. ISO-8859. (The Windows
codepages one would consider are ISO 8859 charsets with additional
characters in the 128…159 region, which is unused by the ISO 8859
charsets.)
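The claim about identical positions can be checked with a short script. This is only a sketch: it assumes Ruby 1.9 or later (which ships transcoders for these charsets), and Windows-1252 is picked here as one example of the Windows codepages mentioned above. The names eszett, charsets and positions are illustrative.

```ruby
# Assumption: Ruby 1.9 or later, with its bundled transcoders.
eszett = "\u00DF" # ß as a UTF-8 string

charsets = %w[ISO-8859-1 ISO-8859-2 ISO-8859-15 Windows-1252 UTF-8]

# Map each charset name to the bytes it uses for ß.
positions = {}
charsets.each { |name| positions[name] = eszett.encode(name).bytes }

positions.each do |name, bytes|
  hex = bytes.map { |b| format("%02X", b) }.join(" ")
  puts "#{name.ljust(12)} #{hex}"
end
```

The single-byte charsets should all report DF, while UTF-8 reports the two bytes C3 9F, which is why the choice comes down to UTF-8 versus the ISO 8859 family.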
Josef ‘Jupp’ SCHUGT
–
http://oss.erdfunkstelle.de/ruby/ - German comp.lang.ruby-FAQ
http://rubyforge.org/users/jupp/ - Ruby projects at Rubyforge
...................................
Windows are best when they are “unseen” – Chet Noll 27 Oct 2000
hmm.
regexp works fine for me with unicode. either with “ruby -Ku” on
startup or with the /u as regexp-option.
but with ISO-8859-+ (1 or 15 in my case) i don’t get \w to match
accented characters.
no big deal, i’m just curious what i’m doing wrong here. i’m using
ruby-1.8.1 from debian testing.
Hi,
In message “Re: non-english characters” on 03/12/17, messju mohr messju@lammfellpuschen.de writes:
but with ISO-8859-+ (1 or 15 in my case) i don’t get \w to match
accented characters.
That’s a restriction: the character class \w is defined as
[a-zA-Z0-9_]. This restriction will be removed in Ruby 1.9 by using
an ISO-8859-* specific encoding.
matz.
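For readers on later Ruby versions, here is a hedged sketch of one way to match accented letters in ISO-8859-1 data without relying on \w. It assumes Ruby 1.9 or later, where, as far as I know, \w still covers only ASCII word characters by default while POSIX bracket classes and Unicode properties cover letters such as ß. It does not apply to the 1.8 series discussed here, and the variable names are illustrative.

```ruby
# Assumption: Ruby 1.9 or later; none of this applies to Ruby 1.8.

# "Straße" as ISO-8859-1 bytes, where ß is the single byte 0xDF.
latin1 = "Stra\xDFe".dup.force_encoding("ISO-8859-1")

# Transcode to UTF-8 so Unicode-aware classes can be used on it.
utf8 = latin1.encode("UTF-8")

ascii_only = utf8.scan(/\w+/)          # \w: ASCII word characters only
posix      = utf8.scan(/[[:alpha:]]+/) # POSIX bracket class
property   = utf8.scan(/\p{L}+/)       # Unicode "letter" property

puts ascii_only.join(" | ")  # Stra | e
puts posix.join(" | ")       # Straße
puts property.join(" | ")    # Straße
```

Transcoding first is a design choice of this sketch: it keeps all matching in one encoding instead of depending on per-charset behaviour of the regexp engine.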
“messju mohr” messju@lammfellpuschen.de wrote in message
news:20031217082712.GE17320@pharao.lammfellpuschen.de…
but with ISO-8859-+ (1 or 15 in my case) i don’t get \w to match
accented characters.
I guess \w is defined in terms of ASCII - and there you don’t have “ß”, “é”
and similar chars.
Regards
robert
yes, it looks like i got confused by the PCRE library which treats \w
according to the current locale. too-many-languages error.
depends on your definition of ‘treats’ and ‘locale’
-bash-2.05b$ cat /etc/redhat-release
Red Hat Enterprise Linux WS release 3 (Taroon)
-bash-2.05b$ perl -v | head -2 # why so much output!
This is perl, v5.8.0 built for i386-linux-thread-multi
-bash-2.05b$ ruby -v
ruby 1.6.8 (2002-12-24) [i386-linux-gnu]
BROKEN “TREATMENT” OF LOCALE
-bash-2.05b$ export LANG=en_US.UTF-8
-bash-2.05b$ echo abc | perl -ne 'print if /[^\s]+/'
-bash-2.05b$ echo abc | ruby -ne 'print if /[^\s]+/'
abc
THIS IS OK
-bash-2.05b$ export LANG=en_US
-bash-2.05b$ echo abc | perl -ne 'print if /[^\s]+/'
abc
-bash-2.05b$ echo abc | ruby -ne 'print if /[^\s]+/'
abc
definitely need to examine output carefully where regexes and locale are in
effect - probably better off using ruby since matz presumably has more
experience with multibyte chars than ol' larry!
-a
–
ATTN: please update your address books with address below!
===============================================================================
EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
ADDRESS :: E/GC2 325 Broadway, Boulder, CO 80305-3328
STP :: http://www.ngdc.noaa.gov/stp/
NGDC :: http://www.ngdc.noaa.gov/
NESDIS :: http://www.nesdis.noaa.gov/
NOAA :: http://www.noaa.gov/
US DOC :: http://www.commerce.gov/
The difference between art and science is that science is what we
understand well enough to explain to a computer.
Art is everything else.
– Donald Knuth, “Discover”
/bin/sh -c 'for l in ruby perl;do $l -e "print \"\x3a\x2d\x29\x0a\"";done'
===============================================================================
yes, it looks like i got confused by the PCRE library which treats \w
according to the current locale. too-many-languages error.
depends on your definition of ‘treats’ and ‘locale’
-bash-2.05b$ cat /etc/redhat-release
Red Hat Enterprise Linux WS release 3 (Taroon)
-bash-2.05b$ perl -v | head -2 # why so much output!
This is perl, v5.8.0 built for i386-linux-thread-multi
-bash-2.05b$ ruby -v
ruby 1.6.8 (2002-12-24) [i386-linux-gnu]
BROKEN “TREATMENT” OF LOCALE
-bash-2.05b$ export LANG=en_US.UTF-8
-bash-2.05b$ echo abc | perl -ne 'print if /[^\s]+/'
-bash-2.05b$ echo abc | ruby -ne 'print if /[^\s]+/'
abc
THIS IS OK
-bash-2.05b$ export LANG=en_US
-bash-2.05b$ echo abc | perl -ne 'print if /[^\s]+/'
abc
-bash-2.05b$ echo abc | ruby -ne 'print if /[^\s]+/'
abc
definitely need to examine output carefully where regexes and locale are in
effect - probably better off using ruby since matz presumably has more
experience with multibyte chars than ol' larry!
i was talking about ISO-8859-* character sets and already said that
UTF-8 works for me.
your example works fine for me with
“This is perl, v5.8.2 built for i386-linux-thread-multi” (from
debian unstable)
i meant the PCRE library from
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/ . it’s meant
to be perl compatible but it is not the actual implementation in
the perl-interpreter, AFAIK
no need to convince me to use ruby over perl
greetings
messju