Replacing diacritics with simple characters

Do you know of a way to replace diacritics with simple characters (e.g. é
-o-> e)?

The same with ligatures (e.g. Æ -o-> AE)?

Using tables?

···

--
Une Bévue

(Hello again... :slight_smile: )

Iconv can do that for you:

>> require "iconv"
=> true
>> i = Iconv.new("ASCII//TRANSLIT", "ISO-8859-15")
=> #<Iconv:0x84d4448>
>> i.iconv("aéouï Æ")
=> "a'eou\"i AE"
>> i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')
=> "aeoui AE"

Fred

···

On 25 September at 18:25, Une Bévue wrote:
--
I've found an axe can do a lot for a paper-mangling printer. Especially
if you shout for one at the top of your voice, and then a cow orker
brings you said instrument. Suddenly, no more paper jams.
                                             (Kai Henningsen in the SDM)

--
I've found an axe can do a lot for a paper-mangling printer. Especially
if you shout for one at the top of your voice, and then a cow orker

--------------------------------------------------------------------------------------^
???

brings you said instrument. Suddenly, no more paper jams.
                                             (Kai Henningsen in the SDM)

:smiley:

Fine, thanks a lot Fred, à c't'heure :wink:

Have a good wine cellar :wink:

It also works with UTF-8.

···

F. Senault <fred@lacave.net> wrote:

IConv can do that for you :

>> require "iconv"
=> true
>> i = Iconv.new("ASCII//TRANSLIT", "ISO-8859-15")
=> #<Iconv:0x84d4448>
>> i.iconv("aéouï Æ")
=> "a'eou"i AE"
>> i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')
=> "aeoui AE"

--
Une Bévue

IConv can do that for you :

An alternative approach is something like Sean M. Burke's Text::Unidecode:

http://interglacial.com/~sburke/tpj/as_html/tpj22.html

Here is an example of an implementation of Unidecode in Lua [1]:

local Unidecode = require( 'Unidecode' )

print( Unidecode( 'Москва́' ) )
print( Unidecode( '北京' ) )
print( Unidecode( 'Ἀθηνᾶ' ) )
print( Unidecode( '서울' ) )
print( Unidecode( '東京' ) )
print( Unidecode( '京都市' ) )
print( Unidecode( 'नेपाल' ) )
print( Unidecode( 'תֵּל־אָבִיב-יָפוֹ' ) )
print( Unidecode( 'تَلْ أَبِيبْ يَافَا' ) )
print( Unidecode( 'تهران' ) )
print( Unidecode( 'Géometrie Différentielle' ) )

> Moskva
> beijing
> Athena
> seoul
> dongjing
> jingdushi
> nepaal
> te'labiyb-yapvo
> tal 'abiyb yaafaa
> thran
> Geometrie Differentielle

Cheers,

PA.

[1] http://dev.alt.textdrive.com/browser/HTTP/Unidecode.lua
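
The table-driven idea behind Unidecode, and the "using tables?" part of the original question, can be sketched in Ruby with a plain Hash. This is only an illustration, not how Text::Unidecode itself is implemented: TRANSLIT_TABLE and translit_with_table are made-up names, the table covers a handful of characters, and a modern Ruby with UTF-8 strings is assumed.

```ruby
# Hypothetical lookup table (deliberately tiny); extend it for your own data.
TRANSLIT_TABLE = {
  "\u00E9" => "e",  # é
  "\u00E8" => "e",  # è
  "\u00EA" => "e",  # ê
  "\u00CA" => "E",  # Ê
  "\u00EF" => "i",  # ï
  "\u00C6" => "AE", # Æ
  "\u00E6" => "ae", # æ
  "\u00DF" => "ss"  # ß
}.freeze

# Replace each non-ASCII character with its table entry.
# Characters missing from the table become "?".
def translit_with_table(str, table = TRANSLIT_TABLE)
  str.gsub(/[^\x00-\x7F]/) { |ch| table.fetch(ch, "?") }
end

sample = "a\u00E9ou\u00EF \u00C6" # "aéouï Æ" in precomposed form
puts translit_with_table(sample)  # prints: aeoui AE
```

The obvious drawback is that every character has to be listed by hand, which is why Unidecode ships large prebuilt tables.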

···

On Sep 25, 2007, at 18:55, F. Senault wrote:

F. Senault wrote:

Iconv can do that for you:

>> require "iconv"
=> true
>> i = Iconv.new("ASCII//TRANSLIT", "ISO-8859-15")
=> #<Iconv:0x84d4448>
>> i.iconv("aéouï Æ")
=> "a'eou\"i AE"
>> i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')
=> "aeoui AE"

That doesn't work on all platforms. For me:

>> require "iconv"
=> true
>> i = Iconv.new("ASCII//TRANSLIT", "UTF-8")
=> #<Iconv:0xb7cf28e0>
>> i.iconv("aéouï Æ")
=> "a?ou? AE"

:frowning:

It's intentional. "Cow orker" was probably a typo in the olden times, but
has entered the mainstream since then. Just ask Google: "Results 1 -
10 of about 37,200 for "cow orker". (0.19 seconds)" :slight_smile:

Fred

···

On 25 September at 20:12, Michal Suchanek wrote:

--
I've found an axe can do a lot for a paper-mangling printer. Especially
if you shout for one at the top of your voice, and then a cow orker

--------------------------------------------------------------------------------------^
???

--
I feel it move across my skin. I'm reaching up and reaching out, I'm
reaching for the random or what ever will bewilder me. And following
our will and wind we may just go where no one's been. We'll ride the
spiral to the end and may just go where no one's been. (Tool, Lateralus)

Are you sure about the encoding of "aéouï Æ"?

Because I did it with UTF-8, and it works:

-- the script ----------------------------------------------------------
#! /usr/bin/env ruby

require "iconv"

i = Iconv.new("ASCII//TRANSLIT", "UTF-8")

p i.iconv("aéouï Æ")
# => "a'eou\"i AE"

p i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')
# => "aeoui AE"

p i.iconv("Être ou ne pas être, c'est la question. aéouï Æ, wie heiß du ?").
  gsub(/[^a-zA-Z0-9' ]/, '').gsub(/[' ]/, '_').gsub(/(.*)_$/, '\1')
# => "Etre_ou_ne_pas_etre_c_est_la_question_a_eoui_AE_wie_heiss_du"

p i.iconv("Être ou ne pas être, c'est la question. aéouï Æ, wie heiß du?").
  gsub(/[^a-zA-Z0-9' ]/, '').gsub(/[' ]/, '_').gsub(/(.*)_$/, '\1')
# => "Etre_ou_ne_pas_etre_c_est_la_question_a_eoui_AE_wie_heiss_du"
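
The clean-up chain after the iconv call does not depend on iconv itself, so it can be pulled out and tried on any ASCII string. A minimal sketch follows, with ascii_to_name as a hypothetical helper name. It differs slightly from the script: it trims every trailing underscore rather than just one.

```ruby
# Hypothetical helper: turns an already-transliterated ASCII string into an
# underscore-separated name, following the same steps as the gsub chain.
def ascii_to_name(ascii)
  ascii.gsub(/[^a-zA-Z0-9' ]/, '') # drop punctuation and leftover accent marks
       .gsub(/[' ]/, '_')          # apostrophes and spaces become underscores
       .sub(/_+\z/, '')            # trim trailing underscores
end

puts ascii_to_name("a'eou\"i AE, wie heiss du ?")
# prints: a_eoui_AE_wie_heiss_du
```

Note that the apostrophe iconv emits for "é" survives the first step, which is why the result contains "a_eoui" rather than "aeoui".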

···

Daniel DeLorme <dan-ml@dan42.com> wrote:

That doesn't work on all platforms. For me:

>> require "iconv"
=> true
>> i = Iconv.new("ASCII//TRANSLIT", "UTF-8")
=> #<Iconv:0xb7cf28e0>
>> i.iconv("aéouï Æ")
=> "a?ou? AE"

:frowning:

------------------------------------------------------------------------
--
Une Bévue

How do I get off this mailing list? THANKS!!!

Une Bévue wrote:

Are you sure about the encoding of "aéouï Æ"?

Yep.

>> str = "aéouï Æ"
=> "a\303\251ou\303\257 \303\206" # (that's UTF-8 all right)
>> i.iconv(str)
=> "a?ou? AE"

But like I said, translit doesn't work the same on all platforms (I'm on Ubuntu, btw).

Daniel
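
One way to sidestep the platform differences is to avoid iconv altogether. The sketch below assumes Ruby 2.2 or later (which postdates this thread) and UTF-8 input. It decomposes accented letters with String#unicode_normalize, drops the combining marks, and falls back to a small table for characters such as Æ that have no decomposition. The names strip_diacritics and LIGATURE_TABLE are hypothetical, and the table is deliberately incomplete.

```ruby
# Characters without a canonical decomposition need an explicit mapping.
# Illustrative, incomplete table.
LIGATURE_TABLE = {
  "\u00C6" => "AE", # Æ
  "\u00E6" => "ae", # æ
  "\u0152" => "OE", # Œ
  "\u0153" => "oe", # œ
  "\u00DF" => "ss"  # ß
}.freeze

def strip_diacritics(str)
  str.unicode_normalize(:nfd) # "é" becomes "e" followed by U+0301
     .gsub(/\p{Mn}/, '')      # remove the combining marks
     .gsub(/[^\x00-\x7F]/) { |ch| LIGATURE_TABLE.fetch(ch, "?") }
end

sample = "a\u00E9ou\u00EF \u00C6" # "aéouï Æ"
puts strip_diacritics(sample)     # prints: aeoui AE
```

Because the result no longer depends on the system's iconv library, it should be the same on Ubuntu and Mac OS X. Anything outside the table still comes out as "?".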

···

Daniel DeLorme <dan-ml@dan42.com> wrote:

I'm running Mac OS X 10.4.10...

--
Une Bévue