"A" and "an" articles in front of words

Likely you'll have to use a dictionary of words that fit one or the
other requirement. Then you would just check whether that dictionary
contains the word being evaluated and respond accordingly.

···

On 11/08/2011 03:23 AM, Gonçalo C. Justino wrote:

Indeed. My understanding is that the usage of a/an depends on the
pronunciation of the next word. In the case of unicorn, it sounds like
it begins with a "you", hence "a unicorn"

on the flip side, consider the phrase "I'll be there in an hour",
silent 'h' so we use "an hour"

true. vowels following a mute h should be included
together with the (single) vowels.

btw, out of curiosity, how can one distinguish in code a short vowel (like
the u in umbrella) from a long you-nicorn-like vowel ?

--
Darryl L. Pierce <mcpierce@gmail.com>
http://mcpierce.multiply.com/
"What do you care what people think, Mr. Feynman?"

You're right about unicorn and eulogy. I'm interested in checking out
the correlation between second-and-third letters and vowels that become
consonants in pronunciation now, to see how strong a correlation that is.
I'm pretty sure there are exceptions to these perceived rules, though, in
any case.

It seems likely that, most often, you'd get the following results, where
V means "vowel" and C means "consonant". Lower case letters are
literals. In each case, two adjacent vowels are assumed to be
*different* vowels.

    uCC: treat as vowel
    uCV: treat as consonant
    VVC: treat as consonant
    yC: treat as vowel
    yV: treat as consonant

These are only my immediate impressions, so far. Assuming for argument's
sake that they're correct for the general case, though, there would
almost certainly be exceptions for every one of these correlations, and
the question that arises then is whether the exceptions are rare enough
to warrant using these correlations as rules with a set of exceptions
used to override them, or numerous enough for it to make more sense to
just use an extensive dictionary to handle such matters.
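
For anyone who wants to try these correlations against a word list, here is a minimal Ruby sketch of the five patterns above. The names `VOWEL`, `CONSONANT`, `vowel_sound_guess?` and `article_guess` are made up for illustration, and treating "y" as a consonant in second or third position is an assumption. It encodes the impressions as stated and nothing more, so it is not a complete a/an solution.

```ruby
# Sketch only: encodes the five spelling patterns above, nothing more.
VOWEL     = "[aeiou]"
CONSONANT = "[b-df-hj-np-tv-z]" # y counts as a consonant here (an assumption)

# True when the spelling pattern suggests an initial vowel *sound*.
def vowel_sound_guess?(word)
  case word.downcase
  when /\Au#{CONSONANT}#{VOWEL}/         then false # uCV: "unicorn"
  when /\Au#{CONSONANT}#{CONSONANT}/     then true  # uCC: "umbrella"
  when /\Ay#{VOWEL}/                     then false # yV:  "yard"
  when /\Ay#{CONSONANT}/                 then true  # yC:  "yttrium"
  when /\A(#{VOWEL})(?!\1)#{VOWEL}#{CONSONANT}/
    false                                           # VVC, two different vowels: "eulogy"
  else
    word.match?(/\A[aeiou]/i)                       # no pattern applies: trust the first letter
  end
end

# Picks the article from the guess above.
def article_guess(word)
  vowel_sound_guess?(word) ? "an" : "a"
end

%w[umbrella unicorn eulogy yard apple banana].each do |w|
  puts "#{article_guess(w)} #{w}"
end
```

The patterns say nothing about a silent "h", so "hour" still comes out as "a hour", and VVC misfires on words like "oil". Those are the kinds of exceptions an override list would have to carry.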

If I get really bored, I may put together a really extensive dictionary
to cover this, then use it to determine the strength of such
correlations some day (or week or month), but not today.

···

On Tue, Nov 08, 2011 at 05:23:31PM +0900, Gonçalo C. Justino wrote:

does the different pronunciation come from the subsequent letters? i'm
thinking uMBrella, uNCle, uRGent, uNDer, uGLy, uPPer, uRGe but uNIcorn,
eULogy (or is this "an eulogy"? now i'm confused)... i'm wondering if two
consonants make it "an" and at least one vowel makes it "a". Maybe I'm just
rambling, this sounds so un-rubyesque :S

--
Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]

Google hasn't helped: does anyone have or know of a "complete" list of
English words?

···

On 8 November 2011 17:20, Chad Perrin <code@apotheon.net> wrote:

If I get really bored, I may put together a really extensive dictionary
to cover this, then use it to determine the strength of such
correlations some day (or week or month), but not today.

--
Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]

A complete dictionary shouldn't be necessary. Just exceptions. Look at how Rails handles pluralization. You can use the algorithm:

- if word starts with a consonant, use "a"
- if word matches entry in exception list, use designated article
- else use "an" if it's a for-certain vowel ['a', 'e', 'i', 'o', 'u']

This way, you only do a lookup for words starting with possible vowels ['a', 'e', 'i', 'o', 'u', 'y', 'h']

You might even extend the consonant searching algorithm to use some heuristics as suggested in the email below:

'a' if word =~ /^[^aeiouyh]/ || word =~ /^u[^aeiou][aeiou]/ || word =~ /^y[aeiou]/

The problem is that the choice between 'a' and 'an' has to do with the way the word *sounds* in a given English (i.e., American, British). It is unlikely you will capture all the cases with a dictionary, hence the suggestion that the algorithm use a set of commonly encountered exceptions, accepting the fact that it will be incomplete and sometimes a bit embarrassing -- but no more so than the pronunciation of words by my nav. system's text-to-speech :)

···

On Nov 9, 2011, at 5:10 AM, Gonçalo C. Justino wrote:

Google hasn't helped: does anyone have or know of a "complete" list of
English words?


What you want is the CMU Pronouncing Dictionary
[http://www.speech.cs.cmu.edu/cgi-bin/cmudict]. You needn't include
the whole dictionary and read it in real time - just do a
preprocessing run to find words whose spelling starts with a vowel but
whose pronunciation starts with a consonant, and vice versa.
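
A rough Ruby sketch of that preprocessing run follows. It assumes the usual cmudict line layout (a word followed by space-separated ARPAbet phonemes, with `;;;` comment lines). The method name `article_exceptions` and the sample lines are only illustrative, so check them against the file you actually download.

```ruby
# ARPAbet vowel phonemes (stress digits stripped).
ARPABET_VOWELS = %w[AA AE AH AO AW AY EH ER EY IH IY OW OY UH UW].freeze

# Returns { word => article } only for words whose first letter and first
# sound disagree, e.g. "hour" => "an" and "unicorn" => "a".
def article_exceptions(lines)
  exceptions = {}
  lines.each do |line|
    next if line.start_with?(";;;")            # comment line
    word, first_phoneme = line.split
    next if word.nil? || first_phoneme.nil?    # blank or malformed line
    word = word.downcase
    # Skips punctuation entries and alternate pronunciations such as "WORD(1)".
    next unless word.match?(/\A[a-z][a-z'.-]*\z/)

    spelled_vowel = word.match?(/\A[aeiou]/)
    spoken_vowel  = ARPABET_VOWELS.include?(first_phoneme.delete("0-9"))
    next if spelled_vowel == spoken_vowel      # the plain first-letter rule already works

    exceptions[word] = spoken_vowel ? "an" : "a"
  end
  exceptions
end

# Illustrative sample in the assumed format; with the real file you could
# pass File.foreach(path) instead, where path is wherever you saved it.
sample = [
  ";;; comment line",
  "HOUR  AW1 ER0",
  "UNICORN  Y UW1 N AH0 K AO2 R N",
  "APPLE  AE1 P AH0 L",
  "ONE  W AH1 N"
]
p article_exceptions(sample)
```

The resulting hash is small enough to ship with the program, and everything that is not in it can fall back to the plain first-letter rule.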

martin


These two statements are contradictory.

···

On Thu, Nov 10, 2011 at 02:36:13AM +0900, steve ross wrote:

- else use "an" if it's a for-certain vowel ['a', 'e', 'i', 'o', 'u']

This way, you only do a lookup for words starting with possible vowels ['a', 'e', 'i', 'o', 'u', 'y', 'h']

--
Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]

-----Original message-----

···

From: Martin DeMello [mailto:martindemello@gmail.com]
Sent: Monday, 14 November 2011 02:44
To: ruby-talk ML
Subject: Re: "A" and "an" articles in front of words

What you want is the CMU Pronouncing Dictionary
[http://www.speech.cs.cmu.edu/cgi-bin/cmudict]. You needn't include the
whole dictionary and read it in real time - just do a preprocessing run to
find words whose spelling starts with a vowel but whose pronunciation
starts with a consonant, and vice versa.

martin


Right. I got the order backwards.

- Use "an" if it's not a "disputable" vowel %w(u y h)
- else do a lookup

Better?
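
In Ruby, the corrected order might look roughly like the sketch below. The lookup table is a tiny hypothetical stand-in for a real exception list, and the fallback for a disputable word that is missing from the table ("an" for u, "a" for y and h) is my own assumption, not something settled in this thread.

```ruby
CERTAIN_VOWELS = %w(a e i o).freeze
DISPUTABLE     = %w(u y h).freeze

# Hypothetical, deliberately tiny lookup table; a real one would be much longer.
DISPUTED_LOOKUP = {
  "unicorn"    => "a",
  "university" => "a",
  "hour"       => "an",
  "honest"     => "an",
  "yttrium"    => "an"
}.freeze

def indefinite_article(word)
  w = word.downcase
  first = w[0]
  # Not a disputable letter and a vowel: no lookup needed.
  return "an" if CERTAIN_VOWELS.include?(first)
  # Disputable letter: consult the table, then fall back to an assumed default.
  return DISPUTED_LOOKUP.fetch(w) { first == "u" ? "an" : "a" } if DISPUTABLE.include?(first)
  "a" # plain consonant
end

%w[apple unicorn umbrella hour house banana].each do |w|
  puts "#{indefinite_article(w)} #{w}"
end
```

Only words starting with u, y or h ever reach the table, which keeps the lookup cheap. Words such as "eulogy" or "one" still get "an" under this rule.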

···

On Nov 9, 2011, at 11:52 AM, Chad Perrin wrote:

On Thu, Nov 10, 2011 at 02:36:13AM +0900, steve ross wrote:

- else use "an" if it's a for-certain vowel ['a', 'e', 'i', 'o', 'u']

This way, you only do a lookup for words starting with possible vowels ['a', 'e', 'i', 'o', 'u', 'y', 'h']

These two statements are contradictory.

--
Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]

Yes -- apart from the fact that "e" at least might fall into a consonant
niche under certain circumstances.

···

On Thu, Nov 10, 2011 at 05:55:57AM +0900, steve ross wrote:

Right. I got the order backwards.

- Use "an" if it's not a "disputable" vowel %w(u y h)
- else do a lookup

Better?

--
Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]