STRING MANIPULATION: Marking syllables

Hello,
How might one express the following syllabification rules:

1. A c(c)v sequence is a syllable if followed by a cv sequence

pe.so (the dot indicates a syllable break)
pra.da

2. A c(c)vc sequence is a syllable if followed by a c

fal.do
bran.di

where:
c = consonant
v = vowel
(c) = optional consonant

Thank you for your help.
basi

On 8/6/05, basi <basi_lio@hotmail.com> wrote:
> How might one express the following syllabification rules:
> 1. A c(c)v sequence is a syllable if followed by a cv sequence
>
> [...]
>
> 2. A c(c)vc sequence is a syllable if followed by a c

There are a couple of syntax (human language syntax) libraries
mentioned on RAA (the Ruby Application Archive); I don't remember the
names offhand. For hyphenation (not *quite* the same thing, mind you),
you can always use Text::Hyphen.

With a regexp, I'd do those as:

  # Note: this counts y as both vowel and consonant. This may not
  # always result in correct syllable identification.
  VOWELS = V = %r{[aeiouy]}i
  CONSONANTS = C = %r{[b-df-hj-np-tv-z]}i
  c_opc_v = %r{#{C}#{C}?#{V}}
  c_opc_v_c = %r{#{C}#{C}?#{V}#{C}}

There are other rules, I'm sure, because these two rules could be,
at least theoretically, converted into c(c)v(c).
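
The patterns above match the shape of a syllable but not the "if
followed by" part of each rule. A minimal sketch of one way to add that
condition with lookahead, using the same character classes. The names
SYLL_V, SYLL_C, RULE_CV, RULE_CVC and first_syllable are made up for
this example, and it only finds the first syllable of a word:

```ruby
# Same character classes as above (y counts as both vowel and consonant).
SYLL_V = /[aeiouy]/i
SYLL_C = /[b-df-hj-np-tv-z]/i

# Rule 1: c(c)v is a syllable if a cv sequence follows.
RULE_CV  = /\A(#{SYLL_C}#{SYLL_C}?#{SYLL_V})(?=#{SYLL_C}#{SYLL_V})/
# Rule 2: c(c)vc is a syllable if a consonant follows.
RULE_CVC = /\A(#{SYLL_C}#{SYLL_C}?#{SYLL_V}#{SYLL_C})(?=#{SYLL_C})/

# Hypothetical helper: returns the first syllable of word, or nil if
# neither rule applies.
def first_syllable(word)
  m = RULE_CV.match(word) || RULE_CVC.match(word)
  m && m[1]
end

%w( peso prada faldo brandi ).each do |word|
  puts "#{word}: #{first_syllable(word)}"
end
```

The lookahead checks what follows without consuming it, so the rest of
the word is still available for whatever rule is applied next.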

-austin

--
Austin Ziegler * halostatue@gmail.com
               * Alternate: austin@halostatue.ca

basi wrote:

> How might one express the following syllabification rules:
>
> 1. A c(c)v sequence is a syllable if followed by a cv sequence
>
> pe.so (the dot indicates a syllable break)
> pra.da
>
> 2. A c(c)vc sequence is a syllable if followed by a c
>
> fal.do
> bran.di
>
> where:
> c = consonant
> v = vowel
> (c) = optional consonant

CC = {}
CC['v'] = '[aeiou]'
CC['c'] = '[^aeiou]'

class String
  # Convert rule like "c(c)vc.c" to a regular expression.
  def to_syllrule
    re = "^("
    self.scan( /\(?([cv.])(\)?)/ ) { |x|
      if "." == x[0]
        re << ")("
      else
        re << CC[ x[0] ]
        re << "?" if ")" == x[1]
      end
    }
    Regexp.new( re + ".*)" )
  end
end

# Make a list of the rules as regular expressions.
rules = %w( c(c)v.cv c(c)vc.c ).inject([]){|a,s| a<< s.to_syllrule }

%w( peso prada faldo brandi ).each { |word|
  rules.each { |re|
    if word =~ re
      puts $~.captures.join('.')
      break
    end
  }
}

-----------------
Output:

pe.so
pra.da
fal.do
bran.di

Note that your CC['c'] will catch 0-9 and punctuation as well.
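
One way around that, as a sketch only: spell the consonants out instead
of negating the vowels. The names EXPLICIT_CC, NEGATED_C and EXPLICIT_C
are made up for this example, and counting y as a consonant is an
assumption:

```ruby
# Sketch: an explicit consonant class, so that digits and punctuation
# are not treated as consonants. Counting y as a consonant is an
# assumption made for this example.
EXPLICIT_CC = {
  'v' => '[aeiou]',
  'c' => '[b-df-hj-np-tv-z]'
}

NEGATED_C  = /\A[^aeiou]\z/               # the negated class from the post above
EXPLICIT_C = /\A#{EXPLICIT_CC['c']}\z/    # only the letters listed

puts NEGATED_C.match?('7')   # true: the negated class accepts a digit
puts EXPLICIT_C.match?('7')  # false
puts EXPLICIT_C.match?('d')  # true
```

Both classes are case-sensitive as written, so uppercase input would
need the i option or a downcase first.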

-austin

On 8/6/05, William James <w_a_x_man@yahoo.com> wrote:

> CC = {}
> CC['v'] = '[aeiou]'
> CC['c'] = '[^aeiou]'


Austin Ziegler wrote:

> There are other rules, I'm sure, because these two rules could be,
> at least theoretically, converted into c(c)v(c).

After making the following change

rules = %w( c(c)v(c). ).inject([]){|a,s| a<< s.to_syllrule }

the output becomes

pes.o
prad.a
fal.do
bran.di
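
The collapsed rule takes the closing consonant whenever one is there,
which is why peso comes out as pes.o. A minimal sketch of one way to
keep a single pattern and still get the original breaks: take the
closing consonant only when another consonant follows. The names
SCAN_V, SCAN_C, SYLLABLE_RE and syllabify are made up for this example,
and the consonant class (with y as a consonant) is an assumption:

```ruby
SCAN_V = '[aeiou]'
SCAN_C = '[b-df-hj-np-tv-z]'   # assumption: y is a consonant here

# c(c)v, plus a closing consonant only if another consonant follows.
SYLLABLE_RE = /#{SCAN_C}{1,2}#{SCAN_V}(?:#{SCAN_C}(?=#{SCAN_C}))?/

# Hypothetical helper: joins every syllable found in word with dots.
# String#scan skips anything the pattern cannot match, so letters that
# fit neither rule (a word-final consonant, say) are silently dropped.
def syllabify(word)
  word.scan(SYLLABLE_RE).join('.')
end

%w( peso prada faldo brandi ).each { |word| puts syllabify(word) }
```

This prints pe.so, pra.da, fal.do and bran.di for the four test words.
It covers only the two rules from the original question.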

Hello,
Thank you very much for your help. This is great!
basi

Hi,
Yes, I did mean to ask for pointers to human language parsing
libraries, but forgot to in my initial email. Thank you for pointing
me to RAA. I will visit it right away.
basi