I know this is coming after the summary (thanks for the nice writeup),
but I wanted to share.
I had an idea similar to Leslie's, but I wanted to actually write out
an audio file, instead of sending the narration to the speakers. The
solution has 2 parts.
class WaveRead extracts all the information from a wave file. I put
it together in under 2 hours last night. It was so much easier to
write than the one I did in C a few years ago, and I'm really pleased
with the result. It's clean and extensible. I already have an idea for
making it trivial to add the other chunk definitions.
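
For anyone who hasn't looked inside a wave file: a RIFF file is a 12-byte header followed by chunks, each a 4-character tag, a 32-bit little-endian size, and a body padded to an even length. Here is a minimal sketch of walking that layout. The list_chunks helper and the 'abcd' chunk are made up for illustration and are not part of wavespeaker.rb.

```ruby
require 'stringio'

# Hypothetical helper: walk a RIFF stream and return the form type
# plus a list of [tag, size] pairs, skipping over each chunk body.
def list_chunks(io)
  raise "Not a RIFF file" unless io.read(4) == "RIFF"
  io.read(4)                                   # overall RIFF size, ignored here
  form = io.read(4)                            # form type, e.g. "WAVE"
  chunks = []
  while (tag = io.read(4)) && tag.size == 4
    size = io.read(4).unpack('V')[0]           # 32-bit little-endian length
    io.seek(size + (size % 2), IO::SEEK_CUR)   # skip the body plus any pad byte
    chunks << [tag, size]
  end
  [form, chunks]
end

# A tiny in-memory file: an invented 'abcd' chunk of 3 bytes (padded to 4)
# followed by a 'data' chunk of 2 bytes.
body = "WAVE" + "abcd" + [3].pack('V') + "xyz\0" + "data" + [2].pack('V') + "\1\2"
riff = "RIFF" + [body.size].pack('V') + body

form, chunks = list_chunks(StringIO.new(riff))
p form     # → "WAVE"
p chunks   # → [["abcd", 3], ["data", 2]]
```

The pad byte is why the reader below bumps odd sizes up by one before reading a chunk body.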
class WaveSpeaker writes a new wave file with everything it was told
to say. It does this by using a wave file feature called cues, which
are a way of marking a point in the file and giving it a name. I
created a wave file with several words, and a cue marking each one.
(see more about this below.) WaveSpeaker parses this file, and starts
writing a new output file with the same format. Then, when #say is
called, it looks for each word in the list of cues, and if found,
pastes the appropriate part of the source wave into the output file.
It inserts silence for each #wait, compensating for the length of the
previous sentences. At the end it just fixes up the filesize data,
and closes the file. All you need to do is convert the file to MP3
and transfer to your iPod.
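
The size fix-up at the end works because both size fields sit at known offsets: write a placeholder, remember where it was, then seek back once the final length is known. A minimal sketch of that idea, writing into a StringIO instead of a real file. The format values (16-bit mono PCM at 8000 Hz) are assumptions picked for the example, not taken from coach.wav.

```ruby
require 'stringio'

# Assumed format for illustration: 16-bit mono PCM at 8000 Hz.
channels, rate, bits = 1, 8000, 16
block_align   = channels * bits / 8
bytes_per_sec = rate * block_align
fmt = [1, channels, rate, bytes_per_sec, block_align, bits].pack('vvVVvv')

out = StringIO.new("".b)
out.write("RIFF")
riff_size_pos = out.pos
out.write([0].pack('V'))             # placeholder for the RIFF size
out.write("WAVEfmt ")
out.write([fmt.size].pack('V'))
out.write(fmt)
out.write("data")
data_size_pos = out.pos
out.write([0].pack('V'))             # placeholder for the data size
data_start = out.pos

# Half a second of silence: zero bytes (silence for 16-bit PCM),
# rounded down to a whole number of sample frames.
bytes = (0.5 * bytes_per_sec).to_i
bytes -= bytes % block_align
out.write("\0" * bytes)

# Patch both size fields now that the final length is known.
total = out.pos
out.seek(riff_size_pos)
out.write([total - 8].pack('V'))     # RIFF size excludes 'RIFF' and the size field
out.seek(data_size_pos)
out.write([total - data_start].pack('V'))

wav = out.string
p wav.size                           # → 8044
```

With a real File opened in "wb" mode the same seek-and-overwrite calls apply.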
------wavespeaker.rb------
require 'ostruct'

# Generic RIFF reader: checks the header, then returns chunks as
# [tag, size, data] triples.
class RiffRead
  def initialize io
    @io = io
    raise "Not a RIFF file" if io.read(4) != "RIFF"
    @size = get_long
    @type = get_word
  end

  def parse
    chunks = []
    chk = get_chunk
    while chk
      chunks << chk
      chk = get_chunk
    end
    chunks
  end

  # 32-bit little-endian unsigned integer
  def self.get_long io
    io.read(4).unpack('V')[0]
  end

  # 16-bit little-endian unsigned integer
  def self.get_short io
    io.read(2).unpack('v')[0]
  end

  # 4-character tag
  def self.get_word io
    io.read(4)
  end

  private

  def get_chunk
    tag = get_word
    return nil if !tag
    if tag == 'LIST'
      handle_list
    else
      size = get_long
      size+=1 if size%2 != 0   # chunk bodies are padded to an even length
      data = handle_tag(tag,size)
      data ||= @io.read(size)
      [tag, size, data]
    end
  end

  # Dispatch to parse_<tag> if a subclass defines it.
  def handle_tag tag,size
    funcname = "parse_"+tag.strip
    # respond_to? works whether #methods returns strings (1.8) or symbols (1.9+)
    if respond_to? funcname
      return self.send(funcname, size)
    end
  end

  def handle_list
    listsize = get_long
    @listtype = get_word
    ['LIST',listsize,@listtype]
  end

  def get_long
    self.class::get_long @io
  end

  def get_short
    self.class::get_short @io
  end

  def get_word
    self.class::get_word @io
  end
end

# Read one 24-byte cue point record.
def make_cue io
  cue = OpenStruct.new
  cue.name = RiffRead::get_long io
  cue.position = RiffRead::get_long io
  cue.chkname = RiffRead::get_word io
  cue.chkstart = RiffRead::get_long io
  cue.blockstart = RiffRead::get_long io
  cue.samplestart = RiffRead::get_long io
  cue
end

class WaveRead < RiffRead
  attr_reader :cues, :labels, :format, :data

  def initialize io
    super
    raise "Not a Wave File" if @type != 'WAVE'
  end

  def parse_fmt size
    @format = OpenStruct.new
    @format.data = @io.read(size)
    @format.size = size
    @format.tag = format.data[0,2].unpack('v')[0]
    @format.channels = format.data[2,2].unpack('v')[0]
    @format.samples_per_sec = format.data[4,4].unpack('V')[0]
    @format.bytes_per_sec = format.data[8,4].unpack('V')[0]
    @format.blockAlign = format.data[12,2].unpack('v')[0]
    @format
  end

  def parse_data size
    @data = @io.read(size)
  end

  def parse_cue size
    @cues = []
    numcues = get_long
    numcues.times do
      @cues << make_cue(@io)
    end
    @cues
  end

  def parse_labl size
    id = get_long
    string = @io.read(size-4)
    @labels ||= []
    @labels << [id,string.strip]
    @labels.last
  end

  def parse_note size
    id = get_long
    string = @io.read(size-4)
    @notes ||= []
    @notes << [id,string.strip]
    @notes.last
  end
end

class WaveSpeaker
  def initialize filename
    File.open(filename, "rb") do |f|
      @data = WaveRead.new(f)
      @data.parse
    end
    @elapsed = 0
  end

  # Start the output file, leaving placeholders for the two size fields.
  def begin outfile
    @out = File.open(outfile, "wb")
    @out.write('RIFF')
    @filesize_marker = @out.pos
    @out.write([0].pack('V'))
    @written = @out.write('WAVEfmt ')
    @written+= @out.write([@data.format.size].pack('V'))
    @written+= @out.write(@data.format.data)
    @written+= @out.write('data')
    @datasize_marker = @out.pos
    @written+= @out.write([0].pack('V'))
  end

  def say string
    fixup(string).split.each do |str|
      str = fixup(str)
      if str == 'COMMA'
        wait 0.2
      else
        cue_id = nil
        @data.labels.each_with_index{|label,i|
          if label[1].downcase == str.downcase
            cue_id = i
            break
          end
        }
        if cue_id
          #p "saying #{str}"
          # the *2 assumes 16-bit mono samples
          start = @data.cues[cue_id].samplestart*2
          # the last word has no following cue, so it runs to the end of the data
          nextcue = @data.cues[cue_id+1]
          endpt = nextcue ? nextcue.samplestart*2 : @data.data.size
          endpt+=1 if (endpt-start)%2 != 0
          @written+= @out.write(@data.data[start...endpt])
          @elapsed += (endpt-start).to_f / @data.format.bytes_per_sec
        else
          p "CAN'T FIND <#{str}>"
        end
      end
    end
  end

  # Insert silence, minus the time already used up by speech.
  def wait seconds
    a = "\0"
    delay = (seconds - @elapsed)
    p delay
    if delay > 0
      bytes = (delay * @data.format.bytes_per_sec).to_i
      p "wait #{bytes}"
      bytes+=1 if (bytes%2 != 0)
      silence = a*bytes
      @written+= @out.write(silence)
      @elapsed = 0
    else
      @elapsed -= seconds
    end
  end

  def fixup str
    #remove punctuation, mark pauses (without modifying the caller's string)
    str.gsub(/,/," COMMA ").gsub(/[^\w\s]/,"")
  end

  # Patch the RIFF and data sizes, then close the file.
  def quit
    @out.seek @filesize_marker
    @out.write([@written].pack('V'))
    @out.seek @datasize_marker
    @out.write([@written-@datasize_marker+4].pack('V'))
    @out.close
    p @written
  end
end

if __FILE__ == $0
  wr = WaveSpeaker.new("coach.wav")
  wr.begin("todays_run.wav")
  wr.say 'run 60 seconds'
  wr.wait 1
  wr.say 'walk 15 minutes'
  wr.quit
end
-----end-----
To get a Coach class to work with my solution, just add the following lines:

in Coach#initialize, add
  @speaker = WaveSpeaker.new "coach.wav"
  @speaker.begin "current_workout.wav"
at the end of Coach#coach add
  @speaker.quit
and replace these two functions:
  def say s
    @speaker.say s
  end
  def wait n
    @speaker.wait n
    @target_time -= n
  end

To get the source file, I generated a wave file with 53 words from my
coaching script using a synth (couldn't find a microphone), and used
my wave editor's auto cue feature to insert numbered cues in all the
gaps between words. After running a simple script to replace the
numbers with the words, I have a complete solution that produces a 20
minute long wav file of a robot coach. It would probably be better if
you used a real voice. If anyone is actually interested in this, I
can give you more details on the wave file creation.
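
Since the cues sit in the gaps between words, each word runs from its own cue to the next one, and the last word runs to the end of the audio data. A rough sketch of turning cue positions into byte ranges. The word_ranges helper and the numbers are invented for the example, and block_align stands for the bytes per sample frame (2 for 16-bit mono).

```ruby
# Hypothetical helper: turn cue sample offsets into byte ranges, one per word.
#   cue_samples: sample offset of each cue, in file order
#   block_align: bytes per sample frame (2 for 16-bit mono)
#   data_size:   length of the audio data in bytes
def word_ranges(cue_samples, block_align, data_size)
  starts = cue_samples.map { |s| s * block_align }
  stops  = starts[1..-1] + [data_size]   # the last word runs to the end of the data
  starts.zip(stops).map { |a, b| a...b }
end

ranges = word_ranges([0, 4000, 9000], 2, 24000)
p ranges   # → [0...8000, 8000...18000, 18000...24000]
```

Each range can be used directly to slice the data string, e.g. data[ranges[1]].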
-Adam
On 6/14/06, Leslie Viljoen <leslieviljoen@gmail.com> wrote:
Here's a quick version that is closer to having speech synth. It's not
a real synthesiser, but if you can provide the corresponding ogg files
it can look for certain phrases and play them. The result should sound
a bit better than a real synthesiser since the sections will be spoken
fairly naturally. Only 53 files to record!