Review of Speech Synthesis Technology
For example, if the technology is used to record and synthesize the voices of cartoon characters or company presidents, the method will allow users to have their favorite sentences or lines read back naturally using those voices simply by inputting such sentences or lines on a PC.
Speech synthesis systems use two basic approaches to determine the pronunciation of a word from its spelling, a process often called text-to-phoneme or grapheme-to-phoneme conversion (linguists use the term phoneme for the distinctive sounds of a language). The simplest is the dictionary-based approach, in which the program stores a large dictionary of the words of a language together with their correct pronunciations. Determining the pronunciation of a word is then a matter of looking it up in the dictionary and replacing the spelling with the pronunciation recorded there. The other approach is rule-based: pronunciation rules are applied to each word to derive its pronunciation from its spelling. This is similar to the "sounding out", or synthetic phonics, approach to learning to read.
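The two approaches described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular synthesizer is implemented: the names `PRONUNCIATIONS`, `DIGRAPH_RULES`, `LETTER_RULES`, `dictionary_lookup`, and `rules_to_phonemes` are hypothetical, the phoneme symbols are ARPAbet-style labels chosen for readability, and the rule set is deliberately tiny.

```python
from typing import List, Optional

# Toy pronunciation dictionary (hypothetical entries, ARPAbet-style symbols).
PRONUNCIATIONS = {
    "speech": ["S", "P", "IY", "CH"],
    "of": ["AH", "V"],
}

# Toy letter-to-sound rules (illustrative only; real rule sets are far larger
# and context-sensitive). Two-letter graphemes are tried before single letters.
DIGRAPH_RULES = {"ch": "CH", "sh": "SH", "th": "TH", "ee": "IY", "ph": "F"}
LETTER_RULES = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "x": "K S", "y": "Y", "z": "Z",
}


def dictionary_lookup(word: str) -> Optional[List[str]]:
    """Dictionary-based approach: return the stored pronunciation, or None."""
    return PRONUNCIATIONS.get(word.lower())


def rules_to_phonemes(word: str) -> List[str]:
    """Rule-based approach: scan left to right, preferring two-letter rules."""
    word = word.lower()
    phonemes: List[str] = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPH_RULES:
            phonemes.extend(DIGRAPH_RULES[pair].split())
            i += 2
        elif word[i] in LETTER_RULES:
            phonemes.extend(LETTER_RULES[word[i]].split())
            i += 1
        else:
            i += 1  # skip characters that have no rule (digits, punctuation)
    return phonemes


print(dictionary_lookup("of"))      # ['AH', 'V']
print(dictionary_lookup("blorp"))   # None (word is not in the dictionary)
print(rules_to_phonemes("speech"))  # ['S', 'P', 'IY', 'CH']
print(rules_to_phonemes("of"))      # ['AA', 'F'] (the toy rules get "of" wrong)
```

The last two calls show the contrast: the toy rules handle a regular spelling such as "speech" but mispronounce an irregular word such as "of", while the dictionary is exact for the words it contains and returns nothing for anything else.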
Fujitsu's new technology extracts voice characteristics such as voice quality, intonation, and timing from recorded voices, converts them into parameters, and synthesizes speech using these parameters. For example, if a greater sense of urgency should be added, speech reflecting such a need can easily be synthesized by adjusting the relevant parameters.
Fujitsu's developers set out to make speech synthesis more broadly useful, which is why the technology can now convey tone of voice and the nuances of words. Earlier speech synthesis technologies relied on narrators reading and recording huge numbers of sample sentences to build a basic data set; these samples were then strung together as required to synthesize speech. Preparing such large amounts of sample data took considerable time and labor.
Each approach has advantages and drawbacks. The dictionary-based approach is quick and accurate, but it fails completely when given a word that is not in its dictionary, and as the dictionary grows, so do the memory requirements of the synthesis system. The rule-based approach, on the other hand, works on any input, but the complexity of the rules grows substantially as the system takes irregular spellings and pronunciations into account. (Consider that the word "of" is very common in English, yet it is essentially the only word in which the letter "f" is pronounced [v].) As a result, nearly all speech synthesis systems use a combination of the two approaches.
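One common way to combine the two approaches is to consult the dictionary first and fall back to rules only for unknown words. The sketch below assumes that arrangement; it is not taken from any specific system. The names `hybrid_pronounce`, `naive_rules`, and `LEXICON` are hypothetical, and `naive_rules` is a placeholder that emits one symbol per letter so that the fallback path is visible.

```python
from typing import Callable, Dict, List, Tuple

Phonemes = List[str]


def hybrid_pronounce(
    word: str,
    lexicon: Dict[str, Phonemes],
    rule_fallback: Callable[[str], Phonemes],
) -> Tuple[Phonemes, str]:
    """Return (phonemes, source): dictionary entry if present, else rules."""
    key = word.lower()
    if key in lexicon:
        # Irregular or common words are stored explicitly.
        return lexicon[key], "dictionary"
    # Anything else still gets some pronunciation from the rules.
    return rule_fallback(key), "rules"


def naive_rules(word: str) -> Phonemes:
    """Placeholder rule set (assumption): one symbol per alphabetic letter."""
    return [ch.upper() for ch in word if ch.isalpha()]


# Small exception lexicon holding irregular words such as "of".
LEXICON = {"of": ["AH", "V"]}

print(hybrid_pronounce("of", LEXICON, naive_rules))    # (['AH', 'V'], 'dictionary')
print(hybrid_pronounce("Zorb", LEXICON, naive_rules))  # (['Z', 'O', 'R', 'B'], 'rules')
```

With this arrangement the dictionary only has to hold the words that the rules get wrong, which limits its memory cost, while the rules guarantee that no input is left unpronounced.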