AT&T Labs German text-to-speech synthesis, eSpeak: speech synthesis
The engines that synthesize the speech (the DSP component) are based mainly on five main technologies.
It's somewhat outdated, as non-uniform unit selection is not explicitly mentioned.
There are many spellings in English which are pronounced differently based on context. Articulatory models, such as those developed by Lucero and colleagues, incorporate vocal fold biomechanics, glottal aerodynamics, and acoustic wave propagation in the bronchi, trachea, and nasal and oral cavities, and thus constitute full systems of physics-based speech simulation.
On the other hand, the rule-based approach works on any input, but the complexity of the synthesis grows substantially as the system takes into account irregular spellings or pronunciations. Speech synthesis systems for such languages often use the rule-based method extensively, resorting to dictionaries only for those few words, like foreign names and borrowings, whose pronunciations are not obvious from their spellings.
For example, "My latest project is to learn how to better project my voice" contains two pronunciations of "project".
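A sketch of how a front end might disambiguate such a heteronym. This is a toy, not any production system's method: real front ends use part-of-speech taggers, while this looks only at a two-word left context (the preceding "to", possibly split by an adverb, suggests the verb reading). The simplified ARPAbet-style pronunciations are illustrative.

```python
# Toy heteronym disambiguation for "project": pick noun vs. verb
# pronunciation from local left context. Real systems use POS tagging;
# this crude cue (an infinitive "to" within the two preceding words)
# is an assumption made for illustration only.

NOUN = "P R AA1 JH EH0 K T"   # stress on the first syllable
VERB = "P R AH0 JH EH1 K T"   # stress on the second syllable

def pronounce_project(words, i):
    """Choose a pronunciation for words[i] == 'project'."""
    context = [w.lower() for w in words[max(0, i - 2):i]]
    if "to" in context:  # infinitive marker, possibly split: "to better project"
        return VERB
    return NOUN          # default: the noun reading

sentence = "My latest project is to learn how to better project my voice".split()
for i, w in enumerate(sentence):
    if w.lower() == "project":
        print(i, pronounce_project(sentence, i))
```

On the example sentence this prints the noun form for the first occurrence (index 2) and the verb form for the second (index 9), matching the two readings described above.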
Typically, the division into segments is done using a specially modified speech recognizer set to a "forced alignment" mode, with some manual correction afterward, using visual representations such as the waveform and spectrogram.
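The core of forced alignment can be sketched as a dynamic-programming search: given the known phoneme sequence and a score for each (frame, phoneme) pair, find the best monotonic assignment of frames to phonemes. This is a stripped-down Viterbi search over invented scores, not a real recognizer; actual aligners score frames with trained acoustic models.

```python
# Toy "forced alignment": scores[t][i] stands in for the acoustic
# model's log-score of frame t under phoneme i (numbers invented here).
# We find the best monotonic frame-to-phoneme path with a Viterbi-style
# dynamic program: at each frame, either stay on the current phoneme or
# advance to the next one.

def forced_align(scores):
    """Return, for each frame, the index of the phoneme it is aligned to."""
    T, N = len(scores), len(scores[0])
    NEG = float("-inf")
    best = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    best[0][0] = scores[0][0]            # the path must start at phoneme 0
    for t in range(1, T):
        for i in range(N):
            stay = best[t - 1][i]
            advance = best[t - 1][i - 1] if i > 0 else NEG
            if stay >= advance:
                best[t][i], back[t][i] = stay, i
            else:
                best[t][i], back[t][i] = advance, i - 1
            best[t][i] += scores[t][i]
    path = [N - 1]                       # the path must end at the last phoneme
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Four frames, two phonemes: the first two frames score high for
# phoneme 0, the last two for phoneme 1.
frames = [[0, -5], [0, -5], [-5, 0], [-5, 0]]
print(forced_align(frames))  # → [0, 0, 1, 1]
```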
The other approach is rule-based, in which pronunciation rules are applied to words to determine their pronunciations based on their spellings. Each approach has advantages and drawbacks. Different organizations often use different speech data.
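The hybrid of the two approaches can be sketched as a dictionary lookup with a rule-based fallback. The tiny lexicon and grapheme-to-phoneme rules below are invented for illustration; real systems use large pronouncing dictionaries (such as CMUdict) and far richer rule sets.

```python
# Hybrid pronunciation: consult a dictionary first, then fall back to
# simple letter-to-sound rules for out-of-vocabulary words. Lexicon
# entries and rules here are toy assumptions, not a real rule set.

LEXICON = {
    "speech": "S P IY CH",
    "colonel": "K ER N AH L",   # irregular: spelling rules alone would fail
}

# Crude grapheme-to-phoneme rules, digraphs listed before single letters
# so that the longest match wins.
RULES = [("ch", "CH"), ("sh", "SH"), ("ee", "IY"), ("a", "AE"),
         ("b", "B"), ("c", "K"), ("d", "D"), ("e", "EH"), ("k", "K"),
         ("l", "L"), ("n", "N"), ("o", "AA"), ("p", "P"), ("r", "R"),
         ("s", "S"), ("t", "T")]

def letter_to_sound(word):
    phones, i = [], 0
    while i < len(word):
        for graph, phone in RULES:
            if word.startswith(graph, i):
                phones.append(phone)
                i += len(graph)
                break
        else:
            i += 1  # skip letters the toy rules do not cover
    return " ".join(phones)

def pronounce(word):
    word = word.lower()
    return LEXICON.get(word) or letter_to_sound(word)

print(pronounce("cat"))      # → K AE T   (from rules)
print(pronounce("colonel"))  # → K ER N AH L   (from the dictionary)
```

The trade-off described above is visible in miniature: "colonel" must be in the dictionary because no plausible rule derives its pronunciation, while regular words like "cat" need no entry at all.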