Legacy: ASY: Synthesis Process

Speech in ASY can be synthesized either for static vocal tract shapes or for dynamic utterances. In the latter case, two tables control the synthesis process. The first table (script) specifies key vocal tract shapes. Each row of the script table specifies 10 variables that fully describe the midsagittal vocal tract shape. Thus, each row of the script table corresponds to a static picture of a schematic vocal tract (as if viewed from the side).

The second table (control) provides timing and voice source information. The control table "directs" the synthesis process, telling the system how rapidly to move from shape to shape. If successive rows of the script table represent different tract shapes, the result is simulated movement. Depending upon the time values specified in the control table, intermediate vocal tract shapes may be created automatically by linear interpolation between script table values. This process is similar to key frame animation, in which the animator creates the key frames and the intermediate pictures are interpolated. Another way to think of this is as "morphing" between shapes of the vocal tract.
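The interpolation between key shapes can be illustrated with a minimal Python sketch. It is not ASY's own code: the function name interpolate_shapes, the representation of a script row as a plain list of 10 numbers, and the choice of evenly spaced intermediate steps are all assumptions made for illustration.

```python
from typing import List, Sequence


def interpolate_shapes(
    start: Sequence[float], end: Sequence[float], steps: int
) -> List[List[float]]:
    """Return `steps` intermediate shapes strictly between `start` and `end`.

    Hypothetical helper: each shape stands in for one script-table row
    (10 articulatory variables). Every variable is interpolated linearly
    and independently of the others.
    """
    if len(start) != len(end):
        raise ValueError("both shapes must have the same number of variables")
    frames = []
    for k in range(1, steps + 1):
        t = k / (steps + 1)  # fraction of the way from start to end
        frames.append([a + t * (b - a) for a, b in zip(start, end)])
    return frames


# Two made-up key shapes with 10 variables each (values are placeholders).
shape_a = [0.0] * 10
shape_b = [4.0] * 10
in_between = interpolate_shapes(shape_a, shape_b, steps=3)
```

In this sketch the number of intermediate shapes is passed in directly. In ASY it would follow from the time values in the control table.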

Synthesis proceeds one pitch pulse at a time. For each pitch pulse, a short snippet of sound is created digitally; as each new pitch pulse is generated, its sound is appended to the end of the previous snippet, resulting in a continuous utterance. The duration of a pulse is determined by the fundamental frequency specified in the control table: the higher the frequency, the shorter the pulse. If a pulse spans one period of the fundamental, for example, it lasts 10 ms at 100 Hz and 5 ms at 200 Hz.
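The relation between fundamental frequency and pulse duration, and the way snippets are joined end to end, can be sketched in Python as follows. This is an illustration under stated assumptions, not ASY's implementation: it assumes each pulse spans one period of the fundamental, the sampling rate of 20000 Hz is arbitrary, the function names are hypothetical, and a single unit impulse followed by silence stands in for the sound that ASY would actually compute from the vocal tract shape.

```python
from typing import List, Sequence


def pulse_length_samples(f0_hz: float, sample_rate_hz: int) -> int:
    """Number of samples in one pitch period (assumed to equal one pulse)."""
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    return max(1, round(sample_rate_hz / f0_hz))


def concatenate_pulses(f0_track: Sequence[float], sample_rate_hz: int) -> List[float]:
    """Build a waveform by appending one snippet per pitch pulse.

    Placeholder acoustics: each snippet is a unit impulse padded with
    zeros to the length of its pitch period.
    """
    output: List[float] = []
    for f0 in f0_track:
        n = pulse_length_samples(f0, sample_rate_hz)
        snippet = [1.0] + [0.0] * (n - 1)
        output.extend(snippet)  # append to the end of the previous snippet
    return output


SAMPLE_RATE_HZ = 20000  # assumed value, chosen only for the example
waveform = concatenate_pulses([100.0, 200.0], SAMPLE_RATE_HZ)
```

With these assumed values, the 100 Hz pulse occupies 200 samples and the 200 Hz pulse occupies 100, which illustrates that a higher fundamental frequency yields a shorter pulse.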

Once a sound has been synthesized, it can be listened to ("played") or saved in a computer file for subsequent listening, analysis, or editing.