
Expressive vibrato synthesis for singing

Principal Research Theme: Sound Synthesis and Treatment, Signal Transformation, Voice
Associated People: Axel Roebel
Associated Software: SuperVP
Date of Activity: 13/10/2011

This demo presents our results on converting a speech signal into singing, using the shape-invariant phase vocoder presented in Roebel10b for signal transformation. The speech-to-singing transformation is based on semi-automatic labeling of the speech phonemes, where each vowel is segmented into three sections (start, central and end). The central segments of the vowels are used to create the note material, and they therefore have to be time-stretched significantly to reach the target note durations. After time stretching, the resulting notes are somewhat static: they do not contain any expressive modulation. The existing solution to this problem was to add vibrato to these segments. The typical vibrato has a maximum extent of 50 cents and a rate of 5 Hz. The vibrato extent follows a sinusoidal contour covering exactly one half of a period of the sinusoid. To improve on this approach, the aim was to add an evolution of the glottal pulse shape (Degottex10c), such that the gesture controlling the vibrato extent is related to a corresponding gesture of the glottal source.
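The vibrato described above can be illustrated with a small sketch of the F0 contour of one note. This is not the SuperVP implementation: the function names, the frame rate, the 220 Hz base frequency and the reading of "extent" as peak deviation from the note pitch are all assumptions made for illustration. Only the 50 cents maximum extent, the 5 Hz rate and the half-period sinusoidal extent contour come from the text.

```python
import math


def vibrato_extent(n_frames, max_extent_cents=50.0):
    """Extent contour: one half period of a sinusoid, 0 -> max -> 0."""
    if n_frames < 2:
        return [0.0] * n_frames
    return [max_extent_cents * math.sin(math.pi * i / (n_frames - 1))
            for i in range(n_frames)]


def vibrato_f0(base_hz, duration_s, frame_rate_hz=200.0,
               rate_hz=5.0, max_extent_cents=50.0):
    """F0 contour (Hz) of one note with vibrato.

    Assumption: the extent is the peak deviation from the note pitch,
    and the frame rate is a hypothetical analysis rate.
    """
    n_frames = int(round(duration_s * frame_rate_hz)) + 1
    extent = vibrato_extent(n_frames, max_extent_cents)
    f0 = []
    for i in range(n_frames):
        t = i / frame_rate_hz
        # Deviation in cents: 5 Hz sinusoid scaled by the extent contour.
        cents = extent[i] * math.sin(2.0 * math.pi * rate_hz * t)
        # 1200 cents correspond to one octave (a factor of 2 in frequency).
        f0.append(base_hz * 2.0 ** (cents / 1200.0))
    return f0
```

Calling `vibrato_f0(220.0, 2.0)` returns a two-second contour that starts and ends on the unmodulated pitch, with the deepest modulation around the middle of the note.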

Original text:

Conversion to singing with preservation of spectral envelope (no glottal pulse model is used):

Conversion to singing with preservation of vocal tract filter (VTF) and glottal pulse shape:

Conversion to singing with preservation of VTF and modification of glottal pulse shape (Rd parameter reduced by up to 0.6, using the vibrato extent curve as control curve):

Conversion to singing with preservation of VTF and modification of glottal pulse shape (Rd parameter reduced by up to 1, using the vibrato extent curve as control curve):
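The coupling between the vibrato extent and the glottal pulse shape in the last two examples can be sketched as follows. This is only an illustration under stated assumptions: the linear mapping from the normalized extent curve to the Rd reduction, the function name `rd_contour` and the base value of 1.5 are hypothetical, not taken from the actual processing chain. The maximum reductions of 0.6 and 1 are those given above.

```python
import math


def rd_contour(rd_base, n_frames, max_reduction=0.6):
    """Rd value per frame over one note.

    The control curve is the vibrato extent contour normalized to 0..1
    (one half period of a sinusoid). Assumption: Rd is reduced linearly
    with this control curve, reaching rd_base - max_reduction at the peak.
    """
    if n_frames < 2:
        return [rd_base] * n_frames
    contour = []
    for i in range(n_frames):
        control = math.sin(math.pi * i / (n_frames - 1))
        contour.append(rd_base - max_reduction * control)
    return contour
```

With a hypothetical base value of 1.5, `rd_contour(1.5, 101)` starts and ends at 1.5 and reaches 0.9 in the middle of the note; `rd_contour(1.5, 101, max_reduction=1.0)` corresponds to the stronger setting.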

 
