Singing Voice Separation using the Normalized Cuts

Associated people: Mathieu Lagrange


The singing voice and melody are important characteristics of music signals. In this study, we propose a method for extracting the singing voice and the corresponding melody from "real-world" polyphonic music. The method is inspired by ideas from Computational Auditory Scene Analysis. We formulate singing voice tracking and formation as a graph partitioning problem and solve it using the normalized cut, a global criterion for segmenting graphs that has been used in computer vision. Sinusoidal modeling is used as the underlying representation.

  • A novel harmonicity cue which we term Harmonically Wrapped Peak Similarity (HWPS) is introduced. Some insights about this cue are provided here.
  • The following figure shows results for automatic melody extraction using the proposed approach.
  • Experimental results supporting the use of this method are presented here.
  • The implementation of the method is licensed under the GPL and is freely available from the SourceForge repository of Marsyas. Instructions can be found here.
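
To illustrate the graph partitioning step described above, the following is a minimal Python sketch of a two-way normalized cut using the standard spectral relaxation (Shi and Malik). It is not the Marsyas implementation: the function names, the toy peak frequencies, and the Gaussian frequency affinity with its width `sigma` are assumptions made for illustration, and the affinity does not include the HWPS cue.

```python
import numpy as np


def normalized_cut_bipartition(W):
    """Two-way split of a graph with the relaxed normalized cut.

    W is a symmetric, non-negative affinity matrix whose rows all have a
    positive sum. Returns a boolean array marking one side of the cut.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                      # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    laplacian = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    # The eigenvector of the second smallest eigenvalue, mapped back with
    # D^{-1/2}, solves the generalized problem (D - W) y = lambda D y.
    y = d_inv_sqrt * eigvecs[:, 1]
    # Thresholding at zero is one simple choice of splitting point.
    return y > 0


def ncut_value(W, mask):
    """Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)."""
    W = np.asarray(W, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    cut = W[mask][:, ~mask].sum()
    return cut / W[mask].sum() + cut / W[~mask].sum()


# Hypothetical sinusoidal peak frequencies (Hz) forming two groups.
freqs = np.array([220.0, 230.0, 240.0, 400.0, 410.0, 420.0])
sigma = 100.0  # assumed affinity width, chosen only for this toy example
diff = freqs[:, None] - freqs[None, :]
W = np.exp(-diff**2 / (2.0 * sigma**2))

labels = normalized_cut_bipartition(W)
print("cluster labels:", labels.astype(int))
print("Ncut value:", ncut_value(W, labels))
```

Which group receives the label 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful. In a setting like the one described here, the nodes would be sinusoidal peaks and the affinity would combine several similarity cues rather than frequency proximity alone.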