Singing Voice Separation using the Normalized Cuts

Associated people: Mathieu Lagrange

The singing voice and melody are important characteristics of music signals. In this study, we propose a method for extracting the singing voice and the corresponding melody from "real-world" polyphonic music. The proposed method is inspired by ideas from Computational Auditory Scene Analysis. We formulate singing voice tracking and formation as a graph partitioning problem and solve it using the normalized cut, a global criterion for segmenting graphs that has been used in Computer Vision. Sinusoidal modeling is used as the underlying representation.
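
To make the graph partitioning idea concrete, here is a minimal sketch of a two-way normalized cut computed through its usual spectral relaxation. It is not the Marsyas implementation: the function name `normalized_cut_bipartition`, the toy peak frequencies, and the frequency-only Gaussian similarity with width `sigma` are all assumptions made for illustration, and the method described above relies on richer cues.

```python
import numpy as np

def normalized_cut_bipartition(W):
    """Two-way split of a graph via the spectral relaxation of the normalized cut.

    W is a symmetric, non-negative similarity matrix with positive row sums.
    Returns a boolean array giving the side of the cut for each node.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                      # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)     # eigenvalues in ascending order
    # Map the second eigenvector back to the generalized problem
    # (D - W) y = lambda * D y, then split on the sign of y.
    y = d_inv_sqrt * eigvecs[:, 1]
    return y > 0

# Hypothetical sinusoidal peaks, described only by frequency in Hz (toy data).
freqs = np.array([220.0, 222.0, 225.0, 880.0, 884.0, 890.0])
sigma = 300.0                              # assumed similarity width
W = np.exp(-((freqs[:, None] - freqs[None, :]) ** 2) / (2 * sigma ** 2))
np.fill_diagonal(W, 0.0)                   # no self-similarity edges

labels = normalized_cut_bipartition(W)
print(labels)                              # the two frequency groups fall on opposite sides
```

The sign of an eigenvector is arbitrary, so which group is labelled `True` can differ between runs or platforms; only the grouping itself is meaningful.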

A novel harmonicity cue, which we term Harmonically Wrapped Peak Similarity (HWPS), is introduced. Some insights about this cue are provided here.
The following figure shows results for automatic melody extraction using the proposed approach.
Experimental results supporting the use of this method are presented here.
The implementation of the method is licensed under the GPL and freely available from the SourceForge repository of Marsyas. Instructions can be found here.