Query by Singing/Humming

QBSH (Query by Singing/Humming) is also an interesting and effective paradigm for music retrieval. Given a singing/humming recording of about 8 seconds, our QBSH system can identify the intended piece of music in about 2 seconds from a database of 20K songs. Common application scenarios of QBSH include:

  • Singing/humming queries for karaoke or any similar application that requires music retrieval.
  • Real-time singing scoring based on pitch accuracy, for karaoke or for music games/toys, etc.
  • Automatic transcription from singing or humming.

The deployment of a QBSH system involves the following steps:

  1. Collect a set of music pieces for the database.
  2. Extract the main melody of these music pieces:
    1. For monophonic recordings, we can perform pitch tracking to compute the melody contours for comparison.
    2. For polyphonic recordings, we must first perform singing voice separation and then compute melody contours from the separated singing voice. (Note that pitch tracking on a separated singing voice is usually less stable than pitch tracking on a monophonic rendering of the main melody, such as a solo vocal recording.)
  3. Organize the extracted melody contours into a compact database for easy comparison.
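
The monophonic case in the steps above can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the pitch tracker or database format of the QBSH system described here: it uses a plain autocorrelation peak picker, synthetic sine-tone "songs" in place of real recordings, and a hypothetical in-memory dictionary as the database. All names (`synth_melody`, `frame_pitch`, `melody_contour`, `build_database`) and parameters (8 kHz sample rate, 256-sample frames, 80 to 800 Hz search range) are made up for this sketch.

```python
import math
from array import array

SAMPLE_RATE = 8000  # Hz; assumed value, kept low so pure-Python loops stay fast
FRAME_SIZE = 256    # samples per analysis frame (32 ms at 8 kHz); assumed value


def hz_to_semitone(freq_hz):
    """Convert a frequency to a MIDI-style semitone number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def synth_melody(midi_notes, frames_per_note=4):
    """Render a monophonic test melody as concatenated sine tones (stand-in for audio)."""
    samples = []
    n = frames_per_note * FRAME_SIZE
    for note in midi_notes:
        freq = 440.0 * 2.0 ** ((note - 69) / 12.0)
        samples.extend(math.sin(2.0 * math.pi * freq * i / SAMPLE_RATE) for i in range(n))
    return samples


def frame_pitch(frame, fmin=80.0, fmax=800.0):
    """Estimate one frame's pitch in Hz as the lag with the largest autocorrelation.

    There is no silence or unvoiced detection here; a real tracker would need it.
    """
    min_lag = int(SAMPLE_RATE / fmax)
    max_lag = int(SAMPLE_RATE / fmin)
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return SAMPLE_RATE / best_lag


def melody_contour(samples):
    """Pitch-track non-overlapping frames and return the contour in semitones."""
    contour = []
    for start in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE):
        frame = samples[start:start + FRAME_SIZE]
        contour.append(hz_to_semitone(frame_pitch(frame)))
    return contour


def build_database(songs):
    """Map each title to its contour, rounded to whole semitones and stored one byte per frame."""
    return {
        title: array("B", (round(p) for p in melody_contour(audio)))
        for title, audio in songs.items()
    }


# Two hypothetical "songs", each four notes long.
songs = {
    "song_a": synth_melody([60, 62, 64, 65]),
    "song_b": synth_melody([64, 60, 57, 60]),
}
database = build_database(songs)
for title, contour in database.items():
    print(title, list(contour))
```

Storing pitch in semitones rather than Hz makes a transposed query differ from the stored contour by a constant offset, and rounding to one byte per frame is just one simple way to keep the database compact. Integer lags limit the pitch resolution at this sample rate, so the unrounded estimates can be off by a fraction of a semitone.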