Introduction
At this stage, I very strongly advise against using songs with loud accompaniment, a fast tempo, or high-pitched female vocals as separation sources for model training, especially when these debuffs are stacked together. It is pure self-torture and will not yield good results.
I have already tried four times. Don't try it yourself.
Additional Information
2023-09-03: I batch-converted some songs and found many problems in the Kim_vocal_2 vocal output: accompaniment mixed into the vocals, muffled sound, and noise. The accompaniment output, on the other hand, was perfect. I suspect this model is imitating me. Apart from the songs tested below, most songs turned out to be unsuitable, so converting them was a waste of time.
Introduction
This article is time-sensitive and may not be updated in the future
I have trained an RVC model before, but the results were not very good. The demucs v4 model separates better, but there is still a chance that the vocals will sound muffled.
This article will not test paid products, although Dango AI works really well and can perfectly separate even disaster-level songs.
I didn't realize that RVC was updated more than ten days ago, and the updated pitch extraction algorithm in the new version reduces the occurrence of broken sound. I thought I didn't need it anymore, so I deleted the model and had to train it again
Testing
The results of this test are based on human auditory perception. Different songs and parameters may yield different results. Corrections are welcome in the comments
Note!!! The female voice used in this article may not necessarily be a female singer 😹, and the same goes for male voices 😹
The maximum score is 100, but the score cannot be perfect because there will always be differences between the separated vocals and the original recording
Some music cannot be found, so I cannot provide it for fear of DMCA
For songs with no accompaniment, most models except the older ones give good results, so this article will not test that case.
Several well-received vocal models found online will be used for testing.
MDX23 parameters refer to the MVSep leaderboard.
Some accompaniment sound always remains in sections without vocals. It can be trimmed, so this issue will not be mentioned again.
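As a rough illustration of what trimming could look like when done automatically, here is a minimal sketch of a frame-level noise gate. It is not part of the workflow used in this article: the function name `gate_quiet_frames`, the frame length, and the -40 dBFS threshold are all assumptions chosen for the example, and a hard gate like this only helps when the leftover accompaniment is clearly quieter than the vocals.

```python
import math

def gate_quiet_frames(samples, frame_len=2048, threshold_db=-40.0):
    """Zero out frames whose RMS level falls below threshold_db (dBFS).

    samples: list of floats in [-1.0, 1.0] (a mono vocal stem).
    frame_len and threshold_db are illustrative defaults, not tuned values.
    """
    out = list(samples)
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        # Convert to dBFS; treat digital silence as minus infinity.
        level_db = 20.0 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db < threshold_db:
            out[start:start + len(frame)] = [0.0] * len(frame)
    return out

# Synthetic input: one quiet frame (about -60 dBFS), one loud frame (about -6 dBFS).
quiet = [0.001] * 2048
loud = [0.5] * 2048
gated = gate_quiet_frames(quiet + loud)
```

In practice, manual trimming in an audio editor gives more control, because a fixed threshold can also cut quiet breaths or the tails of sung notes.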
Run on Colab
My own computer cannot handle MDX23 and runs out of memory. Running it on Colab requires 13.8GB of memory.
BigShifts_MDX = 21
overlap_MDX = 0
overlap_MDXv3 = 20
weight_MDXv3 = 6
weight_VOCFT = 5
weight_HQ3 = 2
overlap_demucs = 0.8
output_format = 'FLOAT'
vocals_instru_only = True
if vocals_instru_only:
    vocals_only = '--vocals_only true'
else:
    vocals_only = ''
chunk_size = 1000000
Other models use default UVR parameters.
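The `weight_*` values above suggest that the vocal estimates of the individual models are blended in proportion to their weights. The following is a minimal sketch under that assumption. It shows a plain weighted average, which may not match what the MDX23 code actually does internally. The function name `blend_stems` and the sample values are hypothetical, and only the three weighted models from the configuration are covered.

```python
def blend_stems(stems, weights):
    """Weighted average of equally long vocal estimates, sample by sample."""
    if not stems or len(stems) != len(weights):
        raise ValueError("need one weight per stem")
    length = len(stems[0])
    if any(len(s) != length for s in stems):
        raise ValueError("all stems must have the same length")
    total = float(sum(weights))
    return [
        sum(w * s[i] for s, w in zip(stems, weights)) / total
        for i in range(length)
    ]

# Weights taken from the configuration above; the sample values are made up.
weights = [6, 5, 2]  # weight_MDXv3, weight_VOCFT, weight_HQ3
stems = [
    [0.10, 0.20],  # hypothetical MDXv3 estimate
    [0.30, 0.10],  # hypothetical Voc FT estimate
    [0.50, 0.00],  # hypothetical HQ3-derived estimate
]
blended = blend_stems(stems, weights)
```

With these weights, the MDXv3 output contributes 6/13 of the result, Voc FT 5/13, and HQ3 2/13, so the model with the largest weight dominates the blend.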
Test 1 (Disaster with full debuffs)
I have no hope anymore
Test audio: high BPM, long stretches of clipping, accompaniment at times louder than the vocals, vocals and accompaniment completely blended together, and possibly somewhat low audio quality. The song costs 204 yen on Ringo Music (I didn't buy it). All debuffs are stacked together.
Interestingly, the original mix of this song sounds fine, but after separation the vocals suffer from low sound quality while the accompaniment is unaffected. This may be because the accompaniment is too loud.
Model | Score | Comments |
---|---|---|
RipX built-in software | 50 | Mixed with instrumental sound, vocals sound muffled |
MDX23 | 70 | Slightly mixed with instrumental sound, occasional noise when accompaniment volume is high |
htdemucs_ft | 40 | Mixed with accompaniment sound, vocals sound muffled, noise |
Kim_vocal_2 | 65 | Vocals sound significantly muffled, noise |
4_HP_Vocal_UVR | 35 | Mixed with loud accompaniment sound, vocals sound muffled, noise |
Test 2 (Light accompaniment with one main instrument)
Female vocals
The main instrument is a guitar, and the separation results are already good in RipX.
Model | Score | Comments |
---|---|---|
RipX built-in software | 80 | Mixed with instrumental sound |
MDX23 | 90 | Slightly mixed with instrumental sound |
htdemucs_ft | 90 | Slightly mixed with instrumental sound |
Kim_vocal_2 | 85 | Occasionally slightly mixed with instrumental sound |
4_HP_Vocal_UVR | 85 | Mixed with instrumental sound |
Male vocals
The same song is used, but with slight differences in the accompaniment. Hilariously, I can't find an identical version, and I don't know why the BPM became so high.
Model | Score | Comments |
---|---|---|
RipX built-in software | 95 | Almost perfect |
MDX23 | 98 | Almost perfect |
htdemucs_ft | 93 | Partially mixed with instrumental sound |
Kim_vocal_2 | 97 | Almost perfect |
4_HP_Vocal_UVR | 96 | Almost perfect |
Test 3 (Pop music)
Different versions of "Kokoro Zashi" and "Renai Saiban"
The accompaniment for "Renai Saiban" is the same.
Different songs may have different singers 🤔
Female vocals
Kokoro Zashi
Model | Score | Comments |
---|---|---|
RipX built-in software | 80 | Some vocals sound muffled |
MDX23 | 95 | Almost perfect |
htdemucs_ft | 95 | Almost perfect |
Kim_vocal_2 | 95 | Almost perfect |
4_HP_Vocal_UVR | 85 | Slightly mixed with instrumental sound |
Renai Saiban
Model | Score | Comments |
---|---|---|
RipX built-in software | 75 | Slightly mixed with instrumental sound, slight noise |
MDX23 | 90 | Slightly mixed with instrumental sound |
htdemucs_ft | 80 | Mixed vocals and accompaniment |
Kim_vocal_2 | 90 | Slightly mixed with instrumental sound |
4_HP_Vocal_UVR | 85 | Slightly mixed with instrumental sound, slight noise |
Male vocals
Kokoro Zashi
Model | Score | Comments |
---|---|---|
RipX built-in software | 75 | Slightly mixed with instrumental sound, slight noise |
MDX23 | 95 | Almost perfect |
htdemucs_ft | 85 | Slightly mixed with instrumental sound, vocals may sound muffled at times |
Kim_vocal_2 | 95 | Almost perfect |
4_HP_Vocal_UVR | 70 | Accompaniment and vocals mixed together |
Renai Saiban
Model | Score | Comments |
---|---|---|
RipX built-in software | 70 | Accompaniment initially recognized as vocals, slight accompaniment, occasional muffled sound |
MDX23 | 90 | Slightly mixed with instrumental sound |
htdemucs_ft | 80 | Mixed vocals and accompaniment |
Kim_vocal_2 | 85 | Slightly mixed with instrumental sound, occasional sudden increase in accompaniment volume, accompaniment part recognized as vocals |
4_HP_Vocal_UVR | 65 | Accompaniment initially recognized as vocals, slight accompaniment, noise |
Test... Four? (Electronic music)
The accompaniment volume is relatively low in this song.
Surprisingly, the results are quite good.
Model | Score | Comments |
---|---|---|
RipX built-in software | 65 | Some accompaniment recognized as vocals, accompaniment and vocals mixed together at times, some muffled sound |
MDX23 | 95 | Sometimes pure accompaniment recognized as vocals, can be trimmed |
htdemucs_ft | 80 | Accompaniment mixed with vocals |
Kim_vocal_2 | 90 | Occasionally slight accompaniment |
4_HP_Vocal_UVR | 50 | Some accompaniment recognized as vocals, accompaniment and vocals mixed together for a long time |
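To get a quick overview of all eight tables, the scores can be averaged per model. The sketch below copies the scores from the tables above and computes an unweighted mean. Treating every test as equally important is an assumption of this example, and the scores only rate the vocal output, so the ranking says nothing about accompaniment quality.

```python
from statistics import mean

# Scores copied from the tables above, in test order:
# Test 1, Test 2 (female, male), Test 3 (female x2, male x2), Test 4.
scores = {
    "RipX built-in software": [50, 80, 95, 80, 75, 75, 70, 65],
    "MDX23": [70, 90, 98, 95, 90, 95, 90, 95],
    "htdemucs_ft": [40, 90, 93, 95, 80, 85, 80, 80],
    "Kim_vocal_2": [65, 85, 97, 95, 90, 95, 85, 90],
    "4_HP_Vocal_UVR": [35, 85, 96, 85, 85, 70, 65, 50],
}

# Unweighted mean per model, printed from best to worst.
averages = {model: mean(vals) for model, vals in scores.items()}
for model, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {avg:.1f}")
```

By this measure MDX23 comes out highest at roughly 90 and 4_HP_Vocal_UVR lowest at roughly 71.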
Conclusion
MDX23 is currently the strongest model. According to the running logs, it appears to combine htdemucs_ft, demucs MDXv3, UVR-MDX-NET Voc FT, and UVR-MDX-NET inst HQ 3, which makes it very slow: a 5-minute song takes 17 minutes on a Colab T4.
htdemucs_ft is a more balanced model. If you want to keep both the vocals and the accompaniment, you can choose htdemucs_ft, which gives better results than MDX Main.
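For planning a batch run, the single timing above (17 minutes for a 5-minute song) can be turned into a rough estimate. The sketch below assumes that processing time scales linearly with song length, which has not been verified here. The function name `estimate_batch_minutes` and the playlist are hypothetical.

```python
def estimate_batch_minutes(song_minutes, processing_ratio=17 / 5):
    """Estimate total processing time for a list of song lengths (in minutes).

    processing_ratio is processing time divided by audio length; the default
    comes from the single measurement above (17 minutes for a 5-minute song).
    """
    return sum(song_minutes) * processing_ratio

playlist = [5, 4, 3.5]  # hypothetical song lengths in minutes
total = estimate_batch_minutes(playlist)
print(f"about {total:.0f} minutes for {len(playlist)} songs")
```

A ratio of 3.4 means an hour of audio would need well over three hours of processing on the same hardware.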
Kim_vocal_2 is also good for vocal separation, and it is fast. It can be chosen for bulk processing to save time
It is still best to use htdemucs_ft or MDX23.