Preface#
At this stage, I strongly, strongly, strongly do not recommend separating vocals from songs with loud accompaniment, fast tempo, female vocals, and high pitches in order to train models. It is pure self-torture and won't yield good results.
I've already tried it four times. Don't try it again.
Supplement#
On September 3, 2023, I tried converting some songs in batches. The vocals from Kim_vocal_2 came out with a lot of accompaniment mixed in, muffled sound, and noise, while the accompaniment was perfect. I suspect this model is messing with me. Most songs other than the ones tested below are not suitable for it, so it was a waste of time.
Introduction#
This article is time-sensitive and may not be updated in the future.
I have trained an RVC model before, but the results were not very good. The demucs v4 model separates better, but there is still a chance that the vocals will sound muffled.
This article does not test paid products, although Tuanzi AI really is good: it can perfectly separate even disaster-level songs.
I didn't realize that RVC had been updated more than ten days ago. The new version's updated pitch extraction algorithm reduces how often the pitch breaks. I thought I didn't need the model anymore, so I had deleted it and now have to train it again.
Testing#
The results of this test are based on subjective listening. Different songs and parameters may yield different results. If you know better parameters, please share them in the comments.
Note!!! The female voice used in this article may not necessarily be a female singer 😹, and the same goes for male voices 😹
The maximum score is 100, but the score will never be 100 because there will always be differences between the separated vocals and the original recording
Some of the music cannot be found online, and I won't post it here for fear of DMCA takedowns.
For separating vocals that have no accompaniment, most models do well apart from the older ones, so this article does not test that case.
This article uses several popular vocal models that are widely praised online; the MDX23 parameters follow the MVSep leaderboard.
Some accompaniment always leaks through in passages without vocals, and it can simply be cut out. This issue will not be mentioned again below.
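One way to automate the "cut it out" step is a simple level gate that mutes quiet frames. The sketch below is only an illustration, not the procedure used for these tests: it assumes the leaked accompaniment is much quieter than the vocals, and `gate_quiet_frames`, the frame size, and the threshold are all hypothetical names and values.

```python
import math

def gate_quiet_frames(samples, frame_size, threshold):
    """Zero out every frame whose RMS level is below the threshold."""
    gated = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        # Keep loud frames untouched, replace quiet ones with silence.
        gated.extend(frame if rms >= threshold else [0.0] * len(frame))
    return gated

# Toy signal (made-up values): a loud "vocal" frame, then a quiet "bleed" frame.
vocal_stem = [0.5, -0.4, 0.6, -0.5, 0.02, -0.01, 0.03, -0.02]
cleaned = gate_quiet_frames(vocal_stem, frame_size=4, threshold=0.1)
print(cleaned)
```

A gate like this cannot tell quiet singing from quiet bleed, so cutting by hand in an audio editor is still the safer option for anything that matters.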
Run on Colab. My own computer cannot handle it and runs out of memory; on Colab it needs 13.8 GB of memory.
BigShifts_MDX = 21
overlap_MDX = 0
overlap_MDXv3 = 20
weight_MDXv3 = 6
weight_VOCFT = 5
weight_HQ3 = 2
overlap_demucs = 0.8
output_format = 'FLOAT'
vocals_instru_only = True
if vocals_instru_only:
    vocals_only = '--vocals_only true'
else:
    vocals_only = ''
chunk_size = 1000000
Other models use the default UVR parameters.
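To give a feel for what `weight_MDXv3`, `weight_VOCFT`, and `weight_HQ3` might do, here is a minimal sketch that blends per-model vocal estimates with a weighted average. That the weights are applied as a plain weighted mean is an assumption, not something confirmed by the notebook; `weighted_blend` and the toy sample values are hypothetical.

```python
# Hypothetical sketch: blend vocal estimates from several models.
# Assumption: the weights act as a plain weighted mean over the models.

def weighted_blend(stems, weights):
    """Blend equal-length sample lists using the given per-model weights."""
    if stems.keys() != weights.keys():
        raise ValueError("stems and weights must name the same models")
    lengths = {len(samples) for samples in stems.values()}
    if len(lengths) != 1:
        raise ValueError("all stems must have the same number of samples")
    total = sum(weights.values())
    n = lengths.pop()
    return [
        sum(weights[name] * stems[name][i] for name in stems) / total
        for i in range(n)
    ]

# Weights taken from the parameters above.
weights = {"MDXv3": 6, "VOCFT": 5, "HQ3": 2}

# Toy three-sample "vocal stems", one per model (made-up values).
stems = {
    "MDXv3": [0.10, 0.20, 0.30],
    "VOCFT": [0.12, 0.18, 0.33],
    "HQ3": [0.08, 0.25, 0.27],
}

blended = weighted_blend(stems, weights)
# Relative influence of each model under this assumption.
share = {name: w / sum(weights.values()) for name, w in weights.items()}
print(blended)
print(share)
```

Under this reading, weights of 6, 5, and 2 give the three models shares of 6/13, 5/13, and 2/13 of the result, so the MDXv3 and Voc FT estimates dominate.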
Test 1 (Disaster debuff stacked)#
No hope anymore
The test audio has a high BPM, long periods of clipping, accompaniment that is sometimes louder than the vocals, and vocals and accompaniment that are completely mixed together; the audio quality may also be a bit low. The song costs 204 yen on iTunes (I didn't buy it). With all these debuffs stacked, it's a disaster.
Interestingly, in this song, the vocals and accompaniment sound fine when mixed together, but when separated, the vocals have lower sound quality, while the accompaniment is unaffected. It may be because the accompaniment volume is too loud.
Model | Score | Comments |
---|---|---|
RipX built-in software | 50 | Mixed with instrumental sound, vocals sound muffled |
MDX23 | 70 | Slightly mixed with instrumental sound, occasional noise when accompaniment volume is high |
htdemucs_ft | 40 | Mixed with accompaniment sound, vocals sound muffled, noise |
Kim_vocal_2 | 65 | Vocals sound significantly muffled, noise |
4_HP_Vocal_UVR | 35 | Mixed with loud accompaniment sound, vocals sound muffled, noise |
Test 2 (Light accompaniment, main instrument is one)#
Female vocals#
The main instrument is a guitar, and the separation effect is already good in RipX.
Model | Score | Comments |
---|---|---|
RipX built-in software | 80 | Mixed with instrumental sound |
MDX23 | 90 | Slightly mixed with instrumental sound |
htdemucs_ft | 90 | Slightly mixed with instrumental sound |
Kim_vocal_2 | 85 | Occasionally slightly mixed with instrumental sound |
4_HP_Vocal_UVR | 85 | Mixed with instrumental sound |
Male vocals#
This uses the same song, with slight differences in the accompaniment. Hilariously, I couldn't find an exact match, and I don't know why the BPM has become so high.
Model | Score | Comments |
---|---|---|
RipX built-in software | 95 | Almost perfect |
MDX23 | 98 | Almost perfect |
htdemucs_ft | 93 | Partially mixed with instrumental sound |
Kim_vocal_2 | 97 | Almost perfect |
4_HP_Vocal_UVR | 96 | Almost perfect |
Test 3 (Pop music)#
Different versions of "Kokoro Zashi" and "Renai Saiban"
The accompaniment for "Renai Saiban" is the same.
Different songs may have different singers 🤔
Female vocals#
Kokoro Zashi#
Model | Score | Comments |
---|---|---|
RipX built-in software | 80 | Some vocals sound muffled |
MDX23 | 95 | Almost perfect |
htdemucs_ft | 95 | Almost perfect |
Kim_vocal_2 | 95 | Almost perfect |
4_HP_Vocal_UVR | 85 | Slightly mixed with instrumental sound |
Renai Saiban#
Model | Score | Comments |
---|---|---|
RipX built-in software | 75 | Slightly mixed with instrumental sound, occasional noise |
MDX23 | 90 | Slightly mixed with instrumental sound |
htdemucs_ft | 80 | Mixed vocals and accompaniment |
Kim_vocal_2 | 90 | Slightly mixed with instrumental sound |
4_HP_Vocal_UVR | 85 | Slightly mixed with instrumental sound, occasional noise |
Male vocals#
Kokoro Zashi#
Model | Score | Comments |
---|---|---|
RipX built-in software | 75 | Slightly mixed with instrumental sound, occasional noise |
MDX23 | 95 | Almost perfect |
htdemucs_ft | 85 | Slightly mixed with instrumental sound, vocals may sound muffled |
Kim_vocal_2 | 95 | Almost perfect |
4_HP_Vocal_UVR | 70 | Accompaniment mixed with vocals |
Renai Saiban#
Model | Score | Comments |
---|---|---|
RipX built-in software | 70 | Accompaniment initially recognized as vocals, slightly mixed with instrumental sound, occasional muffled sound |
MDX23 | 90 | Slightly mixed with instrumental sound |
htdemucs_ft | 80 | Mixed vocals and accompaniment |
Kim_vocal_2 | 85 | Slightly mixed with instrumental sound, occasional sudden increase in accompaniment volume, accompaniment part recognized as vocals |
4_HP_Vocal_UVR | 65 | Accompaniment initially recognized as vocals, slightly mixed with instrumental sound, noise |
Test... 4? (Electronic music)#
The accompaniment in this song is relatively quiet.
Surprisingly, the results are quite good.
Model | Score | Comments |
---|---|---|
RipX built-in software | 65 | Some accompaniment recognized as vocals, accompaniment and vocals occasionally mixed, some vocals sound muffled |
MDX23 | 95 | Sometimes pure accompaniment recognized as vocals, can be cut off |
htdemucs_ft | 80 | Accompaniment mixed with vocals |
Kim_vocal_2 | 90 | Occasionally slight accompaniment |
4_HP_Vocal_UVR | 50 | Some accompaniment recognized as vocals, accompaniment and vocals mixed for a long time |
Conclusion#
MDX23 is currently the strongest model. According to the runtime logs, it seems to be a combination of (htdemucs_ft), (demucs MDXv3), (UVR-MDX-NET Voc FT), and (UVR-MDX-NET inst HQ 3), but it is very slow. It takes 17 minutes to process a 5-minute song on Colab with a T4 GPU.
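To put that speed in perspective, the figures above work out to roughly 3.4 minutes of processing per minute of audio. The sketch below turns that into a rough batch estimate; it assumes processing time scales linearly with song length, and the batch of ten four-minute songs is hypothetical.

```python
# Throughput figures taken from the text: 17 minutes for a 5-minute song.
PROCESS_MINUTES = 17
SONG_MINUTES = 5

# Minutes of processing needed per minute of audio.
slowdown = PROCESS_MINUTES / SONG_MINUTES

def estimate_minutes(song_lengths_min):
    """Estimate total processing time, assuming it scales linearly with length."""
    return sum(song_lengths_min) * slowdown

# Hypothetical batch: ten songs of four minutes each.
batch = [4.0] * 10
print(f"slowdown: {slowdown:.1f}x real time")
print(f"estimated batch time: {estimate_minutes(batch):.0f} minutes")
```

For that batch the estimate is about 136 minutes, a long session for ten songs on a free Colab runtime.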
htdemucs_ft is a more balanced model. If you want to preserve both vocals and accompaniment, you can choose htdemucs_ft, which has better results than MDX Main.
Kim_vocal_2 is also very good at vocal separation, and it is fast. If you need to process a large number of songs and want to save time, you can choose this model.
It's still best to use htdemucs_ft or MDX23.
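For a rough overview, the scores from all eight tables can be averaged per model. This is only a sketch: a plain mean treats every test as equally important, and the scores are subjective to begin with.

```python
from statistics import mean

# Scores copied from the tables above, in this order:
# Test 1, Test 2 female, Test 2 male,
# Test 3 female (Kokoro Zashi, Renai Saiban),
# Test 3 male (Kokoro Zashi, Renai Saiban), Test 4.
scores = {
    "RipX built-in software": [50, 80, 95, 80, 75, 75, 70, 65],
    "MDX23": [70, 90, 98, 95, 90, 95, 90, 95],
    "htdemucs_ft": [40, 90, 93, 95, 80, 85, 80, 80],
    "Kim_vocal_2": [65, 85, 97, 95, 90, 95, 85, 90],
    "4_HP_Vocal_UVR": [35, 85, 96, 85, 85, 70, 65, 50],
}

averages = {model: mean(values) for model, values in scores.items()}

# Print the models from highest to lowest average score.
for model, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:<24}{avg:7.3f}")
```

The averages come out to 90.375 for MDX23, 87.75 for Kim_vocal_2, 80.375 for htdemucs_ft, 73.75 for RipX, and 71.375 for 4_HP_Vocal_UVR. They say nothing about speed or about how well each model preserves the accompaniment.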