Which source separation model is best for splitting vocals from accompaniment? A listening test of open-source vocal separation models.

Preface#

At this stage, I strongly, strongly, strongly advise against separating vocals from songs that combine loud accompaniment, a fast tempo, female vocals, and high pitches in order to train a model. It is pure self-torture and won't yield good results.

~~I've tried it four times already, don't try it again~~

Supplement#

On September 3, 2023, I tried converting some songs in batches. The vocals produced by Kim_vocal_2 had a lot of accompaniment mixed in, sounded muffled, and were noisy, while the accompaniment came out perfect. ~~I suspect this model is imitating me~~. Most songs other than the ones tested below turned out to be unsuitable, so the batch run was a waste of time.


Introduction#

This article is time-sensitive and may not be updated in the future.

I have trained an RVC model before, but the results were not very good. Although the demucs v4 model separates fairly well, there is a chance that the vocals will sound muffled.

This article will not use paid products for testing. ~~Although the effect of the Tuanzi AI is really good, it can perfectly separate disaster-level songs~~.

I didn't realize that RVC had been updated more than ten days ago. The updated pitch extraction algorithm reduces the voice cracking that occurred in the new version. I had deleted my model because I thought I no longer needed it, so now I have to train it again.


Testing#

The results of this test are based on subjective listening. Different songs and parameters may yield different results. If you know of better parameters, please point them out in the comments.

Note!!! The female voice used in this article may not necessarily be a female singer 😹, and the same goes for male voices 😹

The maximum score is 100, but no model will ever be given 100, because there will always be differences between the separated vocals and the original recording.

Some of the music cannot be found online, so I can't provide playable samples ~~for fear of DMCA~~

For extracting the accompaniment (the track without vocals), most models give good results, apart from the older ones, so this article does not test it.

This article uses several popular vocal models that are widely praised online. The MDX23 parameters were chosen with reference to the MVSep leaderboard.

In passages with no vocals, some accompaniment always ends up in the vocal track; those passages can simply be cut out. This issue will not be mentioned again below.
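One way to automate that cut is a simple loudness gate. The sketch below is not what was used for this article (the cuts can just as well be made by hand in an audio editor). It assumes the separated vocal track is already loaded as a list of float samples and that the leftover accompaniment is much quieter than the singing; `gate_quiet_frames`, `frame_len`, and `threshold` are made-up names and values for illustration.

```python
import math


def gate_quiet_frames(samples, frame_len=1024, threshold=0.02):
    """Zero out frames whose RMS level falls below `threshold`.

    `samples` is a list of floats in [-1.0, 1.0] (one mono channel).
    `frame_len` and `threshold` are illustrative values, not tuned ones.
    """
    gated = list(samples)
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms < threshold:
            # Treat the frame as "no vocals" and replace the residue with silence.
            gated[start:start + len(frame)] = [0.0] * len(frame)
    return gated


# Tiny synthetic check: a loud "vocal" frame followed by quiet residue.
loud = [0.5, -0.5] * 512        # RMS 0.5, stays untouched
quiet = [0.005, -0.005] * 512   # RMS 0.005, gets muted
result = gate_quiet_frames(loud + quiet)
```

A hard gate like this clicks at frame boundaries and will also mute very soft singing, so in practice a short fade and a manually chosen threshold would be needed.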

MDX23 was run on Colab, because my own computer cannot handle it and runs out of memory. Running it on Colab requires 13.8 GB of memory.

# Ensemble settings used for MDX23 in the Colab notebook
BigShifts_MDX = 21
overlap_MDX = 0
overlap_MDXv3 = 20
weight_MDXv3 = 6
weight_VOCFT = 5
weight_HQ3 = 2
overlap_demucs = 0.8
output_format = 'FLOAT'
vocals_instru_only = True
# Pass the extra flag only when just vocals and instrumental are wanted
if vocals_instru_only:
    vocals_only = '--vocals_only true'
else:
    vocals_only = ''
chunk_size = 1000000

Other models use the default UVR parameters.
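To give a feel for what the `weight_*` values in the MDX23 settings do, here is a minimal sketch of a weighted average over several vocal estimates. How the notebook really combines its models is not shown in this article, so the plain weighted average is an assumption; `blend_weighted` and the three short sample lists are invented for illustration, and only the weights 6, 5, and 2 come from the settings above.

```python
def blend_weighted(estimates, weights):
    """Return the weighted average of several equally long vocal estimates."""
    if len(estimates) != len(weights):
        raise ValueError("need exactly one weight per estimate")
    total = sum(weights)
    length = len(estimates[0])
    return [
        sum(w * est[i] for est, w in zip(estimates, weights)) / total
        for i in range(length)
    ]


# Weights taken from the settings above; the sample values are made up.
weight_MDXv3, weight_VOCFT, weight_HQ3 = 6, 5, 2
mdxv3_vocals = [0.10, 0.20, 0.30]
vocft_vocals = [0.12, 0.18, 0.33]
hq3_vocals = [0.05, 0.25, 0.28]

blended = blend_weighted(
    [mdxv3_vocals, vocft_vocals, hq3_vocals],
    [weight_MDXv3, weight_VOCFT, weight_HQ3],
)
```

With these weights, the MDXv3 estimate contributes 6/13 of each output sample, Voc FT 5/13, and HQ3 2/13, so a model with a larger weight pulls the blend toward its own result.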

Test 1 (Disaster debuff stacked)#

~~No hope anymore~~

The test track has a high BPM and long stretches of clipping. The accompaniment is sometimes louder than the vocals, the vocals and accompaniment are thoroughly blended together, and the audio quality may be a bit low. The song costs 204 yen on iTunes (I didn't buy it). ~~The debuffs are stacked, it's a disaster~~.


Interestingly, the vocals and accompaniment of this song sound fine when mixed together, but once separated, the vocals have lower sound quality while the accompaniment is unaffected. This may be because the accompaniment is too loud.


| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 50 | Mixed with instrumental sound, vocals sound muffled |
| MDX23 | 70 | Slightly mixed with instrumental sound, occasional noise when accompaniment volume is high |
| htdemucs_ft | 40 | Mixed with accompaniment sound, vocals sound muffled, noise |
| Kim_vocal_2 | 65 | Vocals sound significantly muffled, noise |
| 4_HP_Vocal_UVR | 35 | Mixed with loud accompaniment sound, vocals sound muffled, noise |

Test 2 (Light accompaniment, one main instrument)#

Female vocals#

The main instrument is a guitar, and even RipX already separates this song well.


| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 80 | Mixed with instrumental sound |
| MDX23 | 90 | Slightly mixed with instrumental sound |
| htdemucs_ft | 90 | Slightly mixed with instrumental sound |
| Kim_vocal_2 | 85 | Occasionally slightly mixed with instrumental sound |
| 4_HP_Vocal_UVR | 85 | Mixed with instrumental sound |

Male vocals#

This is the same song, but the accompaniment differs slightly. ~~Hilariously, I couldn't find an exact match~~ ~~Although I don't know why the BPM has become so high~~


| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 95 | Almost perfect |
| MDX23 | 98 | Almost perfect |
| htdemucs_ft | 93 | Partially mixed with instrumental sound |
| Kim_vocal_2 | 97 | Almost perfect |
| 4_HP_Vocal_UVR | 96 | Almost perfect |

Test 3 (Pop music)#

~~Different versions of "Kokoro Zashi" and "Renai Saiban"~~

The accompaniment of "Renai Saiban" is the same in both versions.

Different songs may have different singers 🤔

Female vocals#

Kokoro Zashi#

| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 80 | Some vocals sound muffled |
| MDX23 | 95 | Almost perfect |
| htdemucs_ft | 95 | Almost perfect |
| Kim_vocal_2 | 95 | Almost perfect |
| 4_HP_Vocal_UVR | 85 | Slightly mixed with instrumental sound |

Renai Saiban#

| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 75 | Slightly mixed with instrumental sound, occasional noise |
| MDX23 | 90 | Slightly mixed with instrumental sound |
| htdemucs_ft | 80 | Mixed vocals and accompaniment |
| Kim_vocal_2 | 90 | Slightly mixed with instrumental sound |
| 4_HP_Vocal_UVR | 85 | Slightly mixed with instrumental sound, occasional noise |

Male vocals#

Kokoro Zashi#

| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 75 | Slightly mixed with instrumental sound, occasional noise |
| MDX23 | 95 | Almost perfect |
| htdemucs_ft | 85 | Slightly mixed with instrumental sound, vocals may sound muffled |
| Kim_vocal_2 | 95 | Almost perfect |
| 4_HP_Vocal_UVR | 70 | Accompaniment mixed with vocals |

Renai Saiban#

| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 70 | Accompaniment initially recognized as vocals, slightly mixed with instrumental sound, occasional muffled sound |
| MDX23 | 90 | Slightly mixed with instrumental sound |
| htdemucs_ft | 80 | Mixed vocals and accompaniment |
| Kim_vocal_2 | 85 | Slightly mixed with instrumental sound, occasional sudden increase in accompaniment volume, accompaniment part recognized as vocals |
| 4_HP_Vocal_UVR | 65 | Accompaniment initially recognized as vocals, slightly mixed with instrumental sound, noise |

Test... 4? (Electronic music)#

The accompaniment in this song is relatively quiet.

Surprisingly, the results are quite good.


| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 65 | Some accompaniment recognized as vocals, accompaniment and vocals occasionally mixed, some vocals sound muffled |
| MDX23 | 95 | Sometimes pure accompaniment recognized as vocals, can be cut off |
| htdemucs_ft | 80 | Accompaniment mixed with vocals |
| Kim_vocal_2 | 90 | Occasionally slight accompaniment |
| 4_HP_Vocal_UVR | 50 | Some accompaniment recognized as vocals, accompaniment and vocals mixed for a long time |

Conclusion#

MDX23 is currently the strongest model. According to the runtime logs, it appears to be an ensemble of htdemucs_ft, demucs MDXv3, UVR-MDX-NET Voc FT, and UVR-MDX-NET inst HQ 3, but it is very slow: processing a 5-minute song takes 17 minutes on Colab with a T4 GPU.
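To put that speed in perspective, here is a back-of-the-envelope estimate for a batch of songs. It assumes that processing time scales linearly with song length, which was not measured for this article; the playlist lengths and both function names are hypothetical.

```python
def real_time_factor(processing_minutes, song_minutes):
    """How many minutes of processing one minute of audio costs."""
    return processing_minutes / song_minutes


def estimate_batch_minutes(song_lengths_minutes, factor):
    """Estimated total processing time, assuming linear scaling."""
    return sum(song_lengths_minutes) * factor


# 17 minutes of processing for a 5-minute song, as measured on a T4.
factor = real_time_factor(17, 5)

# Hypothetical playlist: four songs, 17 minutes of audio in total.
playlist = [3.5, 4.0, 5.0, 4.5]
estimate = estimate_batch_minutes(playlist, factor)

print(f"{factor:.1f}x real time, about {estimate:.0f} minutes for the batch")
```

At 3.4 times real time, even this small four-song playlist would take close to an hour, which is worth keeping in mind before starting a large batch on a free Colab session.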

htdemucs_ft is the more balanced model. If you want to keep both the vocals and the accompaniment, choose htdemucs_ft; it gives better results than MDX Main.

~~Kim_vocal_2 is also very good for vocal separation, and it is fast. If you need to process a large number of songs to save time, you can choose this model~~

It's still best to use htdemucs_ft or MDX23.
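As a rough cross-check, the scores from the eight tables above can be averaged per model. This is only a sketch: the scores are subjective, every test is weighted equally (my choice, not something the tests justify), and the failed batch run described in the Supplement is not reflected in these numbers.

```python
# Scores copied from the tables above, in this order: Test 1, Test 2 female,
# Test 2 male, Test 3 female (Kokoro Zashi, Renai Saiban),
# Test 3 male (Kokoro Zashi, Renai Saiban), Test 4.
scores = {
    "RipX built-in software": [50, 80, 95, 80, 75, 75, 70, 65],
    "MDX23": [70, 90, 98, 95, 90, 95, 90, 95],
    "htdemucs_ft": [40, 90, 93, 95, 80, 85, 80, 80],
    "Kim_vocal_2": [65, 85, 97, 95, 90, 95, 85, 90],
    "4_HP_Vocal_UVR": [35, 85, 96, 85, 85, 70, 65, 50],
}

# Unweighted mean per model, then sort from best to worst.
averages = {model: sum(vals) / len(vals) for model, vals in scores.items()}
ranking = sorted(averages, key=averages.get, reverse=True)

for model in ranking:
    print(f"{model}: {averages[model]:.3f}")
```

MDX23 comes out on top with an average of 90.375, followed by Kim_vocal_2 (87.750), htdemucs_ft (80.375), RipX (73.750), and 4_HP_Vocal_UVR (71.375). Kim_vocal_2 looks strong on these eight tracks, but it was the model that fell apart in the batch conversion, which is why the recommendation stays with htdemucs_ft or MDX23.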