lingxue

Which source separation model is best for splitting vocals from accompaniment? A listening test of open-source vocal separation models.

Preface#

At this stage, I strongly, strongly, strongly advise against separating vocals from songs with loud accompaniment, fast tempo, female vocals, and high pitch in order to train a model. You will only be torturing yourself, and the results won't be good.

I've already tried it four times. Don't try it yourself.

Supplement#

On September 3, 2023, I tried converting some songs in batches. The vocals produced by Kim_vocal_2 had a lot of accompaniment mixed in, sounded muffled, and were noisy, while the accompaniment track was perfect. I suspect this model is messing with me. Most songs other than the ones tested below are not suitable, so trying them was a waste of time.

Introduction#

This article is time-sensitive and may not be updated in the future.

I trained an RVC model before, but the results were not very good. The demucs v4 model separates better than most, but there is still a chance that the vocals will sound muffled.

This article does not test paid products, even though Tuanzi AI is really good and can perfectly separate even disaster-level songs.

I didn't realize that RVC was updated more than ten days ago. The updated pitch extraction algorithm in the new version reduces how often the pitch breaks. I thought I didn't need my old model anymore, so I deleted it and had to train it again.

Testing#

The results of this test are based on subjective listening. Different songs and parameters may yield different results. If you know of better parameters, please share them in the comments.

Note!!! The female voice used in this article may not necessarily be a female singer 😹, and the same goes for male voices 😹

The maximum score is 100, but no model will ever reach it, because there will always be differences between the separated vocals and the original recording.

Some of the music is hard to find, and I can't post it here for fear of DMCA takedowns.

For vocals with no accompaniment, most models do well except for the old ones, so this article does not test that case.

Several popular vocal models widely praised online are used in this article, with MDX23 parameters referring to the MVSep leaderboard.

In passages with no vocals, some accompaniment always leaks into the vocal track; it can be cut out. This issue will not be mentioned again below.
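
One way to cut out that residue automatically is a simple level gate. The sketch below is only an illustration and not what was used for this test: it assumes mono samples as floats in [-1.0, 1.0], and the `frame_size` and `threshold` values are made-up defaults that would need tuning per song.

```python
import math


def gate_silence(samples, frame_size=1024, threshold=0.02):
    """Zero out frames whose RMS level is below `threshold`.

    `samples` is a list of floats in [-1.0, 1.0] (mono audio).
    `frame_size` and `threshold` are illustrative values, not tuned ones.
    """
    gated = list(samples)
    for start in range(0, len(gated), frame_size):
        frame = gated[start:start + frame_size]
        # Root-mean-square level of this frame
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms < threshold:
            # Treat quiet frames as leaked accompaniment and mute them
            gated[start:start + len(frame)] = [0.0] * len(frame)
    return gated
```

A gate like this only helps when the leaked accompaniment is clearly quieter than the vocals; for louder leaks, cutting by hand in an audio editor is still the safer route.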

MDX23 was run on Colab. My computer cannot handle it and runs out of memory; on Colab it needs 13.8 GB of memory.

BigShifts_MDX = 21
overlap_MDX = 0
overlap_MDXv3 = 20
weight_MDXv3 = 6
weight_VOCFT = 5
weight_HQ3 = 2
overlap_demucs = 0.8
output_format = 'FLOAT'
vocals_instru_only = True
# Build the optional command-line flag from the setting above
if vocals_instru_only:
    vocals_only = '--vocals_only true'
else:
    vocals_only = ''
chunk_size = 1000000

Other models use the default UVR parameters.
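
The `weight_*` settings above suggest that the per-model vocal estimates are blended with a weighted average. That is an assumption about how the ensemble works, not something confirmed here, and the settings only list weights for three models. A minimal sketch of such a blend, with made-up two-sample "estimates" standing in for real audio:

```python
def blend_estimates(estimates, weights):
    """Weighted average of equally long sample lists, one list per model."""
    if not estimates or len(estimates) != len(weights):
        raise ValueError("need exactly one weight per estimate")
    length = len(estimates[0])
    if any(len(e) != length for e in estimates):
        raise ValueError("all estimates must have the same length")
    total = sum(weights)
    return [
        sum(w * e[i] for e, w in zip(estimates, weights)) / total
        for i in range(length)
    ]


# Weights taken from the settings: weight_MDXv3, weight_VOCFT, weight_HQ3
weights = [6, 5, 2]
# Hypothetical vocal estimates from the three models (not real data)
estimates = [
    [0.0, 1.0],
    [0.0, 1.0],
    [0.0, 0.0],
]
blended = blend_estimates(estimates, weights)
```

With these weights, a sample that only the lowest-weighted model gets wrong is pulled just 2/13 of the way toward the wrong value.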

Test 1 (Disaster: stacked debuffs)#

No hope anymore

The test audio has a high BPM, long periods of clipping, accompaniment that is sometimes louder than the vocals, and vocals and accompaniment completely blended together. The audio quality may also be a bit low. The song costs 204 yen on iTunes (I didn't buy it). With all these debuffs stacked, it's a disaster.

Interestingly, in this song, the vocals and accompaniment sound fine when mixed together, but when separated, the vocals have lower sound quality, while the accompaniment is unaffected. It may be because the accompaniment volume is too loud.

| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 50 | Mixed with instrumental sound, vocals sound muffled |
| MDX23 | 70 | Slightly mixed with instrumental sound, occasional noise when accompaniment volume is high |
| htdemucs_ft | 40 | Mixed with accompaniment sound, vocals sound muffled, noise |
| Kim_vocal_2 | 65 | Vocals sound significantly muffled, noise |
| 4_HP_Vocal_UVR | 35 | Mixed with loud accompaniment sound, vocals sound muffled, noise |

Test 2 (Light accompaniment, one main instrument)#

Female vocals#

The main instrument is a guitar, and the separation effect is already good in RipX.

| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 80 | Mixed with instrumental sound |
| MDX23 | 90 | Slightly mixed with instrumental sound |
| htdemucs_ft | 90 | Slightly mixed with instrumental sound |
| Kim_vocal_2 | 85 | Occasionally slightly mixed with instrumental sound |
| 4_HP_Vocal_UVR | 85 | Mixed with instrumental sound |

Male vocals#

This uses the same song, with slight differences in the accompaniment. Hilariously, I couldn't find an exact match, and I don't know why the BPM of this version is so much higher.

| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 95 | Almost perfect |
| MDX23 | 98 | Almost perfect |
| htdemucs_ft | 93 | Partially mixed with instrumental sound |
| Kim_vocal_2 | 97 | Almost perfect |
| 4_HP_Vocal_UVR | 96 | Almost perfect |

Test 3 (Pop music)#

This test uses different versions of "Kokoro Zashi" and "Renai Saiban".

The accompaniment for "Renai Saiban" is the same.

Different songs may have different singers 🤔

Female vocals#

Kokoro Zashi#

| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 80 | Some vocals sound muffled |
| MDX23 | 95 | Almost perfect |
| htdemucs_ft | 95 | Almost perfect |
| Kim_vocal_2 | 95 | Almost perfect |
| 4_HP_Vocal_UVR | 85 | Slightly mixed with instrumental sound |

Renai Saiban#

| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 75 | Slightly mixed with instrumental sound, occasional noise |
| MDX23 | 90 | Slightly mixed with instrumental sound |
| htdemucs_ft | 80 | Mixed vocals and accompaniment |
| Kim_vocal_2 | 90 | Slightly mixed with instrumental sound |
| 4_HP_Vocal_UVR | 85 | Slightly mixed with instrumental sound, occasional noise |

Male vocals#

Kokoro Zashi#

| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 75 | Slightly mixed with instrumental sound, occasional noise |
| MDX23 | 95 | Almost perfect |
| htdemucs_ft | 85 | Slightly mixed with instrumental sound, vocals may sound muffled |
| Kim_vocal_2 | 95 | Almost perfect |
| 4_HP_Vocal_UVR | 70 | Accompaniment mixed with vocals |

Renai Saiban#

| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 70 | Accompaniment initially recognized as vocals, slightly mixed with instrumental sound, occasional muffled sound |
| MDX23 | 90 | Slightly mixed with instrumental sound |
| htdemucs_ft | 80 | Mixed vocals and accompaniment |
| Kim_vocal_2 | 85 | Slightly mixed with instrumental sound, occasional sudden increase in accompaniment volume, accompaniment part recognized as vocals |
| 4_HP_Vocal_UVR | 65 | Accompaniment initially recognized as vocals, slightly mixed with instrumental sound, noise |

Test... 4? (Electronic music)#

The accompaniment in this song is relatively quiet.

Surprisingly, the results are quite good.

| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 65 | Some accompaniment recognized as vocals, accompaniment and vocals occasionally mixed, some vocals sound muffled |
| MDX23 | 95 | Sometimes pure accompaniment recognized as vocals, can be cut off |
| htdemucs_ft | 80 | Accompaniment mixed with vocals |
| Kim_vocal_2 | 90 | Occasionally slight accompaniment |
| 4_HP_Vocal_UVR | 50 | Some accompaniment recognized as vocals, accompaniment and vocals mixed for a long time |
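
To get a rough overall picture, the scores from all eight tables can be averaged per model. The unweighted mean is an assumption made for this sketch; the scores are subjective and the tests are not equally hard, so treat the ranking as a summary rather than a measurement.

```python
from statistics import mean

# Scores copied from the tables in this article, in this order:
# Test 1, Test 2 female, Test 2 male,
# Test 3 female (Kokoro Zashi, Renai Saiban),
# Test 3 male (Kokoro Zashi, Renai Saiban), Test 4.
scores = {
    "RipX built-in software": [50, 80, 95, 80, 75, 75, 70, 65],
    "MDX23": [70, 90, 98, 95, 90, 95, 90, 95],
    "htdemucs_ft": [40, 90, 93, 95, 80, 85, 80, 80],
    "Kim_vocal_2": [65, 85, 97, 95, 90, 95, 85, 90],
    "4_HP_Vocal_UVR": [35, 85, 96, 85, 85, 70, 65, 50],
}

# Unweighted mean per model, then models sorted from best to worst
averages = {model: mean(values) for model, values in scores.items()}
ranking = sorted(averages, key=averages.get, reverse=True)

for model in ranking:
    print(f"{model}: {averages[model]:.1f}")
```

On these numbers MDX23 comes out first and Kim_vocal_2 second. The scores only rate the vocal track, so they say nothing about accompaniment quality.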

Conclusion#

MDX23 is currently the strongest model. According to the runtime logs, it appears to combine htdemucs_ft, demucs MDXv3, UVR-MDX-NET Voc FT, and UVR-MDX-NET inst HQ 3. It is very slow, however: processing a 5-minute song takes 17 minutes on Colab with a T4 GPU.
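
That timing works out to about 3.4 minutes of processing per minute of audio. A quick way to budget a batch, assuming processing time scales linearly with audio length (only one data point was measured, so this is a rough guess; the batch size below is hypothetical):

```python
def estimate_minutes(total_audio_minutes, minutes_per_audio_minute=17 / 5):
    """Estimate MDX23 processing time on a Colab T4.

    Assumes time grows linearly with audio length, based on the single
    measurement of 17 minutes for a 5-minute song.
    """
    return total_audio_minutes * minutes_per_audio_minute


# Hypothetical batch: 20 songs of 4 minutes each
batch_minutes = estimate_minutes(20 * 4)
print(f"{batch_minutes / 60:.1f} hours")
```

For a batch like this the estimate runs to several hours, which is why a faster model is attractive for bulk processing.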

htdemucs_ft is a more balanced model. If you want to preserve both vocals and accompaniment, you can choose htdemucs_ft, which has better results than MDX Main.

Kim_vocal_2 is also very good for vocal separation, and it is fast. If you need to process a large number of songs and want to save time, you can choose this model.

It's still best to use htdemucs_ft or MDX23.
