lingxue


Which source separation model is best for splitting vocals from accompaniment? A listening test of open-source vocal separation models.

Foreword

At this stage, I strongly, strongly, strongly advise against using songs with loud accompaniment, a fast tempo, female vocals, or a high-pitched singer for vocal separation and model training, especially when these debuffs are stacked together. It is pure self-torture and will not yield good results.

I have already tried four times. Don't try it yourself.

Additional Information

2023-09-03: I batch-converted some songs and found many problems in the vocal part produced by Kim_vocal_2: accompaniment mixed into the vocals, muffled sound, and noise. The accompaniment part, on the other hand, was perfect. I suspect this model is messing with me. Apart from the songs tested below, most songs are not suitable for it, so it was a waste of time.


Introduction

This article is time-sensitive and may not be updated in the future.

I have trained an RVC model before, but the results were not very good. Although the demucs v4 model gives better separation results, there is still a chance that the vocals will sound muffled.

This article does not test paid products, even though Dango AI works really well and can perfectly separate even disaster-level songs.

I didn't realize that RVC had been updated more than ten days ago. The updated pitch extraction algorithm in the new version reduces the occurrence of broken sound. I thought I didn't need my model anymore, so I had deleted it, and now I have to train it again.


Testing

The results of this test are based on my own listening. Different songs and parameters may yield different results. Corrections are welcome in the comments.

Note!!! The female voice used in this article may not necessarily be a female singer 😹, and the same goes for male voices 😹

The maximum score is 100, but no score can be perfect because there will always be differences between the separated vocals and the original recording.

Some of the music cannot be found online, and I will not provide it myself for fear of DMCA.

For vocals without accompaniment, every model except the old ones gives good results, so this article does not test that case.

Several well-received vocal models found online will be used for testing.

MDX23 parameters refer to the MVSep leaderboard.

Some accompaniment always leaks into the vocal track in sections where there are no vocals. It can be trimmed out, so this issue will not be mentioned again.
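The trimming mentioned above can be done by hand in any audio editor. As a rough illustration only, here is a minimal sketch that silences time ranges you have already identified as vocal-free. The function name `mute_ranges`, the list-of-floats sample format, and the example ranges are all assumptions made for this sketch, not part of any of the tools tested here.

```python
def mute_ranges(samples, sample_rate, ranges):
    """Return a copy of `samples` with the given (start, end) ranges in seconds set to silence."""
    out = list(samples)  # copy so the original track is left untouched
    for start_s, end_s in ranges:
        # Convert seconds to sample indices and clamp them to the track length.
        start = max(0, int(start_s * sample_rate))
        end = min(len(out), int(end_s * sample_rate))
        for i in range(start, end):
            out[i] = 0.0
    return out


# Toy example: 1 second of "audio" at 10 samples per second,
# where the section from 0.2 s to 0.5 s is known to contain no vocals.
vocals = [0.5] * 10
cleaned = mute_ranges(vocals, sample_rate=10, ranges=[(0.2, 0.5)])
```

A hard cut like this can click at the edges; a short fade in and out would be gentler, but that is left out to keep the sketch small.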

Run on Colab

My computer cannot handle it and runs out of memory. Running on Colab requires 13.8 GB of memory.

BigShifts_MDX = 21
overlap_MDX = 0
overlap_MDXv3 = 20
weight_MDXv3 = 6
weight_VOCFT = 5
weight_HQ3 = 2
overlap_demucs = 0.8
output_format = 'FLOAT'
vocals_instru_only = True
if vocals_instru_only:
    vocals_only = '--vocals_only true'
else:
    vocals_only = ''
chunk_size = 1000000

Other models use default UVR parameters.
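The `weight_MDXv3`, `weight_VOCFT`, and `weight_HQ3` values in the MDX23 settings suggest that the outputs of several models are blended together. How the notebook actually combines them is not shown here, so the following is only a sketch under the assumption that the weights act as a plain normalized weighted average of each model's vocal estimate. The function `weighted_blend` and the toy sample values are hypothetical.

```python
def weighted_blend(estimates, weights):
    """Blend equal-length sample lists using a normalized weighted average."""
    total = sum(weights[name] for name in estimates)
    length = len(next(iter(estimates.values())))
    blended = [0.0] * length
    for name, samples in estimates.items():
        if len(samples) != length:
            raise ValueError(f"{name} has a different length from the other estimates")
        share = weights[name] / total  # this model's fraction of the final mix
        for i, sample in enumerate(samples):
            blended[i] += share * sample
    return blended


# Weights taken from the settings above; the two-sample "estimates" are made up.
weights = {"MDXv3": 6, "VOCFT": 5, "HQ3": 2}
estimates = {
    "MDXv3": [1.0, 0.0],
    "VOCFT": [0.0, 1.0],
    "HQ3": [1.0, 1.0],
}
blended = weighted_blend(estimates, weights)
```

Under this assumption, MDXv3 contributes 6/13 of the result, VOCFT 5/13, and HQ3 2/13, so the model with the largest weight dominates wherever the models disagree.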

Test 1 (Disaster with full debuffs)

I have no hope anymore

The test audio has a high BPM and long stretches of clipping. The accompaniment is at times louder than the vocals, the vocals and accompaniment are completely blended together, and the audio quality may be a bit low. The song costs 204 yen on Ringo Music (I didn't buy it). All the debuffs are stacked together.
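Clipping is one debuff that can be checked before spending time on separation. A minimal sketch, assuming the audio has already been decoded into floats in the range -1.0 to 1.0: it counts the fraction of samples sitting at or near full scale. The function name and the 0.999 threshold are illustrative choices, not values taken from any tool used in this test.

```python
def clipping_ratio(samples, threshold=0.999):
    """Return the fraction of samples whose absolute value reaches the threshold."""
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= threshold)
    return clipped / len(samples)


# Toy example: two of the four samples are pinned at full scale.
ratio = clipping_ratio([0.2, 1.0, -1.0, 0.5])
```

A song where this ratio is high over long stretches is probably a poor candidate for vocal separation.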


Interestingly, in this song the vocals and accompaniment sound fine when mixed together, but after separation the vocals sound low-quality while the accompaniment is unaffected. This may be because the accompaniment is too loud.


| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 50 | Mixed with instrumental sound, vocals sound muffled |
| MDX23 | 70 | Slightly mixed with instrumental sound, occasional noise when accompaniment volume is high |
| htdemucs_ft | 40 | Mixed with accompaniment sound, vocals sound muffled, noise |
| Kim_vocal_2 | 65 | Vocals sound significantly muffled, noise |
| 4_HP_Vocal_UVR | 35 | Mixed with loud accompaniment sound, vocals sound muffled, noise |

Test 2 (Light accompaniment with one main instrument)

Female vocals

The main instrument is a guitar, and the separation results are already good in RipX.


| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 80 | Mixed with instrumental sound |
| MDX23 | 90 | Slightly mixed with instrumental sound |
| htdemucs_ft | 90 | Slightly mixed with instrumental sound |
| Kim_vocal_2 | 85 | Occasionally slightly mixed with instrumental sound |
| 4_HP_Vocal_UVR | 85 | Mixed with instrumental sound |

Male vocals

The same song is used, but the accompaniment differs slightly. Hilariously, I couldn't find a version with the same accompaniment. I also don't know why the BPM became so high.


| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 95 | Almost perfect |
| MDX23 | 98 | Almost perfect |
| htdemucs_ft | 93 | Partially mixed with instrumental sound |
| Kim_vocal_2 | 97 | Almost perfect |
| 4_HP_Vocal_UVR | 96 | Almost perfect |

Test 3 (Pop music)

Different versions of "Kokoro Zashi" and "Renai Saiban"

The accompaniment for "Renai Saiban" is the same in both versions.

Different songs may have different singers 🤔

Female vocals

Kokoro Zashi

| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 80 | Some vocals sound muffled |
| MDX23 | 95 | Almost perfect |
| htdemucs_ft | 95 | Almost perfect |
| Kim_vocal_2 | 95 | Almost perfect |
| 4_HP_Vocal_UVR | 85 | Slightly mixed with instrumental sound |

Renai Saiban

| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 75 | Slightly mixed with instrumental sound, slight noise |
| MDX23 | 90 | Slightly mixed with instrumental sound |
| htdemucs_ft | 80 | Mixed vocals and accompaniment |
| Kim_vocal_2 | 90 | Slightly mixed with instrumental sound |
| 4_HP_Vocal_UVR | 85 | Slightly mixed with instrumental sound, slight noise |

Male vocals

Kokoro Zashi

| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 75 | Slightly mixed with instrumental sound, slight noise |
| MDX23 | 95 | Almost perfect |
| htdemucs_ft | 85 | Slightly mixed with instrumental sound, vocals may sound muffled at times |
| Kim_vocal_2 | 95 | Almost perfect |
| 4_HP_Vocal_UVR | 70 | Accompaniment and vocals mixed together |

Renai Saiban

| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 70 | Accompaniment initially recognized as vocals, slight accompaniment, occasional muffled sound |
| MDX23 | 90 | Slightly mixed with instrumental sound |
| htdemucs_ft | 80 | Mixed vocals and accompaniment |
| Kim_vocal_2 | 85 | Slightly mixed with instrumental sound, occasional sudden increase in accompaniment volume, accompaniment part recognized as vocals |
| 4_HP_Vocal_UVR | 65 | Accompaniment initially recognized as vocals, slight accompaniment, noise |

Test... Four? (Electronic music)

The accompaniment is relatively quiet in this song.

Surprisingly, the results are quite good.


| Model | Score | Comments |
| --- | --- | --- |
| RipX built-in software | 65 | Some accompaniment recognized as vocals, accompaniment and vocals mixed together at times, some muffled sound |
| MDX23 | 95 | Sometimes pure accompaniment recognized as vocals, can be trimmed |
| htdemucs_ft | 80 | Accompaniment mixed with vocals |
| Kim_vocal_2 | 90 | Occasionally slight accompaniment |
| 4_HP_Vocal_UVR | 50 | Some accompaniment recognized as vocals, accompaniment and vocals mixed together for a long time |

Conclusion

MDX23 is currently the strongest model. According to the running logs, it seems to be a combination of (htdemucs_ft), (demucs MDXv3), (UVR-MDX-NET Voc FT), and (UVR-MDX-NET inst HQ 3), but it is very slow: it takes 17 minutes to process a 5-minute song on a Colab T4.
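To get a feel for what that speed means for a batch, here is a back-of-the-envelope sketch. It assumes processing time scales linearly with audio length, which is only an assumption drawn from the single 17-minute measurement above. The function name and the batch of ten 4-minute songs are made up for illustration.

```python
PROCESS_MINUTES = 17  # measured processing time on a Colab T4
SONG_MINUTES = 5      # length of the song that was processed


def estimated_processing_minutes(audio_minutes):
    """Estimate MDX23 processing time, assuming it scales linearly with audio length."""
    return audio_minutes * PROCESS_MINUTES / SONG_MINUTES


# How many minutes of compute one minute of audio costs.
slowdown = PROCESS_MINUTES / SONG_MINUTES

# Hypothetical batch: ten songs of four minutes each.
batch_minutes = estimated_processing_minutes(10 * 4)
```

Under that assumption, each minute of audio costs about 3.4 minutes of processing, and the ten-song batch would take roughly 136 minutes.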

htdemucs_ft is a more balanced model. If you want to keep both the vocals and the accompaniment, you can choose htdemucs_ft, which gives better results than MDX Main.

Kim_vocal_2 is also good for vocal separation, and it is fast. It can be chosen for bulk processing to save time.

Overall, it is still best to use htdemucs_ft or MDX23.
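As a rough cross-check of these conclusions, the vocal scores from all eight result tables above can be averaged per model. This is only a sketch: an unweighted mean treats the disaster song and the easy songs as equally important, and the scores are subjective to begin with. The scores are listed in the order Test 1, Test 2 female, Test 2 male, Test 3 female (Kokoro Zashi, Renai Saiban), Test 3 male (Kokoro Zashi, Renai Saiban), Test 4.

```python
from statistics import mean

# Vocal scores copied from the result tables, one entry per test in the order above.
scores = {
    "RipX built-in software": [50, 80, 95, 80, 75, 75, 70, 65],
    "MDX23": [70, 90, 98, 95, 90, 95, 90, 95],
    "htdemucs_ft": [40, 90, 93, 95, 80, 85, 80, 80],
    "Kim_vocal_2": [65, 85, 97, 95, 90, 95, 85, 90],
    "4_HP_Vocal_UVR": [35, 85, 96, 85, 85, 70, 65, 50],
}

averages = {model: mean(values) for model, values in scores.items()}
ranking = sorted(averages, key=averages.get, reverse=True)

for model in ranking:
    print(f"{model}: {averages[model]:.1f}")
```

The averages come out to about 90.4 for MDX23, 87.8 for Kim_vocal_2, 80.4 for htdemucs_ft, 73.8 for RipX, and 71.4 for 4_HP_Vocal_UVR. These numbers only cover the vocal part, so they say nothing about accompaniment quality, which is where htdemucs_ft is recommended above.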
