Separate instrument tracks from a song
How to split a song into separate vocals, drums, bass, and other tracks.
NOTE: For research purposes only. Use at your own risk.
I study, read, and work while listening to soundtracks. That’s the best way for me to focus. I can’t concentrate while listening to music that has vocals. I’ve been wondering what it would be like to listen to other music I like without the vocals.
I wonder what Daft Punk sounds like without vocals. I know. Most songs by Daft Punk are already instrumental.
I found a repo with a list of ML projects. One of them is a project called demucs, which is used to separate the individual tracks of a song.
Installing demucs
Install using this:
pip install demucs
The output shows that a lot of dependencies need to be installed:
- demucs
- diffq>=0.2.1
- dora-search
- einops
- julius>=0.2.3
- lameenc>=1.2
- openunmix
- torch>=1.8.1 (the largest, approx. 900 MB)
- torchaudio>=0.8
- tqdm
- Cython
- numpy
- nvidia-cudnn-cu11==8.5.0.96 (also big, approx. 500 MB)
- nvidia-cublas-cu11==11.10.3.66 (300 MB)
- nvidia-cuda-nvrtc-cu11==11.7.99
- nvidia-cuda-runtime-cu11==11.7.99
- retrying
- treetable
- omegaconf
- submitit
- antlr4-python3-runtime==4.9.*
- cloudpickle>=1.2.1
More output:
Successfully built demucs julius dora-search antlr4-python3-runtime treetable
Installing collected packages: lameenc, antlr4-python3-runtime, treetable, tqdm, retrying, omegaconf, nvidia-cuda-runtime-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cublas-cu11, numpy, einops, Cython, cloudpickle, submitit, nvidia-cudnn-cu11, torch, torchaudio, julius, dora-search, diffq, openunmix, demucs
Successfully installed Cython-0.29.33 antlr4-python3-runtime-4.9.3 cloudpickle-2.2.1 demucs-4.0.0 diffq-0.2.3 dora-search-0.1.11 einops-0.6.0 julius-0.2.7 lameenc-1.4.2 numpy-1.24.2 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 omegaconf-2.3.0 openunmix-1.2.1 retrying-1.3.4 submitit-1.4.5 torch-1.13.1 torchaudio-0.13.1 tqdm-4.64.1 treetable-0.2.5
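To confirm which versions pip actually installed (demucs 4.0.0 and torch 1.13.1 in the output above), a few lines of Python using the standard library’s `importlib.metadata` are enough. This is a minimal sketch: the package list is just an illustrative selection from the dependencies above, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Illustrative selection of packages from the pip output above.
PACKAGES = ["demucs", "torch", "torchaudio", "diffq", "julius", "lameenc"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:12} {ver or 'NOT INSTALLED'}")
```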
Running demucs
Run with this:
demucs path-to-file
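To process several songs, the CLI can be driven from Python. A minimal sketch: the hypothetical helper `demucs_command` only assembles the argument list, and `run_demucs` passes it to `subprocess.run`. The flags used (`-n` to pick a model, `--two-stems=vocals` to split into just `vocals` and `no_vocals`, `-o` for the output folder) come from the demucs README and are worth confirming with `demucs --help` on the installed version.

```python
import subprocess


def demucs_command(audio_path, model=None, two_stems=None, out_dir=None):
    """Assemble a demucs CLI call as an argument list (no shell quoting needed)."""
    cmd = ["demucs"]
    if model:
        cmd += ["-n", model]                    # e.g. "mdx_extra_q"; default is htdemucs
    if two_stems:
        cmd.append(f"--two-stems={two_stems}")  # "vocals" -> vocals + no_vocals
    if out_dir:
        cmd += ["-o", out_dir]                  # default output folder: ./separated
    cmd.append(str(audio_path))
    return cmd


def run_demucs(audio_path, **options):
    """Run demucs; raises CalledProcessError if it exits with an error."""
    subprocess.run(demucs_command(audio_path, **options), check=True)
```

For example, `run_demucs("song.mp3", two_stems="vocals")` (the file name is hypothetical) should produce a vocals track plus a `no_vocals` track containing everything else, which is the song without its vocals.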
Output from running the program:
Important: the default model was recently changed to `htdemucs` the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Downloading: "https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th" to /home/tom/.cache/torch/hub/checkpoints/955717e8-8726e21a.th
80.2M/80.2M
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /home/tom/Music/separated/htdemucs
Waiting…
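The output says the tracks go under `separated/htdemucs`. Assuming demucs’s default naming, where it creates one subfolder per input file (named after the file without its extension) holding one WAV per stem, the expected paths can be computed ahead of time. `stem_paths` is a hypothetical helper for illustration.

```python
from pathlib import Path

# The four stems the default model produces.
STEMS = ("bass", "drums", "other", "vocals")


def stem_paths(audio_path, out_dir="separated", model="htdemucs", stems=STEMS):
    """Expected location of each stem: <out_dir>/<model>/<track>/<stem>.wav."""
    track = Path(audio_path).stem  # file name without its extension
    return {s: Path(out_dir) / model / track / f"{s}.wav" for s in stems}
```

For a hypothetical input file `Touch.mp3`, `stem_paths("Touch.mp3")["vocals"]` gives `separated/htdemucs/Touch/vocals.wav`, relative to the directory demucs was run from.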
It took 25 minutes to process an 8 MB mp3 song.
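The post doesn’t say whether a GPU was used. By default demucs uses a CUDA GPU when PyTorch can see one and otherwise falls back to the CPU, which is typically far slower, so when a run takes this long it is worth checking which device was available. A minimal sketch, assuming PyTorch was installed alongside demucs (it appears in the pip output above); `cuda_available` is a hypothetical helper.

```python
def cuda_available():
    """True/False if PyTorch is importable, None if it is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()


if __name__ == "__main__":
    messages = {
        True: "CUDA GPU visible to PyTorch",
        False: "no CUDA GPU visible: demucs will fall back to the CPU",
        None: "PyTorch is not installed",
    }
    print(messages[cuda_available()])
```

The device can also be chosen explicitly on the command line with `-d cpu` or `-d cuda`.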
Reviewing the output
For research purposes only, I used the song “Touch” by Daft Punk.
The result was four .wav tracks named bass, drums, other, and vocals. Each one was 88 MB.
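The 88 MB per stem is in line with uncompressed audio. Assuming the stems are 16-bit stereo PCM at 44.1 kHz (an assumption about the demucs output format, not something the output above states) and reading MB as 10^6 bytes, each second of audio takes 44,100 × 2 × 2 = 176,400 bytes. That puts 88 MB at about 8.3 minutes of audio, and an 8 MB mp3 of the same length at roughly 128 kbps. A small sketch of the arithmetic:

```python
# Assumed stem format: 16-bit stereo PCM at 44.1 kHz; MB taken as 10**6 bytes.
def wav_bytes_per_second(sample_rate=44_100, channels=2, bits_per_sample=16):
    """Data rate of uncompressed PCM audio (ignores the small WAV header)."""
    return sample_rate * channels * bits_per_sample // 8


def wav_duration_seconds(size_bytes, **fmt):
    """Approximate playing time of a PCM WAV file of the given size."""
    return size_bytes / wav_bytes_per_second(**fmt)


def average_kbps(size_bytes, seconds):
    """Average bitrate of a compressed file, in kilobits per second."""
    return size_bytes * 8 / seconds / 1000


seconds = wav_duration_seconds(88_000_000)           # one 88 MB stem
print(f"{seconds:.0f} s (~{seconds / 60:.1f} min)")  # 499 s (~8.3 min)
print(f"{average_kbps(8_000_000, seconds):.0f} kbps")  # 128 kbps for the 8 MB mp3
```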
I loaded them into Audacity.
The result? Wow. Amazing.
Listened to individually, each track sounds almost like a different song.
Only bass and drums. Or bass and vocals. Or bass and other. You could make a whole album by mixing this song in different ways.
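The recombining done in Audacity can also be scripted. A minimal sketch using only Python’s standard library (`wave` and `array`), assuming the stems are 16-bit PCM WAV files with the same sample rate and channel count; `mix_stems` is a hypothetical helper name. Summing bass, drums, and other gives a version of the song without the vocals.

```python
import array
import sys
import wave


def mix_stems(stem_paths, out_path):
    """Sum 16-bit PCM WAV stems sample by sample and write the mix.

    Sums outside the int16 range are clipped; lowering each stem's gain
    would be a gentler way to avoid distortion.
    """
    mixed, params = [], None
    for path in stem_paths:
        with wave.open(str(path), "rb") as w:
            fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
            if w.getsampwidth() != 2:
                raise ValueError(f"{path}: only 16-bit PCM is supported")
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError(f"{path}: format {fmt} differs from {params}")
            samples = array.array("h", w.readframes(w.getnframes()))
        if sys.byteorder == "big":
            samples.byteswap()  # WAV sample data is little-endian
        if len(samples) > len(mixed):  # pad if this stem is longer
            mixed.extend([0] * (len(samples) - len(mixed)))
        for i, s in enumerate(samples):
            mixed[i] += s
    if params is None:
        raise ValueError("no stems given")
    out = array.array("h", (max(-32768, min(32767, s)) for s in mixed))
    if sys.byteorder == "big":
        out.byteswap()
    nchannels, sampwidth, framerate = params
    with wave.open(str(out_path), "wb") as w:
        w.setnchannels(nchannels)
        w.setsampwidth(sampwidth)
        w.setframerate(framerate)
        w.writeframes(out.tobytes())
```

For example, `mix_stems([f"separated/htdemucs/Touch/{s}.wav" for s in ("bass", "drums", "other")], "touch_instrumental.wav")`, where the `Touch` folder name is assumed. The pure-Python loop is slow on a full-length song; a library such as NumPy would do the same sum much faster.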
The vocals were removed at the beginning, at the end, and wherever you can clearly hear the singer. At the end, you only hear the piano. In the middle, where human and robot singing are mixed, the vocals couldn’t be removed. The angelic vocals were almost completely removed; you can still hear them very quietly in the background. The vocals track also picked up the sound of wind blowing at the beginning.
Conclusion
The demucs program is a product of Facebook Research. It comes with a research paper full of proper complicated math that requires superhuman understanding. It did an amazing job with the song I tried.