algorithm - Algorithm to remove vocals from an audio track

Question

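I want to remove the vocals from an mp3 audio track. I searched Google and tried some software, but none of it was convincing. I plan to read the mp3 file, get the waveform, and remove the parts of the waveform above a specified limit.

Do you have any suggestions on how to proceed?

- Update

I only want code that can read the mp3 file format. Is there any software for that?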

score 17 · Accepted Answer

This isn't so much an "algorithm" as a "trick", but it could be automated in code. It works mostly for stereo tracks where the vocals are centered. If the vocals are centered, they appear equally in both channels. If you invert one of the channels and then merge the two back together, the waveforms of the centered vocals cancel out and are virtually removed. You can do this manually with most good audio editors, such as Audacity. It doesn't give you perfect results, and the rest of the audio suffers a bit too, but it makes for great karaoke tracks :)
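A minimal sketch of the invert-and-merge trick on made-up sample values. The names `vocal`, `guitar_left`, and `guitar_right` and all the numbers are illustrative assumptions, not taken from a real recording:

```python
# Hypothetical sample values, for illustration only.
vocal = [100, -50, 25, 0]          # centered: identical in both channels
guitar_left = [10, 20, -30, 40]    # present only in the left channel
guitar_right = [-5, 15, 5, -10]    # present only in the right channel

left = [v + g for v, g in zip(vocal, guitar_left)]
right = [v + g for v, g in zip(vocal, guitar_right)]

# Invert one channel, then merge (sum) the two channels.
right_inverted = [-s for s in right]
merged = [l + r for l, r in zip(left, right_inverted)]

# The vocal term cancels; what remains is guitar_left - guitar_right.
residual = [a - b for a, b in zip(guitar_left, guitar_right)]
print(merged == residual)
```

The leftover signal is the difference of the non-centered parts, which is why the rest of the audio also sounds altered.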

score 10 · Accepted Answer

Source: http://www.cdf.utoronto.ca/~csc209h/summer/a2/a2.html, written by Daniel Zingaro.

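Sound is a wave of air pressure. When a sound is produced, a sound wave made up of compressions (increases in pressure) and rarefactions (decreases in pressure) moves through the air. This is similar to what happens when you throw a stone into a pond: the water rises and falls in a repeating wave.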

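When a microphone records sound, it measures the air pressure and returns it as a value. These values are called samples and can be positive or negative, corresponding to increases or decreases in air pressure. Each time we record the air pressure, we are sampling the sound. Each sample captures the sound at an instant; the faster we sample, the more accurate our representation of the sound. The sampling rate is the number of times per second that we sample the sound. For example, CD-quality sound uses a sampling rate of 44100 samples per second; sampling someone's voice for a VOIP conversation uses far fewer than this. Sampling rates of 11025 (voice quality), 22050, and 44100 (CD quality) are common...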
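To make the sampling rate concrete, here is a small sketch that samples the same tone at two of the rates mentioned above. The helper `sample_sine`, the 440 Hz tone, and the 10 ms duration are assumptions chosen for illustration:

```python
import math

def sample_sine(freq_hz, sample_rate, duration_ms):
    """Return samples of a sine tone; one value per sampling instant."""
    n = sample_rate * duration_ms // 1000  # number of samples in the interval
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

cd_quality = sample_sine(440, 44100, 10)     # CD-quality rate
voice_quality = sample_sine(440, 11025, 10)  # voice-quality rate

# The same 10 ms of sound is described by far more values at the higher rate.
print(len(cd_quality), len(voice_quality))
```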

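For mono sound (sound with one channel), a sample is simply a positive or negative integer representing the amount of compression in the air at the point the sample was taken. For stereo sound (which we use in this assignment), a sample actually consists of two integer values: one for the left speaker and one for the right speaker...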
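As a sketch of what a stereo sample looks like in memory, the snippet below packs and unpacks a few left/right pairs. It assumes the layout commonly used in 16-bit PCM wav data (interleaved little-endian signed 16-bit integers), which the text above does not spell out; the values in `frames` are made up:

```python
import struct

# Three hypothetical stereo samples as (left, right) pairs.
frames = [(1000, -1000), (200, 300), (-32768, 32767)]

# Pack as interleaved little-endian signed 16-bit integers (assumed layout).
raw = b"".join(struct.pack("<hh", left, right) for left, right in frames)

# Read them back: every 4-byte chunk is one stereo sample.
decoded = list(struct.iter_unpack("<hh", raw))
print(len(raw), decoded == frames)
```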

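Here is how the [vocal removal] algorithm works.

Copy the first 44 bytes from the input file to the output file verbatim. These 44 bytes contain important header information that should not be modified.

Next, treat the rest of the input file as a sequence of shorts. Take each pair of shorts, left and right, and compute combine = (left - right) / 2. Write two copies of combine to the output file.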
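The two steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the data is 16-bit little-endian stereo PCM, the header is exactly 44 bytes as the assignment says (real wav headers can be longer), and the function name `remove_vocals` is hypothetical:

```python
import struct

HEADER_SIZE = 44  # per the assignment text; not guaranteed for every wav file

def remove_vocals(data: bytes) -> bytes:
    """Apply combine = (left - right) / 2 to the sample data after the header."""
    header, body = data[:HEADER_SIZE], data[HEADER_SIZE:]
    out = bytearray(header)  # copy the header verbatim
    usable = len(body) - len(body) % 4  # skip a trailing partial sample pair
    for left, right in struct.iter_unpack("<hh", body[:usable]):
        combine = int((left - right) / 2)  # truncates toward zero
        out += struct.pack("<hh", combine, combine)  # one copy per channel
    return bytes(out)

# Tiny demo: an all-zero placeholder header followed by two stereo samples.
demo = bytes(HEADER_SIZE) + struct.pack("<hhhh", 1000, 1000, 500, 100)
result = remove_vocals(demo)
print(struct.unpack("<hhhh", result[HEADER_SIZE:]))
```

For a real file, the bytes would come from reading the input in binary mode and the result would be written back out the same way.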

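Why does this work?

For the curious, a brief explanation of the vocal-removal algorithm is in order. As you will have noticed from the algorithm, we simply subtract one channel from the other (and then divide by 2 to keep the volume from getting too loud). So why does subtracting the right channel from the left magically remove the vocals?

When music is recorded, the vocals are sometimes recorded with a single microphone, and that single vocal track is used for the vocals in both channels. The other instruments in the song are recorded with multiple microphones, so they sound different in the two channels. Subtracting one channel from the other takes away everything that is "in common" between the two channels, which, if we're lucky, means removing the vocals.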
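The argument can be written out in a few lines. The symbols here are notation introduced for illustration: $V$ is the shared vocal signal, and $I_L$, $I_R$ are the instrument signals in the left and right channels.

```latex
\begin{align*}
\text{left}  &= V + I_L \\
\text{right} &= V + I_R \\
\text{combine} &= \frac{\text{left} - \text{right}}{2}
               = \frac{(V + I_L) - (V + I_R)}{2}
               = \frac{I_L - I_R}{2}
\end{align*}
```

The vocal term $V$ drops out entirely, while the instruments survive only to the extent that $I_L$ and $I_R$ differ.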

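Of course, things rarely work out so well. Try your vocal remover on this badly-behaved wav file. Sure, the vocals are gone, but so is the body of the music! Apparently some of the instruments were also recorded "centered", so they are removed along with the vocals when the channels are subtracted.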

score 5 · Accepted Answer

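You can use the pydub toolbox (see here for details, and here for a related question). It relies on FFmpeg and can read any file format that FFmpeg supports.

Then you can do the following: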

from pydub import AudioSegment

# Placeholder paths; replace them with your own files
myAudioFile = "input.mp3"
myAudioFile_CentersOut = "output_centers_out.mp3"

# Read in the audio file and split it into its two mono tracks
sound_stereo = AudioSegment.from_file(myAudioFile, format="mp3")
sound_monoL, sound_monoR = sound_stereo.split_to_mono()

# Invert the phase of the right channel
sound_monoR_inv = sound_monoR.invert_phase()

# Overlay L and inverted R; content common to both channels cancels out
sound_CentersOut = sound_monoL.overlay(sound_monoR_inv)

# Export the merged audio file
sound_CentersOut.export(myAudioFile_CentersOut, format="mp3")

score 1 · Accepted Answer

Above a specified limit? Sounds like a high-pass filter... You could use phase cancellation if you had the a cappella track along with the original. Otherwise, unless it's an old 60s-era track that has the vocals directly in the middle and everything else hard-panned, I don't think there's a super clean way of removing vocals.
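A rough sketch of the phase-cancellation idea with an a cappella track, using made-up values. It assumes the a cappella is sample-aligned with the mix and at exactly the same level, which is rarely true of real recordings:

```python
# Hypothetical, perfectly aligned sample values.
instrumental = [3, -7, 12, 0]
acapella = [50, 40, -30, 20]
mix = [i + a for i, a in zip(instrumental, acapella)]  # the original track

# Invert the a cappella and add it to the mix: the vocals cancel.
acapella_inverted = [-a for a in acapella]
recovered = [m + a for m, a in zip(mix, acapella_inverted)]
print(recovered == instrumental)
```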
