For a very simple beat tracker, you probably want to use librosa's built-in beat tracking:
import librosa
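# librosa.util.example_audio_file() was removed in newer librosa releases;
# librosa.ex() (librosa >= 0.8) fetches a bundled example recording instead
y, sr = librosa.load(librosa.ex('trumpet'))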
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
# beats now contains the beat *frame positions*
# convert to timestamps like this:
beat_times = librosa.frames_to_time(beats, sr=sr)
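This gives you the positions of the beats. But what you have actually been asking for is downbeat estimation. Your idea of finding the beat with the highest energy is good, but you may want to incorporate some additional knowledge and average over corresponding beats. For example, if you know the track is in 4/4 time, you can add up the energy of every fourth beat and then take the beat position with the highest energy sum as the downbeat.
Roughly like this: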
import librosa
import numpy as np
y, sr = librosa.load('my file.wav')
# get onset envelope
# (y must be passed by keyword in recent librosa versions)
onset_env = librosa.onset.onset_strength(y=y, sr=sr, aggregate=np.median)
# get tempo and beats
tempo, beats = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
# we assume 4/4 time
meter = 4
# calculate number of full measures
measures = (len(beats) // meter)
# get onset strengths for the known beat positions
# Note: this is somewhat naive, as the main strength may be *around*
# rather than *on* the detected beat position.
beat_strengths = onset_env[beats]
# make sure we only consider full measures
# and convert to 2d array with indices for measure and beatpos
measure_beat_strengths = beat_strengths[:measures * meter].reshape(-1, meter)
# add up strengths per beat position
beat_pos_strength = np.sum(measure_beat_strengths, axis=0)
# find the beat position with max strength
downbeat_pos = np.argmax(beat_pos_strength)
# convert the beat positions to the same 2d measure format
full_measure_beats = beats[:measures * meter].reshape(-1, meter)
# and select the beat position we want: downbeat_pos
downbeat_frames = full_measure_beats[:, downbeat_pos]
print('Downbeat frames: {}'.format(downbeat_frames))
# print times
downbeat_times = librosa.frames_to_time(downbeat_frames, sr=sr)
print('Downbeat times in s: {}'.format(downbeat_times))
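The comment in the code above notes that the main onset strength may sit around the detected beat frame rather than exactly on it. One possible way to soften this, sketched here as an assumption and not as anything librosa provides, is to take the peak onset strength in a small window around each beat before summing per beat position. The function name `estimate_downbeat_position` and the `half_window` parameter are hypothetical and exist only for this illustration.

```python
def estimate_downbeat_position(onset_env, beats, meter=4, half_window=2):
    """Guess which beat position within the measure is the downbeat.

    Hypothetical helper: onset_env is a sequence of onset strengths per frame,
    beats is a sequence of beat frame indices, half_window is the number of
    frames searched on either side of each beat (an assumed tuning knob).
    """
    n_frames = len(onset_env)
    measures = len(beats) // meter
    if n_frames == 0 or measures == 0:
        raise ValueError("need onset data and at least one full measure of beats")

    # Peak onset strength in a small window around each beat frame
    strengths = []
    for b in beats:
        b = min(max(int(b), 0), n_frames - 1)
        lo = max(0, b - half_window)
        hi = min(n_frames, b + half_window + 1)
        strengths.append(float(max(onset_env[lo:hi])))

    # Sum the strengths per beat position, using full measures only
    totals = [0.0] * meter
    for i in range(measures * meter):
        totals[i % meter] += strengths[i]

    # Beat position with the largest summed strength (first one wins ties)
    position = max(range(meter), key=totals.__getitem__)
    downbeat_frames = [int(beats[i]) for i in range(position, measures * meter, meter)]
    return position, downbeat_frames
```

With the variables from the code above, this could be called as `estimate_downbeat_position(onset_env, beats, meter=4)`, and the returned frames converted with `librosa.frames_to_time` as before. A suitable window size depends on the hop length and the material, so treat the default as a starting point only.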
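Your mileage with this kind of code will vary. Success depends on the kind of music, the genre, the meter, the quality of the beat detection, and so on, because the problem is not trivial. In fact, downbeat estimation is a current Music Information Retrieval (MIR) research topic and is not entirely solved. For a recent review of advanced deep-learning-based automatic downbeat tracking, you may want to check out this article.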