
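I am trying to use the STFT to find the prominent peaks in an audio signal (a piano recording). Here is what I have done so far:

1. Get the envelope of the time-domain signal.
2. Find the peaks of the envelope and use them as note onsets.
3. Perform an FFT on the samples between every two consecutive onsets.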
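A rough Python/SciPy sketch of steps 1 and 2, assuming the recording is already a mono or stereo NumPy array. It reuses the derivative-of-Gaussian edge filter from the MATLAB code (sigma = 0.335 over a window normalized to [-1, 1]), but the 0.5 s filter length, the `height` and `min_gap_s` defaults are my own assumptions, and `scipy.signal.find_peaks` stands in for the hand-written peak loop.

```python
import numpy as np
from scipy.signal import fftconvolve, find_peaks

def detect_onsets(signal, fs, filter_s=0.5, sigma=0.335, height=0.3, min_gap_s=0.1):
    """Onset sample indices from a derivative-of-Gaussian edge filter on |signal|."""
    x = np.asarray(signal, dtype=float)
    if x.ndim > 1:
        x = x.sum(axis=1)                      # mix stereo down to mono, like sum(song,2)
    rect = np.abs(x)                           # rectify, like abs(song)

    # Same kernel shape as the MATLAB code: -x .* exp(-x.^2/(2*sigma^2)), x in [-1, 1]
    P = max(3, int(filter_s * fs))             # filter length in samples (assumed 0.5 s)
    u = np.linspace(-1.0, 1.0, P)
    kernel = -u * np.exp(-(u ** 2) / (2 * sigma ** 2))
    kernel /= np.abs(kernel).sum()

    # mode="same" returns an output aligned with the input, so no manual shift is needed
    edges = fftconvolve(rect, kernel, mode="same")
    edges /= np.max(np.abs(edges))             # normalise to [-1, 1]

    # Rising edges give positive peaks; keep the tall ones, at least min_gap_s apart
    onsets, _ = find_peaks(edges, height=height, distance=max(1, int(min_gap_s * fs)))
    return onsets
```

Falling edges (notes ending) come out as negative peaks, so they are ignored by the positive `height` test.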

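Now that I have the FFT, I want to find the peaks that correspond to the notes being played... but when I try to use the findpeaks function, at some iterations it returns an empty matrix.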
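For step 3, here is a NumPy sketch of the per-segment spectrum that the MATLAB loop computes (FFT length rounded up to the next power of two, one-sided magnitude scaled by 2). Detrending each segment with `scipy.signal.detrend` is an assumption; the MATLAB code detrends the whole signal once instead.

```python
import numpy as np
from scipy.signal import detrend

def segment_spectrum(signal, fs, start, stop):
    """One-sided magnitude spectrum of signal[start:stop] (sketch of the MATLAB loop body)."""
    seg = detrend(np.asarray(signal[start:stop], dtype=float))  # remove linear trend
    nfft = 1 << int(np.ceil(np.log2(len(seg))))   # like 2^nextpow2(L)
    spec = np.fft.rfft(seg, n=nfft)               # bins 0 .. nfft/2, zero-padded
    f = np.fft.rfftfreq(nfft, d=1.0 / fs)         # equals FS/2*linspace(0,1,NFFT/2+1)
    return f, 2 * np.abs(spec)                    # same scaling as seg_fft2
```

Python indices are 0-based and the stop index is exclusive, so `song(max_col(i-1):max_col(i)-1)` corresponds to `start = max_col[i-1] - 1`, `stop = max_col[i] - 1` if the onset indices are kept 1-based.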

clear all;
clc;

[song,FS] = audioread('C major.wav');   % wavread has been removed from recent MATLAB releases
sound(song,FS);

P = 20000;
N=length(song);                     % length of song
t=0:1/FS:(N-1)/FS;                  % define time period


song = sum(song,2);                        
song=abs(song);
%windowing = hamming(32768); %Windowing function

% Plot time domain signal
figure(1);
          subplot(2,1,1)
          plot(t,3*song)
          title('Wave File')
          ylabel('Amplitude')
          xlabel('Length (in seconds)')
          %ylim([-1.1 1.1])
          xlim([0 N/FS])

%----------------------Finding the envelope of the signal-----------------%
% Gaussian Filter
x = linspace( -1, 1, P);                      % create a vector of P values between -1 and 1 inclusive
sigma = 0.335;                                % standard deviation used in Gaussian formula
myFilter = -x .* exp( -(x.^2)/(2*sigma.^2));  % compute first derivative, but leave constants out
myFilter = myFilter / sum( abs( myFilter ) ); % normalize

% Plot Gaussian Filter
         subplot(2,1,2)       
         plot(myFilter)
         title('Edge Detection Filter')

% fft convolution
myFilter = myFilter(:);                         % create a column vector
song(length(song)+length(myFilter)-1) = 0;      %zero pad song
myFilter(length(song)) = 0;                     %zero pad myFilter
edges =ifft(fft(song).*fft(myFilter));

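shift = floor(P/2) + 1;                      % a centred length-P filter delays the output by ~P/2 samples
tedges = edges(shift:N+shift-1);             % remove that delay so peaks line up w/ edges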
tedges=tedges/max(abs(tedges));                 % normalize

%---------------------------Onset Detection-------------------------------%
% Finding peaks
maxtab = [];
mintab = [];
x = (1:length(tedges));
min1 = Inf;
max1 = -Inf;
min_pos = NaN; 
max_pos = NaN;

lookformax = 1;
for i=1:length(tedges)

    peak = tedges(i);
  if peak > max1, 
      max1 = peak;
      max_pos = x(i); 
  end
  if peak < min1, 
      min1 = peak;
      min_pos = x(i); 
  end

  if lookformax
    if peak < max1-0.01
      maxtab = [maxtab ; max_pos max1];
      min1 = peak; 
      min_pos = x(i);
      lookformax = 0;
    end  
  else
    if peak > min1+0.05
      mintab = [mintab ; min_pos min1];
      max1 = peak; 
      max_pos = x(i);
      lookformax = 1;
    end
  end
end
% % Plot song filtered with edge detector          
         figure(2)
         plot(1/FS:1/FS:N/FS,tedges)
         title('Song Filtered With Edge Detector 1')
         xlabel('Time (s)')
         ylabel('Amplitude')
         ylim([-1 1.1])
         xlim([0 N/FS])

         hold on;

         plot(maxtab(:,1)/FS, maxtab(:,2), 'ro')
         plot(mintab(:,1)/FS, mintab(:,2), 'ko')

max_col = maxtab(:,1);
peaks_det = max_col/FS; 
No_of_peaks = length(peaks_det);

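% song was rectified (abs) and zero-padded above for the envelope; reload the
% raw waveform so the spectra below are not distorted by the rectification
[song,FS] = audioread('C major.wav');
song = detrend(sum(song,2));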
%---------------------------Performing FFT--------------------------------%
 for i = 2:No_of_peaks

    song_seg = song(max_col(i-1):max_col(i)-1);
%     song_seg = song(max_col(6):max_col(7)-1);
    L = length(song_seg);    
    NFFT = 2^nextpow2(L); % Next power of 2 from length of y

    seg_fft = fft(song_seg,NFFT);%/L;

    N=5;Fst1=50;Fp1=60; Fp2=1040; Fst2=1050;

%     d = fdesign.bandpass('N,Fst1,Fp1,Fp2,Fst2');
%     h = design(d);
%     seg_fft = filter(h, seg_fft);

%     seg_fft(1) = 0;
%     
    f = FS/2*linspace(0,1,NFFT/2+1);
    seg_fft2 = 2*abs(seg_fft(1:NFFT/2+1));
    L5 = length(song_seg);

    figure(1+i)
    plot(f,seg_fft2)
    title('Frequency spectrum of signal')
    xlabel('Frequency (Hz)')
    %xlim([0 2500])
    ylabel('|Y(f)|')
    ylim([0 300])

    %[B, IX] = sort(seg_fft2)

    %[points loc] = findpeaks(seg_fft);

    %STFT_out(:,i) = seg_fft2;

    %P=max(seg_fft2)
    [points, loc] = findpeaks(seg_fft2,'THRESHOLD',20)
 end

2 Answers

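If you look at the documentation for findpeaks, 'THRESHOLD' is defined as:

Threshold height difference between a peak and its neighbors, specified as a positive real scalar. findpeaks returns only those peaks that exceed their neighboring values by at least the value of 'THRESHOLD'.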

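So in the line

[points, loc] = findpeaks(seg_fft2,'THRESHOLD',20)

the value 20 is probably too large. findpeaks finds no maxima because the requirement that a peak rise delta(y) = 20 above each of its neighboring points rejects every candidate. In a finely sampled spectrum a peak typically spans several bins, so adjacent bins near its top differ by much less than that.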
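The effect is easy to reproduce with SciPy, whose `find_peaks` has a `threshold` argument with the same neighbour-difference meaning and a `height` argument analogous to `MinPeakHeight`. The Gaussian-shaped peak below is an assumed stand-in for one spectral peak spread over several bins:

```python
import numpy as np
from scipy.signal import find_peaks

# A smooth, well-sampled peak: 300 at its top, but the neighbouring bins are only ~6 lower
bins = np.arange(200)
spectrum = 300 * np.exp(-((bins - 100) ** 2) / (2 * 5.0 ** 2))

by_threshold, _ = find_peaks(spectrum, threshold=20)  # needs a 20-unit step to BOTH neighbours
by_height, _ = find_peaks(spectrum, height=20)        # only needs the peak value itself >= 20

print(by_threshold)  # -> []
print(by_height)     # -> [100]
```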

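You probably want to specify MINPEAKHEIGHT instead, which only requires the peak value itself to be at least the given height, regardless of how much it exceeds its immediate neighbors.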
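A `MinPeakHeight`-style test is easier to tune when it is relative to the strongest bin, because the absolute scale of `seg_fft2` grows with the segment length (the `/L` is commented out). A SciPy sketch, where `height` plays the role of `MinPeakHeight` and `distance` (converted from Hz to bins) stops one note from producing several neighbouring peaks; the 10 % and 20 Hz defaults are assumptions:

```python
import numpy as np
from scipy.signal import find_peaks

def spectral_peaks(f, mag, rel_height=0.1, min_sep_hz=20.0):
    """Frequencies and heights of peaks >= rel_height * max(mag), at least min_sep_hz apart."""
    df = f[1] - f[0]                                    # bin spacing in Hz
    distance = max(1, int(round(min_sep_hz / df)))      # minimum separation in bins
    peaks, props = find_peaks(mag, height=rel_height * mag.max(), distance=distance)
    return f[peaks], props["peak_heights"]
```

Windowing each segment before the FFT (e.g. the Hann window used in the check, or the commented-out Hamming window in the question) keeps leakage sidelobes well below a 10 % relative height; with no window the first sidelobe of a rectangular window is about 22 % of the peak and can show up as a false note.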

answered 2013-09-15 16:12:37

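If you are trying to find the peaks where notes start, I suggest the following steps, which worked for me when I was trying to find the onsets of beeps in a noisy video.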

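  1. Extract the signals with wavfile.read(audio_wav_path) and the corresponding wavfile.read(piano_note_wav_path).
  2. fftconvolve the two with scipy.signal (this finds regions where the audio and the note signal are similar).
  3. fftconvolve the piano note with itself (this gives the ideal shape, i.e. what you would get if the audio were just the piano note).
  4. Correlate the self-convolution (3) with the convolution of both (2) (this finds regions where (2) closely matches (3), which reduces false positives from similar notes).
  5. Make np arrays and threshold them (look only at very high peaks), then normalize (to make the numbers easier to work with).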
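A minimal SciPy sketch of steps 2 to 5, written as a literal reading of the list. It assumes both signals are already loaded as arrays at the same sample rate (for step 1, `scipy.io.wavfile.read` returns `(rate, data)`); the 0.5 relative threshold and the one-peak-per-note-length `distance` are my own choices. Correlating (2) with (3) makes the whole chain equal to a cross-correlation of the audio with the note, smoothed by the note's autocorrelation, which is why it behaves like a matched filter.

```python
import numpy as np
from scipy.signal import fftconvolve, find_peaks

def to_mono(x):
    """Float copy of x, averaged across channels if x is 2-D (wavfile.read can return either)."""
    x = np.asarray(x, dtype=float)
    return x.mean(axis=1) if x.ndim > 1 else x

def note_onsets(audio, note, fs, rel_height=0.5):
    """Onset times (s) where `note` occurs in `audio`; both at sample rate fs."""
    audio, note = to_mono(audio), to_mono(note)
    both = fftconvolve(audio, note)          # step 2: audio convolved with the note
    ideal = fftconvolve(note, note)          # step 3: note convolved with itself
    # step 4: correlate (2) with (3); correlation = convolution with the reversed sequence
    score = fftconvolve(both, ideal[::-1])
    score /= np.max(np.abs(score))           # step 5: normalise to [-1, 1]
    # step 5: keep only high peaks, at most one per note length
    peaks, _ = find_peaks(score, height=rel_height, distance=len(note))
    starts = peaks - (len(ideal) - 1)        # undo the lag offset of the 'full' outputs
    return starts[starts >= 0] / fs
```

The result can be off by a period or two of the note's pitch, because the carrier makes neighbouring lobes of the score almost as tall as the true peak.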

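Once all of that is done, scipy.signal's find_peaks should give you the peaks and their amplitudes, and from the peak indices you can work out the timestamps (index divided by the sample rate).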
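`find_peaks` returns sample indices plus a `properties` dict; when `height` is given, that dict contains `peak_heights`. A small sketch turning both into timestamps and amplitudes (the `min_height` default is an assumption):

```python
import numpy as np
from scipy.signal import find_peaks

def peak_times(score, rate, min_height=0.5):
    """Timestamps (s) and amplitudes of the peaks in `score` that reach min_height."""
    peaks, props = find_peaks(score, height=min_height)
    return peaks / rate, props["peak_heights"]
```

The timestamps only line up with the recording if `score` is aligned with the audio; after `mode="full"` convolutions, the lag offset has to be subtracted from the indices first.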

This worked for me in a case where the peak amplitudes were not consistent. Hope it helps!

answered 2019-09-16 18:14:27