c# - Decoding DTMF from a WAV file

Question

Following on from my earlier question, my goal is to detect DTMF tones in a WAV file from C#. However, I'm really struggling to understand how this can be done.

I understand the DTMF uses a combination of frequencies, and a Goertzel algorithm can be used ... somehow. I've grabbed a Goertzel code snippet and I've tried shoving a .WAV file into it (using NAudio to read the file, which is a 8KHz mono 16-bit PCM WAV):

 using (WaveFileReader reader = new WaveFileReader(@"dtmftest_w.wav"))
  {
      byte[] buffer = new byte[reader.Length];

      int read = reader.Read(buffer, 0, buffer.Length);
      short[] sampleBuffer = new short[read/2];
      Buffer.BlockCopy(buffer, 0, sampleBuffer, 0, read/2);
      Console.WriteLine(CalculateGoertzel(sampleBuffer,8000,16));                 
   }

 public static double CalculateGoertzel(short[] sample, double frequency, int samplerate)
   {
      double Skn, Skn1, Skn2;
      Skn = Skn1 = Skn2 = 0;
      for (int i = 0; i < sample.Length; i++)
         {
            Skn2 = Skn1;
            Skn1 = Skn;
            Skn = 2 * Math.Cos(2 * Math.PI * frequency / samplerate) * Skn1 - Skn2 + sample[i];
         }
      double WNk = Math.Exp(-2 * Math.PI * frequency / samplerate);
      return 20 * Math.Log10(Math.Abs((Skn - WNk * Skn1)));
    }

I know what I'm doing is wrong: I assume that I should iterate through the buffer, and only calculate the Goertzel value for a small chunk at a time - is this correct?

Secondly, I don't really understand what the output of the Goertzel method is telling me: I get a double (example: 210.985812) returned, but I don't know to equate that to the presence and value of a DTMF tone in the audio file.

I've searched everywhere for an answer, including the libraries referenced in this answer; unfortunately, the code here doesn't appear to work (as noted in the comments on the site). There is a commercial library offered by TAPIEx; I've tried their evaluation library and it does exactly what I need - but they're not responding to emails, which makes me wary about actually purchasing their product.

I'm very conscious that I'm looking for an answer when perhaps I don't know the exact question, but ultimately all I need is a way to find DTMF tones in a .WAV file. Am I on the right lines, and if not, can anyone point me in the right direction?

EDIT: Using @Abbondanza 's code as a basis, and on the (probably fundamentally wrong) assumption that I need to drip-feed small sections of the audio file in, I now have this (very rough, proof-of-concept only) code:

const short sampleSize = 160;

using (WaveFileReader reader = new WaveFileReader(@"\\mac\home\dtmftest.wav"))
        {           
            byte[] buffer = new byte[reader.Length];

            reader.Read(buffer, 0, buffer.Length);

            int bufferPos = 0;

            while (bufferPos < buffer.Length-(sampleSize*2))
            {
                short[] sampleBuffer = new short[sampleSize];
                Buffer.BlockCopy(buffer, bufferPos, sampleBuffer, 0, sampleSize*2);


                var frequencies = new[] {697.0, 770.0, 852.0, 941.0, 1209.0, 1336.0, 1477.0};

                var powers = frequencies.Select(f => new
                {
                    Frequency = f,
                   Power = CalculateGoertzel(sampleBuffer, f, 8000)              
                });

                const double AdjustmentFactor = 1.05;
                var adjustedMeanPower = AdjustmentFactor*powers.Average(result => result.Power);

                var sortedPowers = powers.OrderByDescending(result => result.Power);
                var highestPowers = sortedPowers.Take(2).ToList();

                float seconds = bufferPos / (float)16000;

                if (highestPowers.All(result => result.Power > adjustedMeanPower))
                {
                    // Use highestPowers[0].Frequency and highestPowers[1].Frequency to 
                    // classify the detected DTMF tone.

                    switch (Convert.ToInt32(highestPowers[0].Frequency))
                    {
                        case 1209:
                            switch (Convert.ToInt32(highestPowers[1].Frequency))
                            {
                                case 697:
                                    Console.WriteLine("1 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 770:
                                    Console.WriteLine("4 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 852:
                                    Console.WriteLine("7 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 941:
                                    Console.WriteLine("* pressed at " + bufferPos);
                                    break;
                            }
                            break;
                        case 1336:
                            switch (Convert.ToInt32(highestPowers[1].Frequency))
                            {
                                case 697:
                                    Console.WriteLine("2 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 770:
                                    Console.WriteLine("5 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 852:
                                    Console.WriteLine("8 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 941:
                                    Console.WriteLine("0 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                            }
                            break;
                        case 1477:
                            switch (Convert.ToInt32(highestPowers[1].Frequency))
                            {
                                case 697:
                                    Console.WriteLine("3 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 770:
                                    Console.WriteLine("6 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 852:
                                    Console.WriteLine("9 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 941:
                                    Console.WriteLine("# pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                            }
                            break;
                    }
                }
                else
                {
                    Console.WriteLine("No DTMF at " + bufferPos + " (" + seconds + "s)");
                }
                bufferPos = bufferPos + (sampleSize*2);
            }

This is the sample file as viewed in Audacity; I've added in the DTMF keypresses that were pressed-

and ... it almost works. From the file above, I shouldn't see any DTMF until almost exactly 3 seconds in, however, my code reports:

9 pressed at 1920 (0.12s)
1 pressed at 2880 (0.18s)
* pressed at 3200
1 pressed at 5120 (0.32s)
1 pressed at 5440 (0.34s)
7 pressed at 5760 (0.36s)
7 pressed at 6080 (0.38s)
7 pressed at 6720 (0.42s)
5 pressed at 7040 (0.44s)
7 pressed at 7360 (0.46s)
7 pressed at 7680 (0.48s)
1 pressed at 8000 (0.5s)
7 pressed at 8320 (0.52s)

... until it gets to 3 seconds, and THEN it starts to settle down to the correct answer: that 1 was pressed:

7 pressed at 40000 (2.5s)
# pressed at 43840 (2.74s)
No DTMF at 44800 (2.8s)
1 pressed at 45120 (2.82s)
1 pressed at 45440 (2.84s)
1 pressed at 46080 (2.88s)
1 pressed at 46720 (2.92s)
4 pressed at 47040 (2.94s)
1 pressed at 47360 (2.96s)
1 pressed at 47680 (2.98s)
1 pressed at 48000 (3s)
1 pressed at 48960 (3.06s)
4 pressed at 49600 (3.1s)
1 pressed at 49920 (3.12s)
1 pressed at 50560 (3.16s)
1 pressed at 51520 (3.22s)
1 pressed at 52160 (3.26s)
4 pressed at 52480 (3.28s)

If I bump up the AdjustmentFactor beyond 1.2, I get very little detection at all.

I sense that I'm almost there, but can anyone see what it is I'm missing?

EDIT2: The test file above is available here. The adjustedMeanPower in the example above is 47.6660450354638, and the powers are:

score 8 · Accepted Answer

CalculateGoertzel()返回提供的样本内所选频率的功率。

计算每个 DTMF 频率（697、770、852、941、1209、1336 和 1477 Hz）的功率，对结果功率进行排序并选择最高的两个。如果两者都高于某个阈值，则检测到 DTMF 音。

您使用的阈值取决于样本的信噪比 (SNR)。首先，计算所有 Goerzel 值的平均值就足够了，将平均值乘以一个因子（例如 2 或 3），然后检查两个最高 Goerzel 值是否高于该值。

这是一个代码片段，以更正式的方式表达我的意思：

var frequencies = new[] {697.0, 770.0, 852.0, 941.0, 1209.0, 1336.0, 1477.0};

var powers = frequencies.Select(f => new
{
    Frequency = f,
    Power = CalculateGoerzel(sample, f, samplerate)
});

const double AdjustmentFactor = 1.0;
var adjustedMeanPower = AdjustmentFactor * powers.Average(result => result.Power);

var sortedPowers = powers.OrderByDescending(result => result.Power);
var highestPowers = sortedPowers.Take(2).ToList();

if (highestPowers.All(result => result.Power > adjustedMeanPower))
{
    // Use highestPowers[0].Frequency and highestPowers[1].Frequency to 
    // classify the detected DTMF tone.
}

AdjustmentFactor以of开头1.0。如果您从测试数据中得到误报（即您在不应该有任何 DTMF 音调的样本中检测到 DTMF 音调），请继续增加它直到误报停止。

更新#1

我在波形文件上尝试了你的代码并调整了一些东西：

我在 Goertzel 计算后实现了可枚举（对性能很重要）：

var powers = frequencies.Select(f => new
{
    Frequency = f,
    Power = CalculateGoertzel(sampleBuffer, f, 8000)
// Materialize enumerable to avoid multiple calculations.
}).ToList();

我没有使用调整后的平均值进行阈值处理。我只是用作100.0阈值：

if (highestPowers.All(result => result.Power > 100.0))
{
     ...
}

我将样本量加倍（我相信你使用过160）：

int sampleSize = 160 * 2;

我修复了您的 DTMF 分类。我使用嵌套字典来捕获所有可能的情况：

var phoneKeyOf = new Dictionary<int, Dictionary<int, string>>
{
    {1209, new Dictionary<int, string> {{1477, "?"}, {1336, "?"}, {1209, "?"}, {941, "*"}, {852, "7"}, {770, "4"}, {697, "1"}}},
    {1336, new Dictionary<int, string> {{1477, "?"}, {1336, "?"}, {1209, "?"}, {941, "0"}, {852, "8"}, {770, "5"}, {697, "2"}}},
    {1477, new Dictionary<int, string> {{1477, "?"}, {1336, "?"}, {1209, "?"}, {941, "#"}, {852, "9"}, {770, "6"}, {697, "3"}}},
    { 941, new Dictionary<int, string> {{1477, "#"}, {1336, "0"}, {1209, "*"}, {941, "?"}, {852, "?"}, {770, "?"}, {697, "?"}}},
    { 852, new Dictionary<int, string> {{1477, "9"}, {1336, "8"}, {1209, "7"}, {941, "?"}, {852, "?"}, {770, "?"}, {697, "?"}}},
    { 770, new Dictionary<int, string> {{1477, "6"}, {1336, "5"}, {1209, "4"}, {941, "?"}, {852, "?"}, {770, "?"}, {697, "?"}}},
    { 697, new Dictionary<int, string> {{1477, "3"}, {1336, "2"}, {1209, "1"}, {941, "?"}, {852, "?"}, {770, "?"}, {697, "?"}}}
}

然后通过以下方式检索电话密钥：

var key = phoneKeyOf[(int)highestPowers[0].Frequency][(int)highestPowers[1].Frequency];

结果并不完美，但有些可靠。

更新#2

我想我找到了问题所在，但现在无法亲自尝试。您不能将目标频率直接传递给CalculateGoertzel(). 它必须被归一化以位于 DFT 箱的中心。在计算功率时，请尝试以下方法：

var powers = frequencies.Select(f => new
{
    Frequency = f,
    // Pass normalized frequenzy
    Power = CalculateGoertzel(sampleBuffer, Math.Round(f*sampleSize/8000.0), 8000)
}).ToList();

此外，您必须使用205assampleSize以最大限度地减少错误。

更新#3

我重写了原型以使用 NAudio 的ISampleProvider接口，它返回标准化的样本值（float范围 [-1.0; 1.0] 中的 s）。CalculateGoertzel()我也从头开始重新编写。它仍然没有优化性能，但在频率之间提供了非常明显的功率差异。当我运行您的测试数据时，不再有误报。我强烈建议你看看它： http: //pastebin.com/serxw5nG

更新#4

我创建了一个GitHub 项目和两个 NuGet 包来检测实时（捕获）音频和预先录制的音频文件中的 DTMF 音调。

c# - Decoding DTMF from a WAV file

1 回答 1

Related

Reference