node.js - Watson NarrowBand Speech to Text does not accept ogg files

Question

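A NodeJS app uses ffmpeg to create ogg files from mp3 and mp4 files. If the source file is broadband, Watson Speech to Text accepts the ogg file without any problem. If the source file is narrowband, Watson Speech to Text fails to read the ogg file. I have tested the output from ffmpeg, and the narrowband ogg file has the same audio content as the mp3 file (e.g. I can listen to it and hear the same people). Yes, I changed the call to Watson in advance to specify the correct model and content_type. The code follows: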

exports.createTranscript = function(req, res, next) {
  var _name = getNameBase(req.body.movie);
  var _type = getType(req.body.movie);  // "mp3" (narrowband) or "mp4" (broadband)
  var _voice = (_type == "mp4") ? "en-US_BroadbandModel" : "en-US_NarrowbandModel";
  var _contentType = (_type == "mp4") ? "audio/ogg" : "audio/basic";
  var _audio = process.cwd() + "/HTML/movies/" + _name + 'ogg';
  var transcriptFile = process.cwd() + "/HTML/movies/" + _name + 'json';

  // Open a session with the selected model, then send the ogg audio for recognition.
  speech_to_text.createSession({model: _voice}, function(error, session) {
    if (error) { console.log('error:', error); }
    else {
      var params = {
        content_type: _contentType,
        continuous: true,
        audio: fs.createReadStream(_audio),
        session_id: session.session_id
      };
      speech_to_text.recognize(params, function(error, transcript) {
        if (error) { console.log('error:', error); }
        else {
          // Save the transcript as JSON and return it to the caller.
          fs.writeFile(transcriptFile, JSON.stringify(transcript), function(err) { if (err) { console.log(err); } });
          res.send(transcript);
        }
      });
    }
  });
}

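_type is either mp3 (narrowband, from a phone recording) or mp4 (broadband). model: _voice has been traced to ensure it is set correctly. content_type: _contentType has been traced to ensure it is set correctly.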

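Any ogg file submitted to Speech to Text with the narrowband settings fails with Error: No speech detected for 30s. I tested both with real narrowband files and by asking Watson to read a broadband ogg file (created from an mp4) as narrowband. Same error message. What am I missing?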

score 1 · Accepted Answer

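The documentation for Watson Speech to Text is confusing on this point. The documentation here states that when using the narrowband model, content_type should be set to audio/basic. That is incorrect. In this example the inbound audio file is narrowband, but it is an ogg file, so content_type should still be audio/ogg. That single change solved the problem.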
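For illustration, here is a minimal sketch of how the two settings could be chosen independently. The helper name sttOptionsFor is hypothetical and not part of the original code or the Watson SDK. It assumes, as in the question, that every file sent to the service is an ogg file produced by ffmpeg and that mp3 sources are narrowband. A plausible reason for the error is that audio/basic denotes 8 kHz mu-law audio, so an ogg stream labeled that way would not be decoded as speech.

```javascript
// Hypothetical helper (not from the original post or the Watson SDK):
// pick the Speech to Text options for an ogg file that ffmpeg produced
// from an mp3 or mp4 source.
function sttOptionsFor(sourceType) {
  // The model follows the bandwidth of the source audio.
  // Assumption from the question: mp4 is broadband, mp3 is a narrowband phone recording.
  var model = (sourceType === "mp4") ? "en-US_BroadbandModel" : "en-US_NarrowbandModel";

  // The content type follows the container that is actually uploaded.
  // That is ogg in both cases, regardless of the model.
  var contentType = "audio/ogg";

  return { model: model, content_type: contentType };
}

// Example: options for a narrowband source.
console.log(sttOptionsFor("mp3"));
```

In the question's code this corresponds to replacing the _contentType ternary with the constant "audio/ogg" while leaving the _voice selection unchanged.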
