I am trying to do some speech-to-speech translation with the Google Speech-to-Text API (also using the Translation and Text-to-Speech APIs). I want a person to speak into the microphone and have that speech transcribed to text. I based this method on the streaming audio tutorial in Google's documentation. I also want the audio stream to stop when the person stops speaking.
Here is the modified method:
public static String streamingMicRecognize(String language) throws Exception {
    ResponseObserver<StreamingRecognizeResponse> responseObserver = null;
    try (SpeechClient client = SpeechClient.create()) {
        responseObserver =
            new ResponseObserver<StreamingRecognizeResponse>() {
                ArrayList<StreamingRecognizeResponse> responses = new ArrayList<>();

                public void onStart(StreamController controller) {}

                public void onResponse(StreamingRecognizeResponse response) {
                    responses.add(response);
                }

                public void onComplete() {
                    SPEECH_TO_TEXT_ANSWER = "";
                    for (StreamingRecognizeResponse response : responses) {
                        StreamingRecognitionResult result = response.getResultsList().get(0);
                        SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
                        System.out.printf("Transcript : %s\n", alternative.getTranscript());
                        SPEECH_TO_TEXT_ANSWER = SPEECH_TO_TEXT_ANSWER + alternative.getTranscript();
                    }
                }

                public void onError(Throwable t) {
                    System.out.println(t);
                }
            };

        ClientStream<StreamingRecognizeRequest> clientStream =
            client.streamingRecognizeCallable().splitCall(responseObserver);

        RecognitionConfig recognitionConfig =
            RecognitionConfig.newBuilder()
                .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
                .setLanguageCode(language)
                .setSampleRateHertz(16000)
                .build();
        StreamingRecognitionConfig streamingRecognitionConfig =
            StreamingRecognitionConfig.newBuilder().setConfig(recognitionConfig).build();

        StreamingRecognizeRequest request =
            StreamingRecognizeRequest.newBuilder()
                .setStreamingConfig(streamingRecognitionConfig)
                .build(); // The first request in a streaming call has to be a config

        clientStream.send(request);

        // SampleRate: 16000 Hz, SampleSizeInBits: 16, Number of channels: 1, Signed: true,
        // bigEndian: false
        AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
        DataLine.Info targetInfo =
            new Info(
                TargetDataLine.class,
                audioFormat); // Set the system information to read from the microphone audio stream
        if (!AudioSystem.isLineSupported(targetInfo)) {
            System.out.println("Microphone not supported");
            System.exit(0);
        }

        // Target data line captures the audio stream the microphone produces.
        TargetDataLine targetDataLine = (TargetDataLine) AudioSystem.getLine(targetInfo);
        targetDataLine.open(audioFormat);
        targetDataLine.start();
        System.out.println("Start speaking");
        playMP3("beep-07.mp3");
        long startTime = System.currentTimeMillis();

        // Audio input stream
        AudioInputStream audio = new AudioInputStream(targetDataLine);
        long estimatedTime = 0, estimatedTimeStoppedSpeaking = 0, startStopSpeaking = 0;
        int currentSoundLevel = 0;
        boolean hasSpoken = false;
        while (true) {
            estimatedTime = System.currentTimeMillis() - startTime;
            byte[] data = new byte[6400];
            audio.read(data);
            currentSoundLevel = calculateRMSLevel(data);
            System.out.println(currentSoundLevel);
            if (currentSoundLevel > 20) {
                estimatedTimeStoppedSpeaking = 0;
                startStopSpeaking = 0;
                hasSpoken = true;
            } else {
                if (startStopSpeaking == 0) {
                    startStopSpeaking = System.currentTimeMillis();
                }
                estimatedTimeStoppedSpeaking = System.currentTimeMillis() - startStopSpeaking;
            }
            // Stop after 15 seconds total, or after 1 second of silence once the user has spoken
            if ((estimatedTime > 15000) || (estimatedTimeStoppedSpeaking > 1000 && hasSpoken)) {
                playMP3("beep-07.mp3");
                System.out.println("Stop speaking.");
                targetDataLine.stop();
                targetDataLine.drain();
                targetDataLine.close();
                break;
            }
            request =
                StreamingRecognizeRequest.newBuilder()
                    .setAudioContent(ByteString.copyFrom(data))
                    .build();
            clientStream.send(request);
        }
    } catch (Exception e) {
        System.out.println(e);
    }
    responseObserver.onComplete();
    String ans = SPEECH_TO_TEXT_ANSWER;
    return ans;
}
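For completeness, `calculateRMSLevel` is not shown above. It is a helper I wrote that returns a rough loudness value for a buffer of 16-bit little-endian mono PCM; a minimal sketch of it looks like this (the divide-by-100 scaling, which puts normal speech above the threshold of 20, is specific to my setup):

```java
public class RmsLevel {
    // Rough RMS loudness for LINEAR16 little-endian mono PCM samples.
    public static int calculateRMSLevel(byte[] audioData) {
        long sumOfSquares = 0;
        int sampleCount = audioData.length / 2;
        for (int i = 0; i + 1 < audioData.length; i += 2) {
            // Little-endian: low byte first, then the signed high byte.
            int sample = (audioData[i] & 0xFF) | (audioData[i + 1] << 8);
            sumOfSquares += (long) sample * sample;
        }
        double rms = Math.sqrt((double) sumOfSquares / Math.max(sampleCount, 1));
        // Scale down so that typical speech lands in a small integer range.
        return (int) (rms / 100);
    }

    public static void main(String[] args) {
        byte[] silence = new byte[6400]; // all zeros, i.e. perfect silence
        System.out.println(calculateRMSLevel(silence)); // prints 0
    }
}
```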
The output should be the transcribed text as a string. However, it is very inconsistent. Most of the time it returns an empty string, but sometimes the program does work and does return the transcript.
I also recorded the audio separately while the program was running. Even though the method returned an empty string, when I saved that separately recorded audio file and sent it directly through the API, it returned the correct transcript.
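That direct test was essentially the synchronous recognize call; it was roughly the following (the file name "recorded.wav" and the "en-US" language code here are placeholders for what I actually used, and running it requires cloud credentials):

```java
import com.google.cloud.speech.v1.RecognitionAudio;
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.RecognizeResponse;
import com.google.cloud.speech.v1.SpeechClient;
import com.google.cloud.speech.v1.SpeechRecognitionResult;
import com.google.protobuf.ByteString;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FileRecognize {
    public static void main(String[] args) throws Exception {
        try (SpeechClient client = SpeechClient.create()) {
            // Load the separately recorded audio file from disk.
            ByteString audioBytes =
                ByteString.copyFrom(Files.readAllBytes(Paths.get("recorded.wav")));
            // Same format the streaming method uses: LINEAR16 at 16 kHz.
            RecognitionConfig config =
                RecognitionConfig.newBuilder()
                    .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
                    .setSampleRateHertz(16000)
                    .setLanguageCode("en-US")
                    .build();
            RecognitionAudio audio =
                RecognitionAudio.newBuilder().setContent(audioBytes).build();
            // Synchronous (non-streaming) recognition of the whole file.
            RecognizeResponse response = client.recognize(config, audio);
            for (SpeechRecognitionResult result : response.getResultsList()) {
                System.out.println(result.getAlternativesList().get(0).getTranscript());
            }
        }
    }
}
```

This path returns the correct transcript for the very same audio, which is why I suspect the problem is in the streaming method rather than in the recording itself.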
I don't understand why/how the program only works some of the time.