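I'm trying to feed audio from an online communication application into the Vosk speech recognition API.
The audio arrives as a byte array in this format: PCM_SIGNED 48000.0 Hz, 16 bit, stereo, 4 bytes/frame, big-endian. To process it with Vosk, it needs to be mono and little-endian.
This is my current attempt: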
byte[] audioData = userAudio.getAudioData(1);
short[] convertedAudio = new short[audioData.length / 2];
ByteBuffer buffer = ByteBuffer.allocate(convertedAudio.length * Short.BYTES);

// Convert to mono, I don't think I did it right though
int j = 0;
for (int i = 0; i < audioData.length; i += 2)
    convertedAudio[j++] = (short) (audioData[i] << 8 | audioData[i + 1] & 0xFF);

// Convert to little endian
buffer.order(ByteOrder.BIG_ENDIAN);
for (short s : convertedAudio)
    buffer.putShort(s);
buffer.order(ByteOrder.LITTLE_ENDIAN);
buffer.rewind();
for (int i = 0; i < convertedAudio.length; i++)
    convertedAudio[i] = buffer.getShort();

queue.add(convertedAudio);
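
For comparison, here is a minimal sketch of what I think the conversion should look like, assuming the input really is interleaved 16-bit stereo (left sample, then right sample, 4 bytes per frame) in big-endian order. The class and method names (`PcmConverter`, `stereoBigEndianToMonoLittleEndian`) are made up for illustration and are not part of Vosk or the communication library. The idea is to read each frame's two samples as big-endian shorts, average them into one mono sample, and write that sample out in little-endian order. Note that once a sample has been decoded into a Java `short`, it is just a number; byte order only matters when reading from or writing to a `byte[]`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmConverter {

    private PcmConverter() {}

    /**
     * Downmixes interleaved 16-bit stereo big-endian PCM to
     * 16-bit mono little-endian PCM by averaging left and right.
     * Assumes 4 bytes per frame; a trailing partial frame is ignored.
     */
    static byte[] stereoBigEndianToMonoLittleEndian(byte[] stereoBigEndian) {
        final int bytesPerFrame = 4; // 2 channels x 2 bytes per sample
        int frames = stereoBigEndian.length / bytesPerFrame;

        // Read the input as big-endian shorts.
        ByteBuffer in = ByteBuffer.wrap(stereoBigEndian).order(ByteOrder.BIG_ENDIAN);
        // Write the output as little-endian shorts, one per frame.
        ByteBuffer out = ByteBuffer.allocate(frames * Short.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);

        for (int f = 0; f < frames; f++) {
            int left = in.getShort();
            int right = in.getShort();
            // Average in int arithmetic so the sum cannot overflow a short.
            out.putShort((short) ((left + right) / 2));
        }
        return out.array();
    }
}
```

With this, `PcmConverter.stereoBigEndianToMonoLittleEndian(userAudio.getAudioData(1))` would return a `byte[]` of mono little-endian samples. As far as I can tell, the Vosk Java `Recognizer` has an `acceptWaveForm(byte[] data, int len)` overload that could take this array directly, but I have not verified that against the version I'm using. The sample rate is unchanged by this sketch (still 48000 Hz), so the recognizer would need to be created with that rate or the audio resampled separately.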