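I'm using a speech-to-text application that produces transcription files as output. The transcribed text contains tags such as (s) (start of a sentence), (/s) (end of a sentence) and (VOCAL_NOISE) (an unrecognized word). However, the text also contains unwanted tags such as (VOCAL_N), (VOCAL_NOISED), (VOCAL_SOUND) and (UNKNOWN). I'm processing the text with sed, but I can't write a proper regular expression that replaces every tag except (s), (/s) and (VOCAL_NOISE) with ~NS. I'd really appreciate it if someone could help.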
Sample text:
(s) Hi Stacey , this is Stanley (/s) (s) I would (VOCAL_N) appreciate if you could call (UNKNOWN) and let him know I want an appointment (VOCAL_NOISE) with him (/s)
The output should be:
(s) Hi Stacey , this is Stanley (/s) (s) I would ~NS appreciate if you could call ~NS and let him know I want an appointment (VOCAL_NOISE) with him (/s)
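A minimal sketch in sed, under a few assumptions: tags never contain spaces or nested parentheses, parentheses in the transcript are used only for tags, and the string @@ never occurs in the text (it serves here as a temporary marker, a hypothetical choice). Because sed regular expressions have no negative lookahead, the three tags to keep are first swapped to a marker. Every remaining parenthesised tag then becomes ~NS, and finally the kept tags are restored. The function name clean_tags is hypothetical.

```shell
#!/bin/sh
# Keep (s), (/s) and (VOCAL_NOISE); replace every other (TAG) with ~NS.
# Assumption: "@@" never appears in the transcript, so it is used as a temporary marker.
# '#' is the s-command delimiter so that "/s" needs no escaping.
clean_tags() {
  sed -E \
    -e 's#\((s|/s|VOCAL_NOISE)\)#@@\1@@#g' \
    -e 's#\([^() ]+\)#~NS#g' \
    -e 's#@@(s|/s|VOCAL_NOISE)@@#(\1)#g'
}

# Usage on the sample line from the question:
printf '%s\n' '(s) Hi Stacey , this is Stanley (/s) (s) I would (VOCAL_N) appreciate if you could call (UNKNOWN) and let him know I want an appointment (VOCAL_NOISE) with him (/s)' | clean_tags
```

The first expression matches (VOCAL_NOISE) only when the closing parenthesis follows immediately, so (VOCAL_NOISED) is not protected and is turned into ~NS like the other unwanted tags. To clean a whole file, redirect it through the function, e.g. clean_tags < transcript.txt > cleaned.txt (the file names are placeholders).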