1

我正在尝试编写一个python代码,其中用户输入一个视频文件,程序将静音/哔哔其中的诅咒/坏词并输出一个过滤的视频文件,基本上是一个亵渎过滤器。首先,我将视频文件转换为 .wav 格式,然后尝试将音频亵渎过滤器应用到 .wav 文件中,并将该 .wav 文件写入视频。

到目前为止,我能够制作音频文件块并使用 Speech_recognition 库从每个 5 秒的音频块中提取文本。但是,如果单词在块之间重叠,我将无法检测到它们并应用检查条件,如果在诅咒单词列表中找到该块文本并将减少该音频文件的 db,这将使其静音(建议其他方式发出哔声而不是静音)。

如果我的方法是正确的,我很困惑。我只想要一个使用 python 的音频亵渎过滤器,直到现在我只能提取文本并制作块。

import speech_recognition as sr 
import os 
from pydub.utils import make_chunks
from pydub import AudioSegment
from pydub.playback import play
import codecs
import re
import fileinput
list=[]
#Curse words list
with codecs.open("words_final.txt", "r") as f0:
    sentences_lines=f0.read().splitlines()
    for sentences in sentences_lines:
        list.append(sentences)
# print(list)

# create a speech recognition object
r = sr.Recognizer()

# a function that splits the audio file into chunks
# and applies speech recognition
def get_large_audio_transcription(path):
    """
    Splitting the large audio file into chunks
    and apply speech recognition on each of these chunks
    """
    # open the audio file using pydub
    sound = AudioSegment.from_wav(path)  
    chunk_length_ms = 5000 # pydub calculates in millisec
    chunks = make_chunks(sound, chunk_length_ms) #Make chunks of one sec
    folder_name = "audio-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk 
    for i, audio_chunk in enumerate(chunks, start=1):
        # export audio chunk and save it in
        # the `folder_name` directory.
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        with sr.AudioFile(chunk_filename) as source:
            audio_listened = r.record(source)
            # try converting it to text
            try:
                text = r.recognize_google(audio_listened,language="en-US")
                wav_file=AudioSegment.from_file(chunk_filename, format = "wav")
                # Reducing volume by 5
                silent_wav_file = AudioSegment.silent(duration=8000)
                #  Playing silent file
                play(silent_wav_file)
            except sr.UnknownValueError as e:
                print("Error:", str(e))
            else:
                text = f"{text.capitalize()}. "
                print(chunk_filename, ":", text)
                whole_text += text
    # return the text for all chunks detected
    return whole_text

path = "Welcome.wav"
print("\nFull text:", get_large_audio_transcription(path))

#Will implement a loop to sum all chunks a make a final filtered .wav file
4

1 回答 1

0

首先从视频文件中提取audio.wav。然后使用此解决方案从该音频中获取每个单词的时间戳。然后,您可以将音频中的坏话静音/发出哔哔声,并将其加入视频中。

于 2021-08-21T13:15:49.263 回答