python - Precise audio-video synchronization of independent streams using pygame and pyaudio

Question

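I am building an experiment in which a video of a talker moves randomly around the screen. The user/player should follow the video and listen attentively to the speech, while a distractor audio track of a concurrent talker is played at the same time.

To make it easier to manipulate the video position and the audio playback, I chose to play the video and the audio (both from the same original .mp4) as independent streams. My main requirement is therefore precise synchronization between the video and audio streams, ideally with lip-sync accuracy, as in a perfectly synchronized video podcast. Note that I need the synchronization to hold for the whole playback time (up to 10-15 minutes).

After reading some documentation, I concluded that pyaudio and pygame provide a good framework for this.

So I extracted the audio stream from the mp4 and play it, together with the distractor talker's audio, on 2 separate channels through the pyaudio module in callback mode (sampling frequency 48 kHz, buffer size 256 samples). PyAudio plays the stream in a new thread: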

import pyaudio
import pygame

class AudioPlayer:
    def __init__(self, cond_params):
        self.pa = pyaudio.PyAudio()
        # start=False: the stream is only opened here and is started later in play()
        self.audio_stream = self.pa.open(format=pyaudio.paFloat32,
                                         channels=2,
                                         rate=48000,
                                         start=False,
                                         output=True,
                                         stream_callback=self.callback,
                                         frames_per_buffer=256)

    def play(self):
        self.start_playback_pygame = pygame.time.get_ticks()/1000 # start of playback in pygame units (seconds)
        self.start_playback_audio = self.audio_stream.get_time()/1000 # start of playback in pyaudio units (seconds)
        self.audio_stream.start_stream()

        return self.start_playback_pygame

    # PyAudio calls the stream callback with these four arguments
    def callback(self, in_data, frame_count, time_info, status):
        [...code omitted...]
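
Since the body of callback is omitted above, the following is only a minimal sketch of what a PyAudio-style output callback could look like, not the code used in the experiment. It assumes the interleaved float32 samples are already in memory as bytes; the class name FrameCountingSource and the method audio_time() are hypothetical. The callback counts the frames handed to the audio device, which gives an audio clock in seconds (frames_played / rate). The two constants mirror pyaudio.paContinue and pyaudio.paComplete so that the sketch runs without an audio device.

```python
PA_CONTINUE = 0  # same value as pyaudio.paContinue
PA_COMPLETE = 1  # same value as pyaudio.paComplete


class FrameCountingSource:
    """Hypothetical sample source that tracks how many frames were delivered."""

    def __init__(self, samples: bytes, rate: int = 48000,
                 channels: int = 2, sample_width: int = 4):
        self.samples = samples                        # interleaved float32 data
        self.rate = rate
        self.bytes_per_frame = channels * sample_width
        self.frames_played = 0

    def callback(self, in_data, frame_count, time_info, status):
        # Slice out the next frame_count frames of raw audio.
        start = self.frames_played * self.bytes_per_frame
        end = start + frame_count * self.bytes_per_frame
        chunk = self.samples[start:end]
        # Count only the frames that were actually available.
        self.frames_played += len(chunk) // self.bytes_per_frame
        flag = PA_CONTINUE if end < len(self.samples) else PA_COMPLETE
        return chunk, flag

    def audio_time(self) -> float:
        """Seconds of audio handed to the device so far."""
        return self.frames_played / self.rate
```

With a real stream, this would be passed as stream_callback=source.callback. Frames counted this way have been queued but not necessarily heard yet, so the value leads the audible output by roughly the output latency.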

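For the video playback, I use a pygame loop that reads each new video frame through the OpenCV module (cv2), computes a new random position, and blits the frame to the screen on every update iteration of the game loop.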

import cv2, pygame

class VideoPlayer:
    def __init__(self, screen, cond_params):
        self.video_capture = cv2.VideoCapture('path_to_video')
        self.fps = self.video_capture.get(cv2.CAP_PROP_FPS)
        self.video_pos = (0,0) # initial position
        self.running = 0
        self.update_count = 0
        self.frame_count = 0 # number of video frames read so far (incremented in update())
        self.screen = screen

    def update(self, start_playback_time):
        """
        Update with the next video frame at a random new location. Called in every game frame, this will act like a loop, always reading the next video frame.
        """
        empty_screen(self.screen)
        self.update_count += 1
            
        # read the next video frame
        self.running, self.curr_frame = self.video_capture.read()
        if self.running:
            self.curr_frame = cv2.resize(self.curr_frame, self.video_dims)
            frame_to_blit = pygame.image.frombuffer(self.curr_frame.tobytes(), self.video_dims, "BGR")
            self.frame_count += 1
            self.compute_new_pos() # code omitted

            # update screen size if necessary
            self.screen_dims = self.screen.get_size() 
            self.screen.blit(frame_to_blit, self.video_pos)
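
Because update() reads exactly one frame per call, the position of the video within the file is determined by the number of frames read so far, not by the wall clock. A minimal sketch of that relationship, assuming a constant frame rate as reported by CAP_PROP_FPS; the function names are hypothetical and not part of the code above:

```python
import math


def video_position_s(frames_shown: int, fps: float) -> float:
    """Media time (seconds) reached after showing frames_shown frames."""
    return frames_shown / fps


def frame_lead(frames_shown: int, fps: float, audio_time_s: float) -> int:
    """Frames by which the video is ahead (+) of or behind (-) the audio."""
    expected_frames = math.floor(audio_time_s * fps)
    return frames_shown - expected_frames
```

For example, after 33 frames at 30 fps with 1.0 s of audio played, the video would be 3 frames (about 100 ms) ahead.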

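The Media class handles both AudioPlayer() and VideoPlayer(). Once an on-screen button is pressed (code omitted), the experimental condition starts, i.e. the audio and the video start playing simultaneously. On every game-loop iteration, an update() function in the Media class is called to load the next video frame. This is also where I handle the synchronization, by computing the delay between the audio stream (handled by a different thread, pyAudio) and the video stream (handled by the current thread, pygame).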

class Media:
    def __init__(self, cond_params, clock):
        """
        initialise the audio and video stimuli
        """

        self.clock = clock
        self.start_playback_time = None # start time measured by pygame clock once the AudioPlayer started the playback (in seconds)
        self.cond_params = cond_params
        self.update_count = 0
        # create the display first, because VideoPlayer needs the screen
        self.screen = pygame.display.set_mode((700, 700), pygame.RESIZABLE, 16)
        self.AudioPlayer = AudioPlayer(self.cond_params)
        self.VideoPlayer = VideoPlayer(self.screen, cond_params) # init the display for the current condition
        self.delay_AV = None # keep track of the delay between video and audio streams

    def play(self, screen):
        """
        Start the condition. The audio and video are started simultaneously
        """
        # Start the auditory stimulus
        self.start_playback_time = self.AudioPlayer.play() # return start_play_back_time in pygame units
        return self.start_playback_time

    def update(self, events):
        """
        Update the GUI and activate or stop any processes if necessary
        :param events: contains all user inputs (clicks, key presses...)
        """

        self.update_count += 1
        if self.AudioPlayer.playback_active():
            self.VideoPlayer.update(self.start_playback_time)

            # FOR SYNCHRONIZATION: query the current time for both the audio stream (pyaudio) and video stream (handled in pygame)
            curr_time_audio = self.get_audio_stream_time()
            curr_time_pygame = pygame.time.get_ticks()/1000 - self.start_playback_time
            self.delay_AV = curr_time_pygame - curr_time_audio

            # Update the display to show the new video frame at the correct location
            pygame.display.update()

        if self.delay_AV is not None:
            if self.delay_AV < 0:
                print("------ The Video stream is currently BEHIND (comes AFTER) the Audio stream by %.7f msec. Adjusting FPS... " %(self.delay_AV), flush=True)
            else:
                print("------ The Video stream is currently AHEAD (comes BEFORE) the Audio stream by %.7f msec. Adjusting FPS... " %(self.delay_AV), flush=True)
            return self.delay_AV
        else:
            # no delay applied
            return 0

    def get_audio_stream_time(self):
        return self.AudioPlayer.audio_stream.get_time() - self.AudioPlayer.start_playback_audio
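
The delay computation above compares the time elapsed on two different clocks since playback started. A minimal sketch of that idea in isolation, assuming both clocks are callables that return seconds; ClockOffset and its method names are hypothetical. In the real program the two callables might be lambda: pygame.time.get_ticks() / 1000 and audio_stream.get_time (PyAudio documents get_time() as returning the stream time as a float; it is assumed here to be in seconds, in which case it needs no further scaling).

```python
class ClockOffset:
    """Hypothetical helper: elapsed-time difference between two clocks."""

    def __init__(self, video_clock, audio_clock):
        self.video_clock = video_clock   # callable returning seconds
        self.audio_clock = audio_clock   # callable returning seconds
        self.video_start = None
        self.audio_start = None

    def mark_start(self):
        # Read both clocks back to back so they share one reference point.
        self.video_start = self.video_clock()
        self.audio_start = self.audio_clock()

    def delay_av(self) -> float:
        """Positive: video clock is ahead of audio; negative: behind."""
        if self.video_start is None or self.audio_start is None:
            raise RuntimeError("mark_start() must be called first")
        video_elapsed = self.video_clock() - self.video_start
        audio_elapsed = self.audio_clock() - self.audio_start
        return video_elapsed - audio_elapsed
```

Injecting the clocks as callables keeps the unit handling in one place and lets the computation be checked with fake clocks, without a display or an audio device.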

In the main game loop, I try to compensate for the AV (audio-visual) delay by adjusting the fps of the pygame loop using clock.tick_busy_loop():

import json, sys
import pygame
from Media import Media # assumes the Media class above is saved in Media.py

with open("path_to_params_file") as f:
    cond_params = json.load(f)
clock = pygame.time.Clock()
media = Media(cond_params, clock)
fps = 30 # matches the fps of the video file

# The main loop
running = 1 # True
while running:
    # Check for user inputs
    events = pygame.event.get()
    for event in events:
        # Check whether the user quit the program
        if event.type == pygame.QUIT:
            running = 0 # False
            pygame.quit()
            sys.exit()

        # Check whether the user wishes to toggle the full-screen mode
        elif event.type == pygame.KEYDOWN:
            if event.key == pygame.K_ESCAPE:
                pygame.display.toggle_fullscreen()

    # Update the media (mainly the video frame at new location, audio plays in a separate thread)
    delay_AV = media.update(events)
    clock.tick_busy_loop( 1/(1/fps + delay_AV) )  # compensate for audio-video delay
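
To make the compensation formula concrete, here is a minimal sketch of the expression passed to clock.tick_busy_loop(), assuming delay_AV is in seconds. The function name compensated_fps is hypothetical, and the guard against a non-positive period is an addition that the loop above does not have.

```python
def compensated_fps(fps: float, delay_av_s: float) -> float:
    """Frame rate whose period is the nominal frame period plus the A/V delay."""
    period = 1.0 / fps + delay_av_s
    if period <= 0:
        # Addition for illustration: once the video lags by a full frame
        # period or more, the formula divides by zero or turns negative.
        raise ValueError("video lags by at least one frame period")
    return 1.0 / period
```

With fps = 30, a delay of +10 ms (video ahead) gives a target of about 23.1 fps, which slows the loop down, and a delay of -10 ms (video behind) gives about 42.9 fps, which lets it run faster.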

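Unfortunately, this code does not achieve the precise synchronization I need. What happens is that after roughly 20-30 seconds of playback, the video stream starts to speed up and drifts out of sync with the audio. Perhaps my understanding of how to compensate for the delay in the main game loop is incorrect.

I also tried another approach: keeping the fps constant (30 fps, matching the fps of the video) and compensating for the delay inside the Media class before calling pygame.display.update(), so that the next video frame is displayed at exactly the right time. The code I used inside Media.update() is: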

    [...]
    if self.delay_AV > 0:
        # The delay can only be compensated if the current time of the video thread is past
        # the current time of the audio thread, in which case the current pygame thread waits
        # until the audio thread catches up with the current time of the pygame-video thread
        delay_to_apply = int(round(np.abs(self.delay_AV)))
        t1 = compute_current_time(self.start_playback_time, "msec")
        actual_pygame_delay_ms = pygame.time.delay(delay_to_apply)
        # Note: I had to round the delay to the nearest integer because pygame.time.delay()
        # only accepts integers as input. Unfortunately some precision is lost

        # Update display only after the "catch-up time" has passed
        pygame.display.update()
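
pygame.time.delay() takes its argument in milliseconds. A minimal sketch of the rounding step on its own, assuming delay_AV is in seconds (as the /1000 conversions in Media.update() suggest); rounded_delay reproduces the expression used above, while rounded_delay_ms is a hypothetical variant that is not part of the original code.

```python
def rounded_delay(delay_av: float) -> int:
    """Same rounding as above: nearest integer of the absolute delay."""
    return int(round(abs(delay_av)))


def rounded_delay_ms(delay_av_s: float) -> int:
    """Hypothetical variant: convert seconds to milliseconds before rounding."""
    return int(round(abs(delay_av_s) * 1000))
```

Under that assumption, a 12 ms lead (0.012 s) rounds to 0 with the first function, so no wait is applied, while the second yields 12.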

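The second approach does not seem to work either: after 20-30 seconds of well-synchronized playback, the AV streams start to diverge, with the video gradually running ahead of the audio (by a few milliseconds) until the desynchronization becomes noticeable.

Since I have already spent more than two weeks debugging this, I would be grateful for some outside insight. Thank you very much!

Final note: I am using Python 3.10.1 on Windows 10.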
