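I have been developing streaming software that takes live feeds from various kinds of cameras and streams them over the network using H.264. To accomplish this, I'm using the x264 encoder directly (with the "zerolatency" preset) and feeding NALs, as they become available, to libavformat to pack into RTP (ultimately RTSP). Ideally, this application should be as real-time as possible. For the most part, this has been working well.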

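Unfortunately, there is some sort of synchronization issue: any video playback on clients shows a few smooth frames, followed by a short pause, then more frames, and so on. Additionally, there appears to be a delay of roughly 4 seconds. This happens with every video player I've tried: Totem, VLC, and basic gstreamer pipelines.

The following test program reproduces the problem: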


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
#include <x264.h>
#include <libavformat/avformat.h>
#include <libswscale/swscale.h>

#define WIDTH       640
#define HEIGHT      480
#define FPS         30
#define BITRATE     400000
#define RTP_ADDRESS ""
#define RTP_PORT    49990

struct AVFormatContext* avctx;
struct x264_t* encoder;
struct SwsContext* imgctx;

uint8_t test = 0x80;

void create_sample_picture(x264_picture_t* picture)
{
    // create a frame to store in
    x264_picture_alloc(picture, X264_CSP_I420, WIDTH, HEIGHT);

    // fake image generation
    // disregard how wrong this is; just writing a quick test
    int strides = WIDTH / 8;
    uint8_t* data = malloc(WIDTH * HEIGHT * 3);
    memset(data, test, WIDTH * HEIGHT * 3);
    test = (test << 1) | (test >> (8 - 1));

    // scale the image
    sws_scale(imgctx, (const uint8_t* const*) &data, &strides, 0, HEIGHT,
              picture->img.plane, picture->img.i_stride);

    // the scaled copy lives in picture, so the source buffer can go
    free(data);
}

int encode_frame(x264_picture_t* picture, x264_nal_t** nals)
{
    // encode a frame
    x264_picture_t pic_out;
    int num_nals;
    int frame_size = x264_encoder_encode(encoder, nals, &num_nals, picture, &pic_out);

    // ignore bad frames
    if (frame_size < 0)
        return frame_size;

    return num_nals;
}

void stream_frame(uint8_t* payload, int size)
{
    // initialize a packet
    AVPacket p;
    av_init_packet(&p);
    p.data = payload;
    p.size = size;
    p.stream_index = 0;
    p.flags = AV_PKT_FLAG_KEY;
    p.pts = AV_NOPTS_VALUE;
    p.dts = AV_NOPTS_VALUE;

    // send it out
    av_interleaved_write_frame(avctx, &p);
}

int main(int argc, char* argv[])
{
    // initialize ffmpeg
    av_register_all();

    // set up image scaler
    // (in-width, in-height, in-format, out-width, out-height, out-format, scaling-method, 0, 0, 0)
    imgctx = sws_getContext(WIDTH, HEIGHT, PIX_FMT_MONOWHITE,
                            WIDTH, HEIGHT, PIX_FMT_YUV420P,
                            SWS_FAST_BILINEAR, NULL, NULL, NULL);

    // set up encoder presets
    x264_param_t param;
    x264_param_default_preset(&param, "ultrafast", "zerolatency");

    param.i_threads = 3;
    param.i_width = WIDTH;
    param.i_height = HEIGHT;
    param.i_fps_num = FPS;
    param.i_fps_den = 1;
    param.i_keyint_max = FPS;
    param.b_intra_refresh = 0;
    param.rc.i_bitrate = BITRATE;
    param.b_repeat_headers = 1; // whether to repeat headers or write just once
    param.b_annexb = 1;         // place start codes (1) or sizes (0)

    // initialize
    x264_param_apply_profile(&param, "high");
    encoder = x264_encoder_open(&param);

    // at this point, x264_encoder_headers can be used, but it has had no effect

    // set up streaming context. a lot of error handling has been omitted
    // for brevity, but this should be pretty standard.
    avctx = avformat_alloc_context();
    struct AVOutputFormat* fmt = av_guess_format("rtp", NULL, NULL);
    avctx->oformat = fmt;

    snprintf(avctx->filename, sizeof(avctx->filename), "rtp://%s:%d", RTP_ADDRESS, RTP_PORT);
    if (url_fopen(&avctx->pb, avctx->filename, URL_WRONLY) < 0)
    {
        perror("url_fopen failed");
        return 1;
    }
    struct AVStream* stream = av_new_stream(avctx, 1);

    // initialize codec
    AVCodecContext* c = stream->codec;
    c->codec_id = CODEC_ID_H264;
    c->codec_type = AVMEDIA_TYPE_VIDEO;
    c->width = WIDTH;
    c->height = HEIGHT;
    c->time_base.den = FPS;
    c->time_base.num = 1;
    c->gop_size = FPS;
    c->bit_rate = BITRATE;
    avctx->flags = AVFMT_FLAG_RTP_HINT;

    // write the header
    av_write_header(avctx);

    // make some frames
    for (int frame = 0; frame < 10000; frame++)
    {
        // create a sample moving frame
        x264_picture_t* pic = (x264_picture_t*) malloc(sizeof(x264_picture_t));
        create_sample_picture(pic);

        // encode the frame
        x264_nal_t* nals;
        int num_nals = encode_frame(pic, &nals);

        if (num_nals < 0)
            printf("invalid frame size: %d\n", num_nals);

        // send out NALs
        for (int i = 0; i < num_nals; i++)
        {
            stream_frame(nals[i].p_payload, nals[i].i_payload);
        }

        // free up resources
        x264_picture_clean(pic);
        free(pic);

        // stream at approx 30 fps
        printf("frame %d\n", frame);
        usleep(33333);
    }

    return 0;
}

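This test shows black lines on a white background that should move smoothly to the left. It was written for ffmpeg 0.6.5, but the problem can be reproduced on 0.8 and 0.10 (from what I've tested so far). I've taken some shortcuts in error handling to keep this example as short as possible while still showing the problem, so please excuse some of the nasty code. I should also note that while an SDP is not used here, I have tried using one with similar results. The test can be compiled with: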

gcc -g -std=gnu99 streamtest.c -lswscale -lavformat -lx264 -lm -lpthread -o streamtest

A gstreamer client for the stream can be run with:


gst-launch udpsrc port=49990 ! application/x-rtp,payload=96,clock-rate=90000 ! rtph264depay ! decodebin ! xvimagesink

You should notice the stuttering immediately. A common "fix" I have seen around the Internet is to add sync=false to the pipeline:

gst-launch udpsrc port=49990 ! application/x-rtp,payload=96,clock-rate=90000 ! rtph264depay ! decodebin ! xvimagesink sync=false

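This results in smooth (and near-real-time) playback, but it is not a solution and only works with gstreamer. I'd like to fix the problem at the source. I've been able to stream with near-identical parameters using raw ffmpeg and haven't had any issues: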

ffmpeg -re -i sample.mp4 -vcodec libx264 -vpre ultrafast -vpre baseline -b 400000 -an -f rtp rtp:// -an



1 Answer


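1) You don't set a PTS on the frames you send to libx264 (you should probably be seeing "non-strictly-monotonic PTS" warnings).
2) You don't set PTS/DTS on the packets you send to libavformat's RTP muxer (I'm not 100% sure they need to be set, but I think it would be better; from the source code it looks like RTP uses the PTS).
3) IMHO usleep(33333) is bad. It also stalls the encoder for that time (increasing latency), while you could be encoding the next frame during that time, even if you don't need to send it over RTP yet.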
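
To make the three points above concrete, here is a minimal sketch of the arithmetic only, not a drop-in fix. It assumes the encoder time base is 1/FPS (so the frame index itself can serve as the PTS) and the 90 kHz RTP video clock that the gstreamer caps in the question use. The helper names pts_to_rtp_ticks and frame_deadline_ns are hypothetical, and the assignments shown in the comments are where I would expect the values to go, not something I have tested against ffmpeg 0.6.5.

```c
#include <stdint.h>

#define RTP_VIDEO_CLOCK_HZ 90000 /* matches clock-rate=90000 in the pipeline */

/*
 * Rescale a frame index (a PTS in a 1/fps time base) to 90 kHz ticks.
 * Assumed usage, per points 1 and 2:
 *     picture->i_pts = frame;   // strictly increasing, one tick per frame
 *     p.pts = p.dts = <PTS in the stream's time base>;
 */
int64_t pts_to_rtp_ticks(int64_t frame_index, int fps)
{
    return frame_index * RTP_VIDEO_CLOCK_HZ / fps;
}

/*
 * Absolute time (in nanoseconds) at which frame N is due, measured from
 * the moment streaming started. Sleeping until this deadline, for example
 * with clock_nanosleep() and TIMER_ABSTIME, means the time spent encoding
 * is not added on top of every frame period, and the rounding error of a
 * fixed 33333 us sleep does not accumulate.
 */
int64_t frame_deadline_ns(int64_t start_ns, int64_t frame_index, int fps)
{
    return start_ns + frame_index * 1000000000LL / fps;
}
```

At 30 fps each frame advances the RTP clock by 3000 ticks, and frame 30 is due exactly one second after the start, whereas thirty fixed sleeps of 33333 microseconds plus thirty encode calls would already be running late.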

P.S. By the way, you didn't set param.rc.i_rc_method to X264_RC_ABR, so libx264 will use CRF 23 and ignore your "param.rc.i_bitrate = BITRATE". It is also a good idea to use VBV when encoding for network transmission.
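
As a rough illustration of the rate-control numbers involved, the sketch below works out values that could be fed to x264. Two things here are my assumptions rather than part of the answer: x264 takes bitrates in kbit/s, so the question's BITRATE of 400000 would need converting, and sizing the VBV buffer to about one frame's worth of bits is only one common low-latency starting point, not a rule. The helper names are hypothetical.

```c
#include <stdint.h>

/* Convert a bitrate in bit/s to the kbit/s that x264 expects. */
int bits_to_kbit(int bits_per_second)
{
    return bits_per_second / 1000;
}

/*
 * VBV buffer size in kbit that holds `frames` frames at the target rate.
 * Assumed usage with the x264_param_t from the question:
 *     param.rc.i_rc_method       = X264_RC_ABR;
 *     param.rc.i_bitrate         = bits_to_kbit(BITRATE);
 *     param.rc.i_vbv_max_bitrate = bits_to_kbit(BITRATE);
 *     param.rc.i_vbv_buffer_size = vbv_buffer_kbit(bits_to_kbit(BITRATE), FPS, 1);
 */
int vbv_buffer_kbit(int bitrate_kbit, int fps, int frames)
{
    return bitrate_kbit * frames / fps;
}
```

With the question's settings this gives 400 kbit/s and a single-frame buffer of about 13 kbit. A larger buffer generally trades latency for quality, so the frame count is worth tuning.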

Answered 2012-07-31T12:07:19.097