video - What's wrong with my use of timestamps/timebases for frame seeking/reading using libav (ffmpeg)?

Question

So I want to grab a frame from a video at a specific time using libav, to use as a thumbnail.

What I'm using is the following code. It compiles and works fine (in regards to retrieving a picture at all), yet I'm having a hard time getting it to retrieve the right picture.

I simply can't get my head around the anything-but-clear logic behind libav's apparent use of multiple time bases per video. Specifically, I can't figure out which functions expect or return which time base.

The docs were of basically no help whatsoever, unfortunately. SO to the rescue?

#define ABORT(x) do {fprintf(stderr, x); exit(1);} while(0)

av_register_all();

AVFormatContext *format_context = ...;
AVCodec *codec = ...;
AVStream *stream = ...;
AVCodecContext *codec_context = ...;
int stream_index = ...;

// open codec_context, etc.

AVRational stream_time_base = stream->time_base;
AVRational codec_time_base = codec_context->time_base;

printf("stream_time_base: %d / %d = %.5f\n", stream_time_base.num, stream_time_base.den, av_q2d(stream_time_base));
printf("codec_time_base: %d / %d = %.5f\n\n", codec_time_base.num, codec_time_base.den, av_q2d(codec_time_base));

AVFrame *frame = avcodec_alloc_frame();

printf("duration: %lld @ %d/sec (%.2f sec)\n", format_context->duration, AV_TIME_BASE, (double)format_context->duration / AV_TIME_BASE);
printf("duration: %lld @ %d/sec (stream time base)\n\n", format_context->duration / AV_TIME_BASE * stream_time_base.den, stream_time_base.den);
printf("duration: %lld @ %d/sec (codec time base)\n", format_context->duration / AV_TIME_BASE * codec_time_base.den, codec_time_base.den);

double request_time = 10.0; // 10 seconds. Video's total duration is ~20sec
int64_t request_timestamp = request_time / av_q2d(stream_time_base);
printf("requested: %.2f (sec)\t-> %2lld (pts)\n", request_time, request_timestamp);

av_seek_frame(format_context, stream_index, request_timestamp, 0);

AVPacket packet;
int frame_finished;
do {
    if (av_read_frame(format_context, &packet) < 0) {
        break;
    } else if (packet.stream_index != stream_index) {
        av_free_packet(&packet);
        continue;
    }
    avcodec_decode_video2(codec_context, frame, &frame_finished, &packet);
} while (!frame_finished);

// do something with frame

int64_t received_timestamp = frame->pkt_pts;
double received_time = received_timestamp * av_q2d(stream_time_base);
printf("received:  %.2f (sec)\t-> %2lld (pts)\n\n", received_time, received_timestamp);

Running this with a test movie file I get this output:

    stream_time_base: 1 / 30000 = 0.00003
    codec_time_base: 50 / 2997 = 0.01668

    duration: 20062041 @ 1000000/sec (20.06 sec)
    duration: 600000 @ 30000/sec (stream time base)
    duration: 59940 @ 2997/sec (codec time base)

    requested: 10.00 (sec)  -> 300000 (pts)
    received:  0.07 (sec)   -> 2002 (pts)

The times don't match. What's going on here? What am I doing wrong?

While searching for clues I stumbled upon this statement from the libav-users mailing list…

[...] packet PTS/DTS are in units of the format context's time_base,
where the AVFrame->pts value is in units of the codec context's time_base.

In other words, the container can have (and usually does) a different time_base than the codec. Most libav players don't bother using the codec's time_base or pts since not all codecs have one, but most containers do. (This is why the dranger tutorial says to ignore AVFrame->pts)

…which confused me even more, given that I couldn't find any such mention in the official docs.

Anyway, I replaced…

double received_time = received_timestamp * av_q2d(stream_time_base);

…with…

double received_time = received_timestamp * av_q2d(codec_time_base);

…and the output changed to this…

...

requested: 10.00 (sec)  -> 300000 (pts)
received:  33.40 (sec)  -> 2002 (pts)

Still no match. What's wrong?

score 18 · Accepted Answer

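It's mostly like this:

The stream time base is what you are really interested in. It is what the packet timestamps are in, and also pkt_pts on the output frame (since that is just copied from the corresponding packet).
The codec time base (if set at all) is just the inverse of the frame rate that may be written in the codec-level headers. It can be useful when there is no container timing information (e.g. when you are reading raw video), but otherwise it can safely be ignored.
AVFrame.pkt_pts is the timestamp of the packet that was decoded into this frame. As already said, it is a straight copy from the packet, so it is in the stream time base. This is the field you want to use (if the container has timestamps).
AVFrame.pts is never set to anything useful when decoding, so ignore it (it might replace pkt_pts in the future, to make the whole mess less confusing, but for now it is like this, mostly for historical reasons).
The format context's duration is in AV_TIME_BASE units (i.e. microseconds). It cannot be in any stream time base, since you can have any number of streams, each with its own time base.
The problem of getting a different timestamp after seeking is simply that seeking is not accurate. In most cases you can only seek to the closest keyframe, so it is common to be off by a couple of seconds. Decoding and discarding the frames you don't need must be done manually.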

score 10 · Accepted Answer

I replaced

av_seek_frame(format_context, stream_index, request_timestamp, 0);

with

avformat_seek_file(format_context, stream_index, INT64_MIN, request_timestamp, INT64_MAX, 0);

and suddenly I got reasonable output. Great.
And it only took a day in almost complete documentation darkness. :/
