android - Android Face Detection API - Stored video file

Question

I would like to perform face detection / tracking on a video file (e.g. an MP4 from the user's gallery) using the Android Vision FaceDetector API. I can see many examples of using the CameraSource class to perform face tracking on the stream coming directly from the camera (e.g. on the android-vision GitHub), but nothing on video files.

I tried looking at the source code for CameraSource through Android Studio, but it is obfuscated, and I couldn't find the original online. I imagine there are many commonalities between using the camera and using a file. Presumably I just play the video file on a Surface, and then pass that to a pipeline.

Alternatively, I can see that Frame.Builder has functions setImageData and setTimestampMillis. If I were able to read in the video as a ByteBuffer, how would I pass that to the FaceDetector API? I guess this question is similar, but it has no answers. Similarly, I could decode the video into Bitmap frames and pass those to setBitmap.

Ideally I don't want to render the video to the screen, and the processing should happen as fast as the FaceDetector API is capable of.

score 2 · Accepted Answer

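"Alternatively I can see that Frame.Builder has functions setImageData and setTimestampMillis. If I was able to read in the video as ByteBuffer, how would I pass that to the FaceDetector API?"

Simply call SparseArray<Face> faces = detector.detect(frame); where detector must be created like this: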

FaceDetector detector = new FaceDetector.Builder(context)
   .setProminentFaceOnly(true)
   .build();
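
For the ByteBuffer route raised in the question, the buffer handed to Frame.Builder.setImageData has to be in one of the YUV layouts the builder accepts (NV21 is the usual choice). The following is only a sketch of how such a buffer could be prepared in plain Java, assuming the decoded frame is available as 8-bit grayscale with even width and height; the class and method names (Nv21Packing, nv21Size, grayToNv21) are made up for illustration and are not part of the Vision API.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class Nv21Packing {
    /** Bytes in an NV21 image: a full-resolution Y plane plus an interleaved VU plane at half resolution. */
    public static int nv21Size(int width, int height) {
        if (width <= 0 || height <= 0 || width % 2 != 0 || height % 2 != 0) {
            throw new IllegalArgumentException("this sketch assumes positive, even dimensions");
        }
        return width * height * 3 / 2;
    }

    /** Packs a grayscale image into NV21 by copying the luma and filling chroma with the neutral value 128. */
    public static ByteBuffer grayToNv21(byte[] luma, int width, int height) {
        int size = nv21Size(width, height);
        if (luma.length != width * height) {
            throw new IllegalArgumentException("luma must hold width * height bytes");
        }
        byte[] out = new byte[size];
        System.arraycopy(luma, 0, out, 0, luma.length);        // Y plane
        Arrays.fill(out, luma.length, out.length, (byte) 128); // VU plane, no color information
        return ByteBuffer.wrap(out);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = grayToNv21(new byte[640 * 480], 640, 480);
        System.out.println(buffer.capacity()); // prints 460800
        // On a device, this buffer would then be passed on roughly as:
        //   new Frame.Builder()
        //       .setImageData(buffer, 640, 480, ImageFormat.NV21)
        //       .setTimestampMillis(timestampMs)   // timestampMs: hypothetical variable
        //       .build();
    }
}
```

Dropping the color information is a simplification made here to keep the sketch short; a real decoder output would normally be converted with its actual chroma planes.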

score 1 · Accepted Answer

If processing time is not an issue, using MediaMetadataRetriever.getFrameAtTime solves the problem. As Anton suggested, you can also use FaceDetector.detect:

// Assumes videoPath is the path of the video file, detector is a FaceDetector,
// and deltaT is the sampling interval in microseconds.
MediaMetadataRetriever mMMR = new MediaMetadataRetriever();
mMMR.setDataSource(videoPath);
String timeMs = mMMR.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION); // video duration in ms
long totalVideoTime = 1000L * Long.parseLong(timeMs); // total video time in us; long avoids int overflow for videos longer than about 35 minutes
for (long time_us = 1; time_us < totalVideoTime; time_us += deltaT) {
    // extract a bitmap from the key frame closest to time_us
    Bitmap bitmap = mMMR.getFrameAtTime(time_us, MediaMetadataRetriever.OPTION_CLOSEST_SYNC);
    if (bitmap == null) break;
    Frame frame = new Frame.Builder().setBitmap(bitmap).build(); // wrap the bitmap in a Frame, which can be fed to a face detector
    SparseArray<Face> faces = detector.detect(frame); // detect the faces
    // TODO ... do something with "faces"
}

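where deltaT=1000000/fps and fps is the desired number of frames per second. For example, if you want to extract 4 frames per second, deltaT=250000. (Note that faces is overwritten on every iteration, so you should do something with it, such as storing or reporting the results, inside the loop.)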
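
The sampling arithmetic can be checked on its own, without any Android classes. The sketch below computes the timestamps (in microseconds) that would be requested from getFrameAtTime for a given duration and frame rate; FrameSampling and sampleTimesUs are hypothetical names, and starting at 0 rather than 1 is a choice made here for readability. It uses long throughout because a duration in microseconds no longer fits in an int once the video is longer than about 35 minutes.

```java
import java.util.ArrayList;
import java.util.List;

public class FrameSampling {
    /** Returns the timestamps, in microseconds, at which frames would be requested. */
    public static List<Long> sampleTimesUs(long durationMs, int fps) {
        if (fps <= 0) {
            throw new IllegalArgumentException("fps must be positive");
        }
        long totalUs = durationMs * 1000L; // ms to us, in 64-bit arithmetic
        long deltaUs = 1_000_000L / fps;   // same formula as deltaT = 1000000 / fps
        List<Long> times = new ArrayList<>();
        for (long t = 0; t < totalUs; t += deltaUs) {
            times.add(t);
        }
        return times;
    }

    public static void main(String[] args) {
        // One second of video sampled at 4 frames per second.
        System.out.println(sampleTimesUs(1000, 4)); // prints [0, 250000, 500000, 750000]
    }
}
```

Because OPTION_CLOSEST_SYNC snaps to key frames, several of these timestamps may return the same bitmap when key frames are sparse.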
