4

I would like to perform face detection / tracking on a video file (e.g. an MP4 from the user's gallery) using the Android Vision FaceDetector API. I can see many examples of using the CameraSource class to perform face tracking on the stream coming directly from the camera (e.g. on the android-vision GitHub), but nothing on video files.

I tried looking at the source code for CameraSource through Android Studio, but it is obfuscated, and I couldn't find the original online. I imagine there are many commonalities between using the camera and using a file. Presumably I could just play the video file on a Surface, and then pass that to a pipeline.

Alternatively, I can see that Frame.Builder has the functions setImageData and setTimestampMillis. If I were able to read in the video as a ByteBuffer, how would I pass that to the FaceDetector API? I guess this question is similar, but it has no answers. Similarly, I could decode the video into Bitmap frames and pass those to setBitmap.

Ideally I don't want to render the video to the screen, and the processing should happen as fast as the FaceDetector API is capable of.


2 Answers

2

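"Alternatively I can see that Frame.Builder has functions setImageData and setTimestampMillis. If I was able to read in the video as ByteBuffer, how would I pass that to the FaceDetector API?"

Simply call SparseArray<Face> faces = detector.detect(frame); where detector has to be created like this: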

FaceDetector detector = new FaceDetector.Builder(context)
   .setProminentFaceOnly(true)
   .build();
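
As a rough sketch of the ByteBuffer route mentioned in the question, the helper below packs an 8-bit grayscale plane into an NV21-layout buffer (full-resolution Y plane followed by interleaved V/U samples at quarter resolution, here filled with the neutral value 128). The class and method names are hypothetical, and it assumes even frame dimensions and that you already have the luminance bytes for a decoded frame. The comment at the end shows how such a buffer would presumably be handed to Frame.Builder. That part is an assumption about usage, not tested code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper: names are illustrative, not part of the Vision API.
class Nv21Packing {
    /** Packs a width*height grayscale plane into an NV21-layout buffer with neutral chroma. */
    static ByteBuffer grayToNv21(byte[] luma, int width, int height) {
        if (width <= 0 || height <= 0 || width % 2 != 0 || height % 2 != 0) {
            throw new IllegalArgumentException("width and height must be positive and even");
        }
        if (luma.length != width * height) {
            throw new IllegalArgumentException("luma must contain width*height bytes");
        }
        // NV21 = Y plane (width*height bytes) + interleaved VU plane (width*height/2 bytes).
        byte[] nv21 = new byte[width * height * 3 / 2];
        System.arraycopy(luma, 0, nv21, 0, luma.length);
        // 128 is the neutral chroma value, so the image stays grayscale.
        Arrays.fill(nv21, luma.length, nv21.length, (byte) 128);
        return ByteBuffer.wrap(nv21);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = grayToNv21(new byte[4 * 2], 4, 2);
        System.out.println("NV21 buffer size: " + buffer.capacity());
        // On Android (assumed usage, not compiled here):
        // Frame frame = new Frame.Builder()
        //         .setImageData(buffer, 4, 2, ImageFormat.NV21)
        //         .setTimestampMillis(timestampMs)
        //         .build();
        // SparseArray<Face> faces = detector.detect(frame);
    }
}
```

If the decoder already produces NV21 frames, the buffer can be passed along directly and this packing step is unnecessary.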
answered 2016-08-18T13:56:06.677
1

If processing time is not an issue, using MediaMetadataRetriever.getFrameAtTime solves the problem. As Anton suggested, you can also use FaceDetector.detect:

Bitmap bitmap;
Frame frame;
SparseArray<Face> faces;
int fps = 4;                  // desired number of frames per second to analyze
long deltaT = 1000000L / fps; // sampling step, in us
MediaMetadataRetriever mMMR = new MediaMetadataRetriever();
mMMR.setDataSource(videoPath);
String timeMs = mMMR.extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION); // video duration in ms
long totalVideoTime = 1000L * Long.parseLong(timeMs); // total video time in us; long avoids int overflow on videos longer than about 35 minutes
for (long time_us = 1; time_us < totalVideoTime; time_us += deltaT) {
    // extract a bitmap from the key frame closest to time_us
    // (OPTION_CLOSEST returns the closest frame of any kind, but is slower)
    bitmap = mMMR.getFrameAtTime(time_us, MediaMetadataRetriever.OPTION_CLOSEST_SYNC);
    if (bitmap == null) break;
    frame = new Frame.Builder().setBitmap(bitmap).build(); // generates a "Frame" object, which can be fed to a face detector
    faces = detector.detect(frame); // detect the faces (detector is a FaceDetector)
    // TODO ... do something with "faces"
}

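where deltaT = 1000000/fps, and fps is the desired number of frames per second. For example, if you want to extract 4 frames every second, deltaT = 250000. (Note that faces is overwritten on every iteration, so you should do something with it, such as storing or reporting the results, inside the loop.)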
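
To make the timing arithmetic concrete, here is a small plain-Java sketch (no Android classes) that computes the sampling timestamps in microseconds for a given duration and frame rate. The class and method names are hypothetical. It uses long throughout, since a duration in microseconds exceeds the int range for videos longer than roughly 35 minutes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: illustrates deltaT = 1000000 / fps from the answer above.
class FrameSampling {
    /** Returns the timestamps (in us) at which frames would be requested. */
    static List<Long> sampleTimesUs(long durationMs, int fps) {
        if (fps <= 0 || fps > 1_000_000) {
            throw new IllegalArgumentException("fps must be between 1 and 1000000");
        }
        long totalUs = durationMs * 1000L; // ms -> us, kept in long to avoid overflow
        long deltaUs = 1_000_000L / fps;   // sampling step in us
        List<Long> times = new ArrayList<>();
        for (long t = 0; t < totalUs; t += deltaUs) {
            times.add(t);
        }
        return times;
    }

    public static void main(String[] args) {
        // A 2-second video sampled at 4 fps: step of 250000 us, 8 samples.
        List<Long> times = sampleTimesUs(2000, 4);
        System.out.println(times.size() + " samples, step " + (times.get(1) - times.get(0)) + " us");
    }
}
```

Each returned value could then be passed as the first argument to getFrameAtTime in the loop above.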

answered 2017-04-04T19:52:52.073