tensorflow - TF Object Detection API - extract a feature vector for each detected bbox

Question

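I am using the TensorFlow Object Detection API with a pretrained ssd-mobilenet model. Is there a way to extract MobileNet's last global pooling layer as a feature vector for each bbox? I can't find the name of the op that holds this information.

I have been able to extract the detection labels and bboxes, based on the example on GitHub: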

 image_tensor = detection_graph.get_tensor_by_name( 'image_tensor:0' )
 # Each box represents a part of the image where a particular object was detected.
 detection_boxes = detection_graph.get_tensor_by_name( 'detection_boxes:0' )
 # Each score represents the level of confidence for each of the objects.
 # Score is shown on the result image, together with the class label.
 detection_scores = detection_graph.get_tensor_by_name( 'detection_scores:0' )
 detection_classes = detection_graph.get_tensor_by_name( 'detection_classes:0' )
 num_detections = detection_graph.get_tensor_by_name( 'num_detections:0' )
 #TODO: add also the feature vector output

 # Actual detection.
 (boxes, scores, classes, num) = sess.run(
                [detection_boxes, detection_scores, detection_classes, num_detections],
                feed_dict={image_tensor: image_np_expanded} )

score 5 · Accepted Answer

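As Steve said, the feature vectors in Faster RCNN in the Object Detection API appear to be dropped after the SecondStageBoxPredictor. I was able to thread them through the network by modifying core/box_predictor.py and meta_architectures/faster_rcnn_meta_arch.py.

The crux is that the non-max suppression code actually has a parameter for additional fields (see core/post_processing.py:176 on master). You can pass a dictionary of tensors that have the same shape as the boxes and scores in the first two dimensions, and the function returns them filtered the same way as the boxes and scores. Here is a diff against master with the changes I made: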

https://gist.github.com/donniet/c95d19e00ff9abeb786415b3a9348e62
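
To make the additional-fields idea concrete, here is a minimal NumPy sketch. It is not the API's own code: it is a plain greedy NMS that applies the kept indices to every array in a dictionary, so per-box features stay aligned with the surviving boxes. The function names, the [ymin, xmin, ymax, xmax] box layout, and the 0.5 IoU threshold are assumptions made for illustration.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [ymin, xmin, ymax, xmax]."""
    ymin = np.maximum(box[0], boxes[:, 0])
    xmin = np.maximum(box[1], boxes[:, 1])
    ymax = np.minimum(box[2], boxes[:, 2])
    xmax = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ymax - ymin, 0, None) * np.clip(xmax - xmin, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms_with_fields(boxes, scores, additional_fields, iou_threshold=0.5):
    """Greedy NMS that filters every array in additional_fields the same way.

    boxes: [N, 4], scores: [N], additional_fields: dict of arrays with N rows.
    (Hypothetical helper; box areas are assumed to be non-zero.)
    """
    order = np.argsort(-scores)  # highest score first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(best)
        # Drop every remaining box that overlaps the chosen one too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    keep = np.array(keep, dtype=int)
    filtered = {name: values[keep] for name, values in additional_fields.items()}
    return boxes[keep], scores[keep], filtered
```

Calling nms_with_fields(boxes, scores, {'image_features': features}) returns the features in the same order as the surviving boxes, which is the behaviour the diff relies on.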

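I then had to rebuild the network and load the variables from a checkpoint like this, instead of loading a frozen graph (note: I downloaded the Faster RCNN checkpoint from here: http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_2018_01_28.tar.gz)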

import sys
import os
import numpy as np

from object_detection.builders import model_builder
from object_detection.protos import pipeline_pb2

from google.protobuf import text_format
import tensorflow as tf

# load the pipeline structure from the config file
with open('object_detection/samples/configs/faster_rcnn_resnet101_coco.config', 'r') as content_file:
    content = content_file.read()

# build the model with model_builder
pipeline_proto = pipeline_pb2.TrainEvalPipelineConfig()
text_format.Merge(content, pipeline_proto)
model = model_builder.build(pipeline_proto.model, is_training=False)

# construct a network using the model
image_placeholder = tf.placeholder(shape=(None,None,3), dtype=tf.uint8, name='input')
original_image = tf.expand_dims(image_placeholder, 0)
preprocessed_image, true_image_shapes = model.preprocess(tf.to_float(original_image))
prediction_dict = model.predict(preprocessed_image, true_image_shapes)
detections = model.postprocess(prediction_dict, true_image_shapes)

# create an input network to read a file
filename_placeholder = tf.placeholder(name='file_name', dtype=tf.string)
image_file = tf.read_file(filename_placeholder)
image_data = tf.image.decode_image(image_file)

# load the variables from a checkpoint
init_saver = tf.train.Saver()
sess = tf.Session()
init_saver.restore(sess, 'object_detection/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt')

# get the image data
blob = sess.run(image_data, feed_dict={filename_placeholder:'image.jpeg'})
# process the inference
output = sess.run(detections, feed_dict={image_placeholder:blob})

# get the shape of the image_features
print(output['image_features'].shape)

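Caveat: I did not run the TensorFlow unit tests against the changes I made, so treat them as demonstration only; more testing is needed to make sure they don't break something else in the Object Detection API.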

score 4 · Accepted Answer

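Support for feature extraction was added in a recent PR (https://github.com/tensorflow/models/pull/7208). To use this functionality, you can re-export the pretrained models with the exporter tool.

For reference, this is the script I used: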

#!/bin/bash
# NOTE: run this from tf/models/research directory

# Ensure that the necessary modules are on the PYTHONPATH
export PYTHONPATH=".:./slim:$PYTHONPATH"

# Modify this to ensure that Tensorflow is accessible to your environment
conda activate tf37

# pick a model from the model zoo
ORIG_MODEL="faster_rcnn_inception_resnet_v2_atrous_oid_v4_2018_12_12"

# point at wherever you have downloaded the pretrained model
ORIG_MODEL_DIR="object_detection/pretrained/${ORIG_MODEL}"

# choose a destination where the updated model will be stored
DEST_DIR="${ORIG_MODEL_DIR}_with_feats"
echo "Re-exporting model from $ORIG_MODEL_DIR"

python3 object_detection/export_inference_graph.py \
     --input_type image_tensor \
     --pipeline_config_path "${ORIG_MODEL_DIR}/pipeline.config" \
     --trained_checkpoint_prefix "${ORIG_MODEL_DIR}/model.ckpt" \
     --output_directory "${DEST_DIR}"

To use the re-exported model, you can update run_inference_for_single_image in the example notebook to include detection_features as an output:

def run_inference_for_single_image(image, graph):
    with graph.as_default():
        with tf.Session() as sess:
            # Get handles to input and output tensors
            ops = tf.get_default_graph().get_operations()
            all_tensor_names = {output.name for op in ops for output in op.outputs}
            tensor_dict = {}
            for key in ['num_detections', 'detection_boxes', 'detection_scores', 'detection_classes',
                        'detection_masks', 'detection_features']:
                tensor_name = key + ':0'
                if tensor_name in all_tensor_names:
                    tensor_dict[key] = tf.get_default_graph().get_tensor_by_name( tensor_name)
            if 'detection_masks' in tensor_dict:
                # The following processing is only for single image
                detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
                detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
                # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
                real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
                detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
                detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
                detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks( detection_masks, detection_boxes, image.shape[1], image.shape[2])
                detection_masks_reframed = tf.cast( tf.greater(detection_masks_reframed, 0.5), tf.uint8)
                # Follow the convention by adding back the batch dimension
                tensor_dict['detection_masks'] = tf.expand_dims( detection_masks_reframed, 0)
            image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

            # Run inference
            output_dict = sess.run(tensor_dict, feed_dict={image_tensor: image})

            # all outputs are float32 numpy arrays, so convert types as appropriate
            output_dict['num_detections'] = int(output_dict['num_detections'][0])
            output_dict['detection_classes'] = output_dict[ 'detection_classes'][0].astype(np.int64)
            output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
            output_dict['detection_scores'] = output_dict['detection_scores'][0]
            if 'detection_features' in output_dict:
                output_dict['detection_features'] = output_dict['detection_features'][0]
            if 'detection_masks' in output_dict:
                output_dict['detection_masks'] = output_dict['detection_masks'][0]
    return output_dict
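
Once detection_features is in output_dict, you will probably want one flat vector per box. The sketch below is one way to do that, under an assumption you should check against your own export: that detection_features for a single image has shape [num_boxes, height, width, depth]. The helper name and the 0.5 score cut-off are made up for illustration.

```python
import numpy as np


def box_feature_vectors(output_dict, min_score=0.5):
    """Return (boxes, vectors) for detections scoring at least min_score.

    Hypothetical helper. Expects the per-image output_dict produced by
    run_inference_for_single_image, with 'detection_features' present.
    """
    num = output_dict['num_detections']
    scores = output_dict['detection_scores'][:num]
    features = output_dict['detection_features'][:num]
    if features.ndim == 4:
        # Assumed layout [num_boxes, height, width, depth]:
        # average over height and width to get one vector per box.
        features = features.mean(axis=(1, 2))
    keep = scores >= min_score
    return output_dict['detection_boxes'][:num][keep], features[keep]
```

Row i of the returned vectors belongs to row i of the returned boxes, because both are sliced and masked with the same indices.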

score 3 · Accepted Answer

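Admittedly, this is not a perfect answer, but I have done a lot of digging into Faster-RCNN with the TF-OD API and made some headway on this problem. I'll explain what I learned from digging into the Faster-RCNN version, and hopefully you can translate it to SSD. Your best bet is to dig into the graph on TensorBoard and sift through the tensor names in the detection graph.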
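
If you would rather sift through the names in code than in TensorBoard, a small filter like the one below may help. It is only a sketch: the helper name and the keywords are my own, and it relies on nothing beyond graph.get_operations(), op.outputs and tensor.name, which a loaded tf.Graph provides.

```python
def find_tensor_names(graph, keywords):
    """List output tensor names in `graph` containing any keyword (case-insensitive).

    Hypothetical helper; `graph` is expected to behave like a tf.Graph.
    """
    keywords = [k.lower() for k in keywords]
    names = []
    for op in graph.get_operations():
        for tensor in op.outputs:
            if any(k in tensor.name.lower() for k in keywords):
                names.append(tensor.name)
    return names


# Example call (keywords are guesses, adjust them to your model):
# find_tensor_names(detection_graph, ['pool', 'feature'])
```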

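First, there is not always a simple one-to-one correspondence between features and boxes/scores. That is, there is no simple tensor you can extract from the network that provides this, at least not by default.

Here is code that gets the features from a Faster-RCNN network: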

https://gist.github.com/markdtw/02ece6b90e75832bd44787c03a664e8d

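Although this provides something that looks like feature vectors, you can see that several other people have run into trouble with this solution. The basic problem is that the feature vectors are pulled before the SecondStagePostprocessor, which performs several operations before the detection_boxes tensor and similar tensors are created.

Before the SecondStagePostprocessor, the class scores and boxes are created and the feature vectors are left behind, never to be seen again. In the post-processor there is a multiclass NMS stage and a sorting stage. The end result is MaxProposalsFromSecondStage, whereas the feature vectors are populated for [MaxProposalsFromFirstStage, NumberOfFeatureVectors]. So there is a decimation and a sorting operation that make it difficult to pair the final output with the feature vector indices.

My current solution is to pull the feature vectors and boxes from before the second stage and do the rest by hand. There are no doubt better solutions, but it is hard to follow the graph and find the appropriate tensors for the sorting operation.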
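
Roughly, "doing the rest by hand" can look like the NumPy sketch below: run per-class NMS and the final score sort yourself, and carry the proposal row index through both steps so each detection can be paired with its feature vector. This is an illustration, not the API's post-processor. The input shapes (boxes [N, C, 4], scores [N, C], features [N, D]), the helper names, and the threshold defaults are all assumptions.

```python
import numpy as np


def _iou(box, boxes):
    """IoU between one [ymin, xmin, ymax, xmax] box and an array of boxes."""
    h = np.clip(np.minimum(box[2], boxes[:, 2]) - np.maximum(box[0], boxes[:, 0]), 0, None)
    w = np.clip(np.minimum(box[3], boxes[:, 3]) - np.maximum(box[1], boxes[:, 1]), 0, None)
    inter = h * w
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def _nms_indices(boxes, scores, iou_threshold):
    """Greedy NMS; returns indices into `boxes`, best score first."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(best)
        order = rest[_iou(boxes[best], boxes[rest]) <= iou_threshold]
    return np.array(keep, dtype=int)


def postprocess_with_features(boxes, scores, features, score_threshold=0.3,
                              iou_threshold=0.6, max_detections=100):
    """Per-class NMS plus score sort, keeping each detection's feature row."""
    det_boxes, det_scores, det_classes, det_rows = [], [], [], []
    for c in range(scores.shape[1]):
        candidates = np.flatnonzero(scores[:, c] >= score_threshold)
        if candidates.size == 0:
            continue
        # Map NMS results back to original proposal rows.
        kept = candidates[_nms_indices(boxes[candidates, c], scores[candidates, c],
                                       iou_threshold)]
        det_boxes.append(boxes[kept, c])
        det_scores.append(scores[kept, c])
        det_classes.append(np.full(kept.size, c))
        det_rows.append(kept)
    if not det_rows:
        return np.zeros((0, 4)), np.zeros(0), np.zeros(0, dtype=int), features[:0]
    all_scores = np.concatenate(det_scores)
    order = np.argsort(-all_scores)[:max_detections]  # the sorting stage
    rows = np.concatenate(det_rows)[order]
    return (np.concatenate(det_boxes)[order], all_scores[order],
            np.concatenate(det_classes)[order], features[rows])
```

The key point is the rows array: it records which proposal each final detection came from, which is exactly the information that is lost when you only read detection_boxes from the graph.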

I hope this helps! Sorry I couldn't offer an end-to-end solution, but I hope this gets you past your current roadblock.
