I am trying to retrain a mobilenet_v2 model for custom object detection by loosely following this tutorial. My end goal is to have a web_model I can query that returns the scores, classIds, and number of detections. The final exported inference model works in a Python environment, but it currently throws odd errors when converted for the web.
It feels like my pipeline is missing a step somewhere that would make the inference graph convertible for the web. The problem seems to be that model_main.py sets is_training=True, and that this ends up carried into the final inference model. I cannot find any supporting documentation or tutorials on how to generate a non-training model from my trained one.
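To check whether training-mode batch norm actually survives into the exported graph, something like the following can be run against the frozen graph (a rough sketch; it assumes the default frozen_inference_graph.pb written by export_inference_graph.py into the output directory):

import tensorflow as tf

# Rough diagnostic sketch: flag any batch-norm node in the frozen graph that is
# still marked as training-mode (FusedBatchNorm carries an is_training attr).
graph_def = tf.GraphDef()
with tf.gfile.GFile('output_inf/frozen_inference_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    if node.op.startswith('FusedBatchNorm') and node.attr['is_training'].b:
        print('training-mode batch norm:', node.name)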
I have been using tensorflow-gpu 1.13.1 and retraining the current ssd_mobilenet_v2_coco model from the object detection model zoo with model_main.py. I have also tried the legacy train.py and tensorflow 1.14.0. When it came time to convert to tfjs, I used both tensorflowjs 1.2.2.1 and 0.8.6; both lead to the same error when I try to run the end result on the web. Before converting with 0.8.6, I also tried performing intermediate graph transforms on the frozen model.
Training the model:
python model_main.py --model_dir=output --pipeline_config_path=training\ssd_mobilenet_v2_coco.config --num_train_steps=200000
Exporting the inference graph:
python export_inference_graph.py --input_type=image_tensor --output_directory=output_inf --pipeline_config_path=training\ssd_mobilenet_v2_coco.config --trained_checkpoint_prefix=neg_32\model.ckpt-XXXX
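As a sanity check on what the export actually exposes, the serving signature of the SavedModel can be inspected before converting (a minimal sketch; the tag and signature name match the flags used in the conversion step below):

import tensorflow as tf

# Minimal sketch: print the serving signature of the exported SavedModel so the
# converter flags (tags, signature name, input/output names) can be confirmed.
with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(sess, ['serve'], 'output_inf/saved_model')
    sig = meta_graph.signature_def['serving_default']
    print('inputs: ', {k: v.name for k, v in sig.inputs.items()})
    print('outputs:', {k: v.name for k, v in sig.outputs.items()})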
Converting with tfjs 1.2.2.1:
tensorflowjs_converter --input_format=tf_saved_model --output_format=tfjs_graph_model --saved_model_tags=serve --signature_name=serving_default output_inf\saved_model output_inf\web_model
Testing the model in the browser:
import * as tf from '@tensorflow/tfjs';

class Detector {
  async init() {
    try {
      this.model = await tf.loadGraphModel('/web_model/model.json');
    } catch (err) {
      console.log(err);
    }
  }

  async detect(frame) {
    const { model } = this;
    const INPUT_TENSOR = 'image_tensor';
    const OUTPUT_TENSOR = 'num_detections';
    // Dummy all-zeros input in place of a real frame for now.
    const zeros = tf.zeros([1, 300, 300, 3]);
    console.log('executing model');
    const output = await model.executeAsync({ [INPUT_TENSOR]: zeros }, OUTPUT_TENSOR);
    console.log(output);
  }
}

export default Detector;
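For comparison, the same zero-input check against the frozen graph passes in a Python environment, roughly like this (assuming the standard tensor names produced by the Object Detection API export):

import numpy as np
import tensorflow as tf

# Rough sketch of the Python-side check: run the frozen graph on a zero image
# and fetch num_detections, mirroring the browser test above.
graph_def = tf.GraphDef()
with tf.gfile.GFile('output_inf/frozen_inference_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')
    with tf.Session(graph=graph) as sess:
        num = sess.run('num_detections:0',
                       feed_dict={'image_tensor:0': np.zeros((1, 300, 300, 3), np.uint8)})
        print(num)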
Intermediate transforms:
import tensorflow as tf
from tensorflow.python.framework import graph_util, ops
from tensorflow.tools.graph_transforms import TransformGraph


def get_graph_def_from_file(graph_filepath):
    with ops.Graph().as_default():
        with tf.gfile.GFile(graph_filepath, 'rb') as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
            return graph_def


graph_def = get_graph_def_from_file(file_name)
input_node = ['image_tensor']
output_node = ['num_detections', 'detection_scores', 'detection_boxes', 'detection_classes']
transforms = [
    'remove_nodes(op=Identity, op=CheckNumerics)',
    'fold_constants(ignore_errors=true)',
    'fold_batch_norms',
    'fold_old_batch_norms(ignore_errors=true)',
    'merge_duplicate_nodes',
    'strip_unused_nodes'
]
# Strip training-only nodes first, then run the graph transforms on that result.
transformed_graph_def = graph_util.remove_training_nodes(graph_def, protected_nodes=output_node)
transformed_graph_def = TransformGraph(
    transformed_graph_def,
    input_node,
    output_node,
    transforms)
tf.train.write_graph(transformed_graph_def,
                     logdir=model_dir,
                     as_text=False,
                     name=out_name)
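A quick way to confirm the transforms still leave a loadable GraphDef is to re-import the written file (small sketch reusing the helper above and the same model_dir/out_name variables):

import os

# Sanity check (sketch): re-load the transformed graph and make sure it still
# imports cleanly after the transforms.
check_def = get_graph_def_from_file(os.path.join(model_dir, out_name))
with tf.Graph().as_default():
    tf.import_graph_def(check_def, name='')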
I expect the final web model to return detections for the test array. Instead, when the JavaScript code executes, tensorflowjs returns the following error:
Uncaught (in promise) Error: Operands could not be broadcast together with shapes 1,150,150,32 and 0.
at Ir (tfjs:2)
at new bi (tfjs:2)
at e.batchNormalization (tfjs:2)
at kt.runKernel.$x (tfjs:2)
at tfjs:2
at t.scopedRun (tfjs:2)
at t.runKernel (tfjs:2)
at os (tfjs:2)
at batchNorm (tfjs:2)
at jv (tfjs:2)
Attempting to apply fold_old_batch_norms in the TransformGraph step then produces this error:
2019-07-07 22:16:11.717749: I tensorflow/tools/graph_transforms/transform_graph.cc:317] Applying fold_old_batch_norms
Traceback (most recent call last):
File "xxx/optimize.py", line 154, in <module>
optimize_graph(model_dir, output_frozen_fname, transforms, output_nodes, output_optimized_fname)
File "xxx/optimize.py", line 135, in optimize_graph
transforms)
File "xxx\venv\lib\site-packages\tensorflow\tools\graph_transforms\__init__.py", line 51, in TransformGraph
transforms_string, status)
File "xxx\venv\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Beta input to batch norm has bad shape: [32]
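To narrow down which batch-norm input ends up empty (the 0-sized operand in the broadcast error above), the shapes of the constants feeding the FusedBatchNorm nodes can be dumped from the frozen graph, for example with a sketch like this that reuses get_graph_def_from_file from the transform script:

# Diagnostic sketch: print the shapes of the Const inputs (scale, offset, mean,
# variance) of every FusedBatchNorm node, looking for empty tensors.
graph_def = get_graph_def_from_file('output_inf/frozen_inference_graph.pb')
consts = {n.name: n for n in graph_def.node if n.op == 'Const'}
for node in graph_def.node:
    if node.op.startswith('FusedBatchNorm'):
        for inp in node.input[1:]:
            const = consts.get(inp.split(':')[0])
            if const is not None:
                dims = [d.size for d in const.attr['value'].tensor.tensor_shape.dim]
                print(node.name, inp, dims)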