tensorflow - Stack smashing detected in TensorFlow Lite Micro interpreter->Invoke() call with the Mobilenet_V1_0.25_224_quant model

Question

I'm trying to use a quantized model with TensorFlow Lite Micro, and I get a segmentation fault in the interpreter->Invoke() call.

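The debugger shows that the segmentation fault happens on return from Eval() in conv.cc, on node 28 (CONV_2D), and that the stack is corrupted. With the compiler flags "-fstack-protector-all -Wstack-protector", the error message is *** stack smashing detected ***: <unknown> terminated.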

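My test is simply derived from the person detection example: I replaced the model with Mobilenet_V1_0.25_224_quant from the TensorFlow Lite pretrained models site, increased kTensorArenaSize enough, changed the model input/output sizes to 224x224x3 and 1x1001, and pulled in the additional required operators.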

I also tried several different models. Another quantized model, Mobilenet_V1_0.25_192_quant, shows the same segfault, but the regular float models Mobilenet_V1_0.25_192 and Mobilenet_V1_0.25_224 run fine for many loops.

Has anyone seen a similar issue? Or are there limitations of TensorFlow Lite Micro that I should be aware of?

The problem can be reproduced at a commit in this forked tensorflow repo.

Build command:

$ bazel build //tensorflow/lite/micro/examples/person_detection:person_detection       -c dbg --copt=-fstack-protector-all --copt=-Wstack-protector --copt=-fno-omit-frame-pointer

And run:

$ ./bazel-bin/tensorflow/lite/micro/examples/person_detection/person_detection

Files changed:

tensorflow/lite/micro/examples/person_detection/main_functions.cc
tensorflow/lite/micro/examples/person_detection/model_settings.h 
tensorflow/lite/micro/examples/person_detection/person_detect_model_data.cc

Changes in main_functions.cc:

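// Tensor arena enlarged for the 224x224x3 model.
constexpr int kTensorArenaSize = 1400 * 1024;
static tflite::MicroOpResolver<5> micro_op_resolver;
// Extra ops required by the MobileNet graph, on top of the example's existing ones.
micro_op_resolver.AddBuiltin(tflite::BuiltinOperator_RESHAPE,
                             tflite::ops::micro::Register_RESHAPE());
// 1, 2: minimum and maximum op version to register.
micro_op_resolver.AddBuiltin(tflite::BuiltinOperator_SOFTMAX,
                             tflite::ops::micro::Register_SOFTMAX(), 1, 2);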

Changes in model_settings.h:

constexpr int kNumCols = 224;
constexpr int kNumRows = 224;
constexpr int kNumChannels = 3;
constexpr int kCategoryCount = 1001;

The last file, the model data in person_detect_model_data.cc, is quite large; please see the full file on GitHub.

Update, March 28, 2020: also tested on a Raspberry Pi 3, with the same result as on x86 Ubuntu 18.04.

pi@raspberrypi:~/tests $ ./person_detection 
*** stack smashing detected ***: <unknown> terminated
Aborted

Thanks for your help.

Update, April 2, 2020: root cause found.

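I found that the problem is caused by an array overflow in the layer's op data. TensorFlow Lite Micro has a hidden limit on the number of output channels (or I missed it in the documentation; at least the TF Lite Micro runtime does not check it): the OpData struct in TF Lite Micro's conv.cc supports at most 256.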

constexpr int kMaxChannels = 256;
....
struct OpData {
...
  // Per channel output multiplier and shift.
  // TODO(b/141139247): Allocate these dynamically when possible.
  int32_t per_channel_output_multiplier[kMaxChannels];
  int32_t per_channel_output_shift[kMaxChannels];
...
};

The MobileNet model Mobilenet_V1_0.25_224_quant.tflite has 1000 output classes and 1001 channels in total internally. For the last Conv2D, whose output size is 1001, this causes stack corruption in tflite::PopulateConvolutionQuantizationParams() at tensorflow/lite/kernels/kernel_util.cc:90.

TF and TF Lite don't have this problem, presumably because they don't use this struct definition.

Confirmed: after increasing the channel limit to 1024, the model evaluation (Invoke) loop runs correctly.

That said, most TF Lite Micro use cases probably involve small models that may never run into this issue.

Could this limit be better documented, and/or checked at runtime?
