python - How to debug invalid UTF-8 in protobuf?

Translated from: https://stackoverflow.com/questions/65564975 2021-01-04T15:07:57.080

Viewed 1342 times

I am working with some TensorFlow code and trying to load a trained checkpoint, but it fails with the following protobuf error:

[libprotobuf ERROR google/protobuf/wire_format_lite.cc:577] String field 'tensorflow.TensorShapeProto.Dim.name' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes. 
Traceback (most recent call last):
  [...]
  File "/home/sopi/miniconda3/envs/magenta2/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3053, in _as_graph_def
    graph.ParseFromString(compat.as_bytes(data))
google.protobuf.message.DecodeError: Error parsing message

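To debug the training code that is apparently producing the invalid UTF-8, I would like to see what the offending data actually looks like. Stepping through the code in pdb did not get me very far, because ParseFromString() is implemented in C++.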

How can I find out what the invalid UTF-8 data is, or even the position in the byte array where the error occurs?

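(In this case, graph is a tensorflow.core.framework.graph_pb2.GraphDef, which is a subclass of google.protobuf.message.Message, but my question is about protobuf parsing in general; I don't think there is anything special about GraphDef in this respect.)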
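
One possible way to approach this, sketched here under assumptions rather than offered as a confirmed solution, is to walk the serialized bytes directly in pure Python. The protobuf wire format is self-describing enough to do that without the schema. The helper names below (`read_varint`, `parse_fields`, `find_invalid_utf8`) are hypothetical and not part of protobuf or TensorFlow. The walker is heuristic: without the schema it cannot tell a `string` field from a `bytes` field or a nested message, so it treats any length-delimited payload that parses cleanly as a nested message, and reports every remaining payload that is not valid UTF-8. Legitimate `bytes` fields will therefore show up as false positives.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf) or shift > 63:
            raise ValueError("truncated or overlong varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_fields(buf):
    """Split one message level into (field_no, wire_type, value_offset, payload).

    payload is None unless the field is length-delimited (wire type 2).
    Raises ValueError if buf is not a well-formed sequence of fields.
    """
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_no, wire_type = key >> 3, key & 0x07
        if field_no == 0:
            raise ValueError("field number 0 is not valid")
        start, payload = pos, None
        if wire_type == 0:        # varint
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:      # fixed 64-bit
            pos += 8
        elif wire_type == 5:      # fixed 32-bit
            pos += 4
        elif wire_type == 2:      # length-delimited: string, bytes, or message
            length, start = read_varint(buf, pos)
            pos = start + length
            payload = buf[start:pos]
        else:                     # groups (3, 4) are not handled in this sketch
            raise ValueError("unsupported wire type %d" % wire_type)
        if pos > len(buf):
            raise ValueError("truncated field")
        fields.append((field_no, wire_type, start, payload))
    return fields


def looks_like_message(payload):
    """Heuristic: non-empty and parses as a sequence of fields."""
    if not payload:
        return False
    try:
        parse_fields(payload)
        return True
    except ValueError:
        return False


def find_invalid_utf8(buf, path=(), base=0):
    """Return [(field_number_path, absolute_offset_of_bad_byte, payload), ...]."""
    hits = []
    for field_no, _wire_type, start, payload in parse_fields(buf):
        if payload is None:
            continue
        here = path + (field_no,)
        if looks_like_message(payload):
            hits += find_invalid_utf8(payload, here, base + start)
            continue
        try:
            payload.decode("utf-8")
        except UnicodeDecodeError as exc:
            hits.append((here, base + start + exc.start, payload))
    return hits


if __name__ == "__main__":
    # Hand-built message: field 1 is a nested message whose field 2
    # holds the bytes b"ok\xff"; field 3 is the varint 1.
    sample = b"\x0a\x05\x12\x03ok\xff\x18\x01"
    for path, offset, payload in find_invalid_utf8(sample):
        print(path, offset, payload)
```

To use it on the failing case, the bytes passed to `ParseFromString()` in the traceback (`compat.as_bytes(data)`) could be captured in pdb and handed to `find_invalid_utf8`. Each reported path is a tuple of field numbers from the top-level message down to the suspect field, and has to be matched by hand against the relevant `.proto` definitions to see which hit corresponds to the `string` field named in the error.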
