microsoft-cognitive - Why do the "boundingBox" coordinates returned by Form Recognizer seem incorrect for my PDF form?

Question

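I'm using Form Recognizer to extract text from a sample PDF form (ACORD 3101 (2012/02)), but the "boundingBox" returned for some fields looks incorrect to me. I'd like to understand the reason behind it.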

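Please explain how this boundingBox information is calculated. Thanks.
Please also look at the returned JSON response. The key-value pairs are not as expected either: there is a single "__Tokens__" key with multiple values, when in fact each of these values should be under its own key. Why?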
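To see exactly which text ends up under "__Tokens__" instead of a named key, the pairs in the analyze response can be split programmatically. This is a minimal sketch, assuming the response shape quoted below (pages, then keyValuePairs, each with "key" and "value" lists of {"text": ...} objects); the helper name split_pairs and the choice to join multi-fragment keys with a space are illustrative assumptions, not part of any SDK.

```python
import json

# Trimmed-down copy of the analyze response quoted in the question
# (only the fields this sketch reads are kept).
SAMPLE = """
{
  "pages": [
    {
      "number": 1,
      "keyValuePairs": [
        {"key": [{"text": "Page"}], "value": [{"text": "of"}]},
        {"key": [{"text": "__Tokens__"}],
         "value": [{"text": "John Doe"}, {"text": "Taren Liu"}, {"text": "RBA"}]}
      ]
    }
  ]
}
"""

TOKENS_KEY = "__Tokens__"


def split_pairs(response):
    """Separate real key-value pairs from the __Tokens__ catch-all.

    Returns (matched, unmatched). matched maps key text to the joined value
    text (a later page overwrites an earlier one with the same key);
    unmatched lists every text found under __Tokens__, in response order.
    """
    matched, unmatched = {}, []
    for page in response.get("pages", []):
        for pair in page.get("keyValuePairs", []):
            # ASSUMPTION: a key made of several fragments is joined with spaces.
            key = " ".join(k["text"] for k in pair["key"])
            values = [v["text"] for v in pair["value"]]
            if key == TOKENS_KEY:
                unmatched.extend(values)
            else:
                matched[key] = " ".join(values)
    return matched, unmatched


matched, unmatched = split_pairs(json.loads(SAMPLE))
print(matched)
print(unmatched)
```

Anything listed in unmatched is text the trained model did not associate with one of its learned keys.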

I trained the service with several filled-in forms and one empty form. When I call "/{id}/keys" on the trained model, I do see the recognized keys:

{
  "clusters": {
    "0": ["ADDITIONAL REMARKS", "ADDITIONAL REMARKS SCHEDULE", "Effective Date:", "Form Number:", "Form Title:", "Insured", "Insurer", "Intermediary", "Page", "Policy Number", "This Additional Remarks form is a schedule to ACORD form,", "__Tokens__"]
  }
}
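The keys response above can also be inspected in code. A minimal sketch that lists the learned keys per cluster while setting aside "__Tokens__", which the accepted answer describes as a catch-all for unmatched text rather than a form field; the function name real_keys is hypothetical and the JSON is the response quoted above.

```python
import json

# The "/{id}/keys" response quoted in the question.
KEYS_RESPONSE = json.loads("""
{
  "clusters": {
    "0": ["ADDITIONAL REMARKS", "ADDITIONAL REMARKS SCHEDULE", "Effective Date:",
          "Form Number:", "Form Title:", "Insured", "Insurer", "Intermediary",
          "Page", "Policy Number",
          "This Additional Remarks form is a schedule to ACORD form,", "__Tokens__"]
  }
}
""")

TOKENS_KEY = "__Tokens__"


def real_keys(keys_response):
    """Map each cluster id to its learned keys, minus the __Tokens__ pseudo-key."""
    return {
        cluster_id: [k for k in keys if k != TOKENS_KEY]
        for cluster_id, keys in keys_response.get("clusters", {}).items()
    }


for cluster_id, keys in real_keys(KEYS_RESPONSE).items():
    print(f"cluster {cluster_id}: {len(keys)} keys")
    for key in keys:
        print("  ", key)
```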

That looks fine to me. I then call the "/{id}/analyze" API to extract the sample PDF. As mentioned, the results seem incorrect. Below is part of the JSON response I got.

{
    "status": "success",
    "pages": [
        {
            "number": 1,
            "height": 842,
            "width": 595,
            "clusterId": 0,
            "keyValuePairs": [
                {
                    "key": [
                        {
                            "text": "Page",
                            "boundingBox": [
                                493.2,
                                811.6,
                                514.7,
                                811.6,
                                514.7,
                                801.6,
                                493.2,
                                801.6
                            ]
                        }
                    ],
                    "value": [
                        {
                            "text": "of",
                            "boundingBox": [
                                543.6,
                                811.6,
                                552.1,
                                811.6,
                                552.1,
                                801.6,
                                543.6,
                                801.6
                            ],
                            "confidence": 1.0
                        }
                    ]
                },
                {
                    "key": [
                        {
                            "text": "__Tokens__",
                            "boundingBox": [
                                0.0,
                                0.0,
                                0.0,
                                0.0,
                                0.0,
                                0.0,
                                0.0,
                                0.0
                            ]
                        }
                    ],
                    "value": [
                        {
                            "text": "1",
                            "boundingBox": [
                                62.3,
                                97.3,
                                62.8,
                                97.3,
                                62.8,
                                96.2,
                                62.3,
                                96.2
                            ],
                            "confidence": 0.24
                        },
                        {
                            "text": "1",
                            "boundingBox": [
                                66.6,
                                97.3,
                                67.1,
                                97.3,
                                67.1,
                                96.2,
                                66.6,
                                96.2
                            ],
                            "confidence": 0.24
                        },
                        {
                            "text": "John Doe",
                            "boundingBox": [
                                2.8,
                                93.9,
                                6.9,
                                93.9,
                                6.9,
                                92.8,
                                2.8,
                                92.8
                            ],
                            "confidence": 0.24
                        },
                        {
                            "text": "Taren Liu",
                            "boundingBox": [
                                36.4,
                                93.8,
                                40.4,
                                93.8,
                                40.4,
                                92.8,
                                36.4,
                                92.8
                            ],
                            "confidence": 0.24
                        },
                        {
                            "text": "23456R02",
                            "boundingBox": [
                                2.8,
                                90.8,
                                7.2,
                                90.8,
                                7.2,
                                89.8,
                                2.8,
                                89.8
                            ],
                            "confidence": 0.24
                        },
                        {
                            "text": "RBA",
                            "boundingBox": [
                                2.8,
                                87.9,
                                4.7,
                                87.9,
                                4.7,
                                86.9,
                                2.8,
                                86.9
                            ],
                            "confidence": 0.24
                        },
                        {
                            "text": "11/08/2019",
                            "boundingBox": [
                                48.2,
                                87.9,
                                53.0,
                                87.9,
                                53.0,
                                86.9,
                                48.2,
                                86.9
                            ],
                            "confidence": 0.24
                        },
                        {
                            "text": "140001",
                            "boundingBox": [
                                10.4,
                                83.3,
                                13.6,
                                83.3,
                                13.6,
                                82.2,
                                10.4,
                                82.2
                            ],
                            "confidence": 0.24
                        },
                        {
                            "text": "Hello World",
                            "boundingBox": [
                                22.6,
                                83.3,
                                27.5,
                                83.3,
                                27.5,
                                82.2,
                                22.6,
                                82.2
                            ],
                            "confidence": 0.24
                        },
                        {
                            "text": "This is the second fake form. See",
                            "boundingBox": [
                                2.8,
                                80.9,
                                17.0,
                                80.9,
                                17.0,
                                79.8,
                                2.8,
                                79.8
                            ],
                            "confidence": 0.24
                        },
                        {
                            "text": "if",
                            "boundingBox": [
                                17.3,
                                80.9,
                                17.8,
                                80.9,
                                17.8,
                                79.8,
                                17.3,
                                79.8
                            ],
                            "confidence": 0.24
                        },
                        {
                            "text": "the form recognizer can learn from this.",
                            "boundingBox": [
                                18.0,
                                80.9,
                                34.7,
                                80.9,
                                34.7,
                                79.8,
                                18.0,
                                79.8
                            ],
                            "confidence": 0.24
                        }
                    ]
                }
            ],
            "tables": []
        }
    ],
    "errors": []
}

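Note that the height and width values (842 and 595, respectively) are correct: these are the standard A4 page dimensions in points. However, the fields "John Doe" and "Taren Liu" have incorrect boundingBox values. These boxes are clustered in the bottom-left corner of the page (for "John Doe", for example, they are 2.8, 93.9, 6.9, 93.9, 6.9, 92.8, 2.8, 92.8) instead of near the top of the PDF where I expect them. Why?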
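The observation above can be made concrete by expressing each box as a fraction of the 595 x 842 point page. This sketch assumes the box values are in points, the same units as "width" and "height" (neither the question nor the answers state the units of the boxes, so treat that as an assumption), with the origin at the bottom-left corner as the accepted answer describes. The helper relative_position is hypothetical.

```python
PAGE_WIDTH, PAGE_HEIGHT = 595, 842  # from the "width" and "height" fields of page 1

# Boxes copied from the analyze response: x1, y1, x2, y2, x3, y3, x4, y4.
BOXES = {
    "Page": [493.2, 811.6, 514.7, 811.6, 514.7, 801.6, 493.2, 801.6],
    "John Doe": [2.8, 93.9, 6.9, 93.9, 6.9, 92.8, 2.8, 92.8],
    "Taren Liu": [36.4, 93.8, 40.4, 93.8, 40.4, 92.8, 36.4, 92.8],
}


def relative_position(box, width=PAGE_WIDTH, height=PAGE_HEIGHT):
    """Return (left, right, bottom, top) as fractions of the page size.

    ASSUMPTION: box values share the page's units (points) and the origin is
    the bottom-left corner, so a larger y means closer to the top of the page.
    """
    xs, ys = box[0::2], box[1::2]
    return min(xs) / width, max(xs) / width, min(ys) / height, max(ys) / height


for text, box in BOXES.items():
    left, right, bottom, top = relative_position(box)
    print(f"{text!r}: x {left:.1%}-{right:.1%} of width, "
          f"y {bottom:.1%}-{top:.1%} of height (from the bottom)")
```

Under those assumptions, "Page" falls roughly 95% to 96% of the way up the page near the right edge, while "John Doe" falls about 11% up from the bottom within the leftmost 1.2% of the width, which matches the clustering described in the question.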

Here is the sample PDF used for training and analysis

Here is another sample PDF used for training

Here is the empty PDF used for training

score 1 · Accepted Answer

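Bounding box: the 8 numbers are 4 pairs of (x, y) coordinates for the corners of the box, in this order: top-left, top-right, bottom-right, bottom-left. The origin of the coordinate system is the bottom-left corner of the page.
The "__Tokens__" key contains all text that Form Recognizer did not match to a key-value pair or a table.

Could you also share anonymized training data that doesn't contain any real data?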
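The box layout described in this answer can be decoded as follows. This minimal sketch splits the 8 numbers into named corners in the stated order and, for tools that expect the origin at the top-left instead, flips y with y' = page_height - y. The names Corners, to_corners and to_top_left_origin are hypothetical helpers, not part of the Form Recognizer API.

```python
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class Corners(NamedTuple):
    top_left: Point
    top_right: Point
    bottom_right: Point
    bottom_left: Point


def to_corners(box: List[float]) -> Corners:
    """Split an 8-number boundingBox into its four (x, y) corners.

    Order follows the answer: top-left, top-right, bottom-right, bottom-left.
    """
    if len(box) != 8:
        raise ValueError(f"expected 8 numbers, got {len(box)}")
    points = [(box[i], box[i + 1]) for i in range(0, 8, 2)]
    return Corners(*points)


def to_top_left_origin(corners: Corners, page_height: float) -> Corners:
    """Re-express corners with the origin at the top-left of the page.

    The service's origin is the bottom-left corner, so only y changes:
    y_from_top = page_height - y_from_bottom.
    """
    return Corners(*[(x, page_height - y) for x, y in corners])


# "Page" label from the response; page height is 842 points.
page_box = [493.2, 811.6, 514.7, 811.6, 514.7, 801.6, 493.2, 801.6]
corners = to_corners(page_box)
print(corners.top_left)
# y becomes 842 - 811.6, i.e. roughly 30.4 points below the top edge.
print(to_top_left_origin(corners, 842).top_left)
```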

score 0 · Accepted Answer


Have you verified whether these bounding boxes are correct?

Answered 2019-08-26T08:34:41.963
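One way to check boxes mechanically is to test each one against the layout from the accepted answer (corner order top-left, top-right, bottom-right, bottom-left, origin at the bottom-left) and against the page size. This is a heuristic sketch: check_box is a hypothetical helper, and it can only catch malformed boxes, not boxes that are well-formed but placed in the wrong spot.

```python
from typing import List


def check_box(box: List[float], page_width: float, page_height: float) -> List[str]:
    """Return a list of problems found in one boundingBox (empty if none)."""
    if len(box) != 8:
        return [f"expected 8 numbers, got {len(box)}"]
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = [(box[i], box[i + 1]) for i in range(0, 8, 2)]
    problems = []
    # Left corners (top-left, bottom-left) should not lie right of the right corners.
    if not (x1 <= x2 and x4 <= x3):
        problems.append("left corners are not left of right corners")
    # With a bottom-left origin, the top corners have the larger y.
    if not (y1 >= y4 and y2 >= y3):
        problems.append("top corners are below bottom corners")
    if any(not 0 <= x <= page_width for x in (x1, x2, x3, x4)):
        problems.append("x outside page width")
    if any(not 0 <= y <= page_height for y in (y1, y2, y3, y4)):
        problems.append("y outside page height")
    # Zero width or zero height means the box does not locate anything.
    if x1 == x2 or y1 == y4:
        problems.append("zero-area box")
    return problems


print(check_box([2.8, 93.9, 6.9, 93.9, 6.9, 92.8, 2.8, 92.8], 595, 842))  # []
print(check_box([0.0] * 8, 595, 842))  # ['zero-area box']
```

The "John Doe" box passes every structural check, which suggests the question is where it lands on the page rather than a malformed box; the all-zero box attached to the "__Tokens__" key is flagged as zero-area, consistent with "__Tokens__" being a catch-all rather than a located field.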
