python-3.x - Index out of range error returned from the Form Recognizer API server on a train request

Question

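When I try to train a model with 5 PDFs, I get a 200 response from the API server, and every document is reported with no errors and a status of success, but the errors field of the response itself returns: {'errorMessage': 'Unable to extract key/value pairs. list index out of range'}. It looks like there may be a bug on the API server.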

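I have already successfully trained a model and analyzed a PDF using the PDFs provided as sample invoices. The train API request for my own data also returns a 200 result. So something does seem to be going wrong on the server side, possibly triggered by something in the data I am sending. Obviously, though, I have no access to the error trace.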

########### Python Form Recognizer Train #############
from requests import post as http_post

# Endpoint URL (the host already ends with "/", so the path must not start with one)
base_url = r"https://westus2.api.cognitive.microsoft.com/" + "formrecognizer/v1.0-preview/custom"
# Blob container holding the training PDFs (URL truncated in the original post)
source = r"https://formrecognizerblob1.blob.core.windows.net/$root/..."
headers = {
    # Request headers
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': '<key>',  # placeholder: replace with your subscription key
}
url = base_url + "/train"
body = {"source": source}
try:
    resp = http_post(url=url, json=body, headers=headers)
    print("Response status code: %d" % resp.status_code)
    print("Response body: %s" % resp.json())
except Exception as e:
    print(str(e))

Running the code above (with my 5 PDFs in the root container of my blob storage) returns:

Response status code: 200
Response body: {'modelId': 'e6dd8978-dfcc-438b-b0b2-639c13327cdf', 'trainingDocuments': [{'documentName': '.pdf', 'pages': 5, 'errors': [], 'status': 'success'}, {'documentName': '.pdf', 'pages': 4, 'errors': [], 'status': 'success'}, {'documentName': '.pdf', 'pages': 17, 'errors': [], 'status': 'success'}, {'documentName': '.pdf', 'pages': 7, 'errors': [], 'status': 'success'}, {'documentName': '.pdf', 'pages': 11, 'errors': [], 'status': 'success'}], 'errors': [{'errorMessage': 'Unable to extract key/value pairs. list index out of range'}]}
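Since the HTTP status is 200 and every document reports success, the failure is only visible in the top-level errors list of the response body. A minimal sketch of checking for that case follows. It assumes the response has the shape shown above; the helper name summarize_train_response and the keys of its return value are hypothetical and not part of the Form Recognizer API.

```python
def summarize_train_response(payload):
    """Split a train response into per-document failures and model-level errors.

    Hypothetical helper: assumes the payload shape shown in the question
    ('modelId', 'trainingDocuments', top-level 'errors').
    """
    # Documents whose own status is anything other than 'success'
    failed_docs = [
        doc.get("documentName")
        for doc in payload.get("trainingDocuments", [])
        if doc.get("status") != "success"
    ]
    # Top-level errors describe the training run as a whole
    model_errors = [err.get("errorMessage", "") for err in payload.get("errors", [])]
    return {
        "model_id": payload.get("modelId"),
        "failed_documents": failed_docs,
        "model_errors": model_errors,
        "ok": not failed_docs and not model_errors,
    }


# Sample payload shaped like the response in the question (document list abbreviated)
sample = {
    "modelId": "e6dd8978-dfcc-438b-b0b2-639c13327cdf",
    "trainingDocuments": [
        {"documentName": ".pdf", "pages": 5, "errors": [], "status": "success"},
    ],
    "errors": [
        {"errorMessage": "Unable to extract key/value pairs. list index out of range"}
    ],
}

summary = summarize_train_response(sample)
if not summary["ok"]:
    print("Training reported problems:", summary["model_errors"])
```

In the script above, the same check could be applied to resp.json() instead of relying on resp.status_code alone.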

score 0 · Accepted Answer

It does look like a backend error. If you can share the data you used for training, I can investigate further.
