python - Lambda and Textract: start_document_text_detection unknown parameter "OutputConfig"

Question

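I am trying to extract text from a PDF using a Lambda function and Textract.

My question is: how do I call "start_document_text_detection" so that Textract automatically sends its response to S3?

I get this error message:

[ERROR] ParamValidationError: Parameter validation failed: Unknown parameter in input: "OutputConfig", must be one of: DocumentLocation, ClientRequestToken, JobTag, NotificationChannel

My code: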

    import boto3

    # origin_bucket, destination_bucket and key are defined elsewhere (not shown)
    textract = boto3.client('textract')
    textract.start_document_text_detection(
        DocumentLocation={
            'S3Object': {
                'Bucket': origin_bucket,
                'Name': key
            }
        },
        JobTag=key + '_Job',
        OutputConfig={
            'S3Bucket': destination_bucket,
            'S3Prefix': key
        })

The Boto3 documentation shows that I can pass a parameter named "OutputConfig": https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/textract.html#Textract.Client.start_document_text_detection

    response = client.start_document_text_detection(
        DocumentLocation={
            'S3Object': {
                'Bucket': 'string',
                'Name': 'string',
                'Version': 'string'
            }
        },
        ClientRequestToken='string',
        JobTag='string',
        NotificationChannel={
            'SNSTopicArn': 'string',
            'RoleArn': 'string'
        },
        OutputConfig={
            'S3Bucket': 'string',
            'S3Prefix': 'string'
        }
    )

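The official AWS Textract documentation also says it is a valid parameter:

Another optional parameter available is OutputConfig, which lets you adjust where your output will be placed. By default, Amazon Textract stores the results internally, and they can only be accessed through the Get API operations. With OutputConfig enabled, you can set the name of the bucket the output will be sent to and the file prefix of the results, where you can download them in JSON format. This allows a user-created bucket to be used to store the results.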

https://docs.aws.amazon.com/textract/latest/dg/api-async.html
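For contrast with OutputConfig, the default path described in the quote (results kept by Textract and read back through the Get operations) could look roughly like the sketch below. The helper name `collect_blocks` is hypothetical, and it assumes the job has already finished; the client is passed in so that any boto3 Textract client can be used.

```python
def collect_blocks(textract_client, job_id):
    """Gather every Block of a finished job by following NextToken.

    Hypothetical helper: `textract_client` is expected to be a boto3
    Textract client, `job_id` the JobId returned by the start call.
    """
    blocks = []
    next_token = None
    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            kwargs["NextToken"] = next_token
        page = textract_client.get_document_text_detection(**kwargs)

        # Anything other than SUCCEEDED (IN_PROGRESS, FAILED, ...) means
        # there is no complete result to read yet.
        status = page.get("JobStatus")
        if status != "SUCCEEDED":
            raise RuntimeError(f"Job {job_id} is not ready: {status}")

        blocks.extend(page.get("Blocks", []))
        next_token = page.get("NextToken")
        if not next_token:
            return blocks
```

With OutputConfig set, this polling step is optional because the same JSON is written to the configured bucket and prefix.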

score 0 · Accepted Answer

It looks like you are using an older version of boto3. Upgrading to the latest version (>1.16.36) should fix your problem.
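One way to confirm this from inside the function is to log the boto3 version the runtime actually loads and the parameters its bundled service model accepts. The sketch below is only illustrative: the helper names are hypothetical, and the threshold is simply the version quoted above, not a value checked against the boto3 changelog.

```python
def parse_version(version):
    """Turn '1.16.36' into (1, 16, 36) so versions compare numerically."""
    return tuple(int(part) for part in version.split(".")[:3])


def is_newer_than(installed, threshold="1.16.36"):
    """True when `installed` is strictly newer than `threshold`.

    The default threshold is the version quoted in the answer.
    """
    return parse_version(installed) > parse_version(threshold)


def accepted_parameters(textract_client):
    """List the parameters the installed SDK accepts for the start call."""
    operation = textract_client.meta.service_model.operation_model(
        "StartDocumentTextDetection"
    )
    return sorted(operation.input_shape.members)


def report_sdk_support():
    """Summarise what the current runtime's boto3 can do (hypothetical helper)."""
    import boto3  # imported here so the helpers above work without boto3

    textract = boto3.client("textract")
    params = accepted_parameters(textract)
    return {
        "boto3_version": boto3.__version__,
        "newer_than_threshold": is_newer_than(boto3.__version__),
        "supports_output_config": "OutputConfig" in params,
        "accepted_parameters": params,
    }
```

If `supports_output_config` comes back false, the boto3 copy provided by the Lambda runtime is the old one; shipping a newer boto3 in the deployment package or in a layer is the usual way to override it.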
