amazon-sagemaker - Is there a way to show pdf in its original structure in the human review custom entity labelling in aws sagemaker?

Question

I have modified this sample to read PDFs that contain tables. I would like to keep the tabular structure of the original PDF during the human review process. I notice that the custom worker task template uses the crowd-entity-annotation element, which seems to accept only plain text. I am aware that the human review process reads from an S3 key containing the raw text written by the Textract process.

I have been considering writing to S3 using tabulate, but I don't think that is the best solution. I would like to keep the structure and still be able to annotate custom entities.

score 1 · Accepted Answer

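Comprehend now natively supports detecting custom entities in PDF documents. To do so, you can try the following steps:

1. Follow this GitHub README to start the annotation process for PDF documents.
2. Once the annotations are produced, you can use the Comprehend CreateEntityRecognizer API to train a custom entity model for semi-structured documents.
3. After the entity recognizer is trained, you can use the StartEntitiesDetectionJob API to run inference on your PDF documents.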
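
The training and inference steps of the answer could look roughly like the following with boto3. This is a minimal sketch, not the answerer's code: the helper names, S3 URIs, role ARN, entity type, and attribute name are all hypothetical placeholders, and it assumes the annotation job wrote an augmented manifest plus annotation files to S3. The request builders are kept separate from the API calls so the payloads can be inspected without AWS credentials.

```python
def build_training_request(name, role_arn, manifest_uri, annotations_uri,
                           documents_uri, entity_types, attribute_name):
    """Build kwargs for comprehend.create_entity_recognizer (PDF annotations)."""
    return {
        "RecognizerName": name,
        "DataAccessRoleArn": role_arn,
        "LanguageCode": "en",
        "InputDataConfig": {
            "DataFormat": "AUGMENTED_MANIFEST",
            "EntityTypes": [{"Type": t} for t in entity_types],
            "AugmentedManifests": [{
                "S3Uri": manifest_uri,                    # manifest from the labeling job
                "AttributeNames": [attribute_name],       # assumed label attribute name
                "AnnotationDataS3Uri": annotations_uri,   # per-page annotation files
                "SourceDocumentsS3Uri": documents_uri,    # the original PDFs
                "DocumentType": "SEMI_STRUCTURED_DOCUMENT",
            }],
        },
    }


def build_inference_request(job_name, role_arn, recognizer_arn,
                            input_uri, output_uri):
    """Build kwargs for comprehend.start_entities_detection_job."""
    return {
        "JobName": job_name,
        "DataAccessRoleArn": role_arn,
        "EntityRecognizerArn": recognizer_arn,
        "LanguageCode": "en",
        # One PDF per S3 object under input_uri.
        "InputDataConfig": {"S3Uri": input_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        "OutputDataConfig": {"S3Uri": output_uri},
    }


def start_training(client, request):
    """Start training and return the new recognizer's ARN."""
    return client.create_entity_recognizer(**request)["EntityRecognizerArn"]


def start_inference(client, request):
    """Start an asynchronous detection job and return its job id."""
    return client.start_entities_detection_job(**request)["JobId"]


def main():
    import boto3  # imported lazily so the builders work without boto3 installed

    client = boto3.client("comprehend")
    role = "arn:aws:iam::123456789012:role/ExampleComprehendRole"  # placeholder
    arn = start_training(client, build_training_request(
        "example-pdf-recognizer", role,
        "s3://example-bucket/manifests/output.manifest",
        "s3://example-bucket/annotations/",
        "s3://example-bucket/pdfs/",
        ["EXAMPLE_ENTITY"], "example-labeling-job"))
    # Training is asynchronous: wait until describe_entity_recognizer reports
    # Status == "TRAINED" before starting the detection job with this ARN.
    print("Training started:", arn)
```

Nothing runs until `main()` is called with real values. Because training is asynchronous, the detection job would be started in a later step, once the recognizer has finished training.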

