From the Textract documentation: Documents for synchronous operations can be in PNG or JPEG format. Documents for asynchronous operations can also be in PDF format.
I have a Node.js application in which I use asynchronous Textract to read a PDF file. My code looks like this:
import * as AWS from 'aws-sdk';
const textract = new AWS.Textract({ region: '<REGION>' });
export const callTextract = (file: File, uuid: string): Promise<any> => {
  return new Promise<any>((resolve, reject) => {
    const params = {
      Document: {
        Bytes: file,
      },
    };
    textract.detectDocumentText(params, (err, data) => {
      ....
      resolve(data);
    });
  });
};
The file here has been read from the operating system and is a Buffer. From the first 4 bytes, I can confirm that it is a PDF file (Detecting file type from buffer in node js?):
<Buffer 25 50 44 46 ... >
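The byte check described above can be sketched as a small helper. This is only an illustration: the names `sniffFormat` and `allowedInSynchronousCall` are hypothetical, the signatures are the well-known leading bytes of PDF, PNG and JPEG files, and the synchronous/asynchronous rule is taken directly from the documentation quoted at the top, not from the SDK itself. The parameter is typed as `Uint8Array`, which a Node.js `Buffer` satisfies.

```typescript
// Formats named in the quoted documentation, plus a fallback.
type DocumentFormat = 'pdf' | 'png' | 'jpeg' | 'unknown';

// Leading "magic" bytes of each format.
const SIGNATURES: Array<{ format: DocumentFormat; bytes: number[] }> = [
  { format: 'pdf', bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
  { format: 'png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { format: 'jpeg', bytes: [0xff, 0xd8, 0xff] },
];

// Compare the start of the data against each known signature.
function sniffFormat(data: Uint8Array): DocumentFormat {
  for (const { format, bytes } of SIGNATURES) {
    if (data.length >= bytes.length && bytes.every((b, i) => data[i] === b)) {
      return format;
    }
  }
  return 'unknown';
}

// Assumption: encodes only the rule from the quoted documentation
// (synchronous operations accept PNG or JPEG, not PDF).
function allowedInSynchronousCall(format: DocumentFormat): boolean {
  return format === 'png' || format === 'jpeg';
}
```

For a buffer starting with `25 50 44 46`, `sniffFormat` returns `'pdf'` and `allowedInSynchronousCall('pdf')` returns `false`.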
The error I get is UnsupportedDocumentException.
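One possible reading of the quoted documentation is that `detectDocumentText` belongs to the synchronous operations, while the asynchronous path goes through `startDocumentTextDetection` and `getDocumentTextDetection`, which take an S3 location instead of inline bytes. The sketch below illustrates that flow under those assumptions. `TextractLike` is a hypothetical interface modeled on the v2 SDK's `.promise()` call style so that the snippet stands alone without the SDK; `detectPdfText`, `buildStartParams`, the bucket and the key are illustrative names, and the PDF is assumed to have been uploaded to S3 already.

```typescript
// Request shape for starting an asynchronous text-detection job.
interface StartParams {
  DocumentLocation: { S3Object: { Bucket: string; Name: string } };
}

// Hypothetical stand-in for the two client methods used here.
interface TextractLike {
  startDocumentTextDetection(params: StartParams): {
    promise(): Promise<{ JobId?: string }>;
  };
  getDocumentTextDetection(params: { JobId: string }): {
    promise(): Promise<{ JobStatus?: string; Blocks?: unknown[] }>;
  };
}

// Asynchronous operations reference the document in S3, not as Bytes.
function buildStartParams(bucket: string, key: string): StartParams {
  return { DocumentLocation: { S3Object: { Bucket: bucket, Name: key } } };
}

// Start the job, then poll until it leaves the IN_PROGRESS state.
// Only the first page of results is returned; NextToken paging is omitted.
async function detectPdfText(
  client: TextractLike,
  bucket: string,
  key: string,
  pollMs = 2000,
): Promise<unknown[]> {
  const { JobId } = await client
    .startDocumentTextDetection(buildStartParams(bucket, key))
    .promise();
  if (!JobId) throw new Error('Textract did not return a JobId');
  for (;;) {
    const result = await client.getDocumentTextDetection({ JobId }).promise();
    if (result.JobStatus === 'SUCCEEDED') return result.Blocks ?? [];
    if (result.JobStatus !== 'IN_PROGRESS') {
      throw new Error(`Textract job ${JobId} ended with status ${result.JobStatus}`);
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}
```

Polling keeps the sketch short; Textract can also publish job completion to an SNS topic, which avoids repeated `getDocumentTextDetection` calls.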