javascript - Filtering the data returned by an AWS Textract function

Question

I am working with the data returned by an AWS Textract function. The response from this Textract function has the following shape:

{
   "AnalyzeDocumentModelVersion": "string",
   "Blocks": [ 
      { 
         "BlockType": "string",
         "ColumnIndex": number,
         "ColumnSpan": number,
         "Confidence": number,
         "EntityTypes": [ "string" ],
         "Geometry": { 
            "BoundingBox": { 
               "Height": number,
               "Left": number,
               "Top": number,
               "Width": number
            },
            "Polygon": [ 
               { 
                  "X": number,
                  "Y": number
               }
            ]
         },
         "Id": "string",
         "Page": number,
         "Relationships": [ 
            { 
               "Ids": [ "string" ],
               "Type": "string"
            }
         ],
         "RowIndex": number,
         "RowSpan": number,
         "SelectionStatus": "string",
         "Text": "string"
      }
   ],
   "DocumentMetadata": { 
      "Pages": number
   },
   "JobStatus": "string",
   "NextToken": "string",
   "StatusMessage": "string",
   "Warnings": [ 
      { 
         "ErrorCode": "string",
         "Pages": [ number ]
      }
   ]
}

I extracted the blocks from this data with the following code:

var d = null;
...<Some Code Here>...
d = data.Blocks;
console.log(d);

It outputs an array of JSON objects. A sample of the extracted data is given below:

[...{ BlockType: 'WORD',
    Confidence: 99.7286376953125,
    Text: '2000.00',
    Geometry: { BoundingBox: [Object], Polygon: [Array] },
    Id: '<ID here>',
    Page: 1 }, ...]

I want to extract only the Text field and have that as the sole output. How do I go about this?

score 3 · Accepted Answer

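I may have misunderstood your question, but if you need to extract the value of the Text field from each object in the data array, take a look at the following example: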

const data = [
  {
    BlockType: "WORD",
    Confidence: 99.7286376953125,
    Text: "2000.00",
    Geometry: { BoundingBox: {}, Polygon: [] },
    Id: "<ID here>",
    Page: 1,
  },
];

const output = data.map(({ Text: text }) => text);

console.log(output);
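
One caveat with a plain map: in the response shape from the question, Text is only present on some block types (a PAGE block, for instance, carries no Text), and LINE blocks repeat the text of their WORD children. So mapping over every block can yield undefined entries and duplicated text. Below is a minimal sketch that filters before mapping. The sample blocks and the extractText helper are made up for illustration and are not part of the Textract API.

```javascript
// Hypothetical sample shaped like data.Blocks; IDs and values are invented.
const blocks = [
  { BlockType: "PAGE", Id: "page-1", Page: 1 },
  { BlockType: "LINE", Confidence: 99.5, Text: "Total 2000.00", Id: "line-1", Page: 1 },
  { BlockType: "WORD", Confidence: 99.6, Text: "Total", Id: "word-1", Page: 1 },
  { BlockType: "WORD", Confidence: 99.7, Text: "2000.00", Id: "word-2", Page: 1 },
];

// extractText is a hypothetical helper: keep only blocks of the requested
// type that actually carry a string Text field, then return just the text.
function extractText(blocks, blockType = "WORD") {
  return blocks
    .filter((block) => block.BlockType === blockType && typeof block.Text === "string")
    .map((block) => block.Text);
}

const words = extractText(blocks);         // text of WORD blocks only
const lines = extractText(blocks, "LINE"); // text of LINE blocks only

console.log(words);
console.log(lines);
```

In the question's code this would be called as extractText(data.Blocks). Whether WORD or LINE is the better choice depends on whether you want individual tokens or whole lines.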
