
I am trying to get Amazon Textract to parse a PDF. As their API documentation outlines, this has to be done as an asynchronous operation.

The code I am using follows code I found on GitHub.

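I have narrowed the error down to the notification channel (NotificationChannel) that is submitted as part of the request. However, I need that channel later, when retrieving the results.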
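The NotificationChannel parameter of StartDocumentTextDetection is optional, so one way to confirm whether the channel is really the culprit is to start the job without it and poll GetDocumentTextDetection with the returned JobId until JobStatus leaves IN_PROGRESS. The sketch below models only that polling loop with the standard library. `waitForJob`, `pause` and the fake status supplier are hypothetical names; in real code the supplier would wrap the SDK call shown in the comment.

```java
import java.util.function.Supplier;

public class PollJobStatus {
    // While a Textract job is running, GetDocumentTextDetection reports this JobStatus.
    static final String IN_PROGRESS = "IN_PROGRESS";

    /**
     * Hypothetical helper: asks statusFetcher for the job status until it is no longer
     * IN_PROGRESS (e.g. SUCCEEDED, FAILED or PARTIAL_SUCCESS), pausing between attempts.
     * In real code statusFetcher might be
     *   () -> textract.getDocumentTextDetection(
     *             new GetDocumentTextDetectionRequest().withJobId(startJobId)).getJobStatus()
     */
    static String waitForJob(Supplier<String> statusFetcher, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String status = statusFetcher.get();
            if (!IN_PROGRESS.equals(status)) {
                return status;
            }
            pause(delayMillis);
        }
        throw new IllegalStateException("Job still " + IN_PROGRESS + " after " + maxAttempts + " attempts");
    }

    // Sleeps without forcing callers to handle the checked InterruptedException.
    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            throw new IllegalStateException("Interrupted while waiting for the job", e);
        }
    }

    public static void main(String[] args) {
        // Fake fetcher for illustration: IN_PROGRESS twice, then SUCCEEDED.
        String[] responses = {"IN_PROGRESS", "IN_PROGRESS", "SUCCEEDED"};
        int[] calls = {0};
        String result = waitForJob(() -> responses[calls[0]++], 10, 10);
        System.out.println(result + " after " + calls[0] + " calls"); // SUCCEEDED after 3 calls
    }
}
```

If the job starts and completes this way, the request parameters other than the channel are fine, which narrows the problem to the topic or the role ARN.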

I should note that I have also tried the manual configuration shown in the AWS docs and used the ARNs directly, but I get the same problem.

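My S3 bucket is in a VPC, but I am confident my IAM setup is correct, because I have managed to extract text from a PNG file (which uses a different, synchronous method). These are the policies I have attached: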

  • AmazonSQSFullAccess
  • AmazonS3FullAccess
  • AmazonTextractFullAccess
  • AmazonTextractServiceRole
  • AmazonSNSRole
  • AmazonSNSFullAccess
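The policies above are attached to the identity that runs the code. The RoleArn inside the NotificationChannel is a different thing: per the NotificationChannel API reference it is the ARN of an IAM role that gives Amazon Textract permission to publish to the SNS topic, so it should be a role ARN (`arn:aws:iam::<account>:role/...`), not a user or policy ARN. The sketch below is a hypothetical pre-flight check on the ARN's shape only; `ArnChecks` and `isIamRoleArn` are illustrative names, and the check cannot confirm the role's trust policy or permissions.

```java
public class ArnChecks {
    /** Splits an ARN of the form arn:partition:service:region:account:resource. */
    static String[] parts(String arn) {
        // Limit 6 so a resource containing ':' stays in one piece.
        String[] p = arn.split(":", 6);
        if (p.length != 6 || !p[0].equals("arn")) {
            throw new IllegalArgumentException("Not an ARN: " + arn);
        }
        return p;
    }

    /** True if the ARN names an IAM role, e.g. arn:aws:iam::123456789012:role/TextractRole. */
    static boolean isIamRoleArn(String arn) {
        String[] p = parts(arn);
        return p[2].equals("iam") && p[5].startsWith("role/");
    }

    public static void main(String[] args) {
        // Placeholder ARNs for illustration only.
        System.out.println(isIamRoleArn("arn:aws:iam::123456789012:role/TextractRole"));       // true
        System.out.println(isIamRoleArn("arn:aws:iam::123456789012:user/alice"));             // false
        System.out.println(isIamRoleArn("arn:aws:iam::aws:policy/AmazonTextractFullAccess")); // false
    }
}
```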

These are the two methods I use: the first creates the topic and queue, the second calls StartDocumentTextDetection.

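```java
// Imports used by the two methods below (AWS SDK for Java 1.x).
import com.amazonaws.auth.policy.Condition;
import com.amazonaws.auth.policy.Policy;
import com.amazonaws.auth.policy.Principal;
import com.amazonaws.auth.policy.Resource;
import com.amazonaws.auth.policy.Statement;
import com.amazonaws.auth.policy.Statement.Effect;
import com.amazonaws.auth.policy.actions.SQSActions;
import com.amazonaws.services.sns.model.CreateTopicRequest;
import com.amazonaws.services.sns.model.CreateTopicResult;
import com.amazonaws.services.sqs.model.CreateQueueRequest;
import com.amazonaws.services.sqs.model.QueueAttributeName;
import com.amazonaws.services.sqs.model.SetQueueAttributesRequest;
import com.amazonaws.services.textract.model.DocumentLocation;
import com.amazonaws.services.textract.model.NotificationChannel;
import com.amazonaws.services.textract.model.S3Object;
import com.amazonaws.services.textract.model.StartDocumentTextDetectionRequest;
import com.amazonaws.services.textract.model.StartDocumentTextDetectionResult;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// The static fields sns, sqs, textract, roleArn, snsTopicName, snsTopicArn,
// sqsQueueName, sqsQueueUrl, sqsQueueArn and startJobId are declared elsewhere in the class.

// Creates an SNS topic and SQS queue. The queue is subscribed to the topic.
static void CreateTopicandQueue() {
    // Create a new SNS topic
    snsTopicName = "AmazonTextractTopic" + Long.toString(System.currentTimeMillis());
    CreateTopicRequest createTopicRequest = new CreateTopicRequest(snsTopicName);
    CreateTopicResult createTopicResult = sns.createTopic(createTopicRequest);
    snsTopicArn = createTopicResult.getTopicArn();

    // Create a new SQS queue and look up its ARN
    sqsQueueName = "AmazonTextractQueue" + Long.toString(System.currentTimeMillis());
    final CreateQueueRequest createQueueRequest = new CreateQueueRequest(sqsQueueName);
    sqsQueueUrl = sqs.createQueue(createQueueRequest).getQueueUrl();
    sqsQueueArn = sqs.getQueueAttributes(sqsQueueUrl, Arrays.asList("QueueArn")).getAttributes().get("QueueArn");

    // Subscribe the SQS queue to the SNS topic
    String sqsSubscriptionArn = sns.subscribe(snsTopicArn, "sqs", sqsQueueArn).getSubscriptionArn();

    // Authorize the queue: allow SendMessage only when the source is this SNS topic
    Policy policy = new Policy().withStatements(
            new Statement(Effect.Allow)
                    .withPrincipals(Principal.AllUsers)
                    .withActions(SQSActions.SendMessage)
                    .withResources(new Resource(sqsQueueArn))
                    .withConditions(new Condition().withType("ArnEquals").withConditionKey("aws:SourceArn").withValues(snsTopicArn))
    );

    Map<String, String> queueAttributes = new HashMap<>();
    queueAttributes.put(QueueAttributeName.Policy.toString(), policy.toJson());
    sqs.setQueueAttributes(new SetQueueAttributesRequest(sqsQueueUrl, queueAttributes));

    System.out.println("Topic arn: " + snsTopicArn);
    System.out.println("Queue arn: " + sqsQueueArn);
    System.out.println("Queue url: " + sqsQueueUrl);
    System.out.println("Queue sub arn: " + sqsSubscriptionArn);
}

private static void StartDocumentTextDetection(String bucket, String document) throws Exception {

    // Create the notification channel. roleArn must be the ARN of an IAM role
    // that gives Amazon Textract permission to publish to the SNS topic.
    NotificationChannel channel = new NotificationChannel()
            .withRoleArn(roleArn)
            .withSNSTopicArn(snsTopicArn);

    // Textract's own S3Object model class (not the S3 client's S3Object)
    S3Object s3ObjectTextract = new S3Object();
    s3ObjectTextract.setBucket(bucket);
    s3ObjectTextract.setName(document);

    StartDocumentTextDetectionRequest req = new StartDocumentTextDetectionRequest()
            .withDocumentLocation(new DocumentLocation()
                    .withS3Object(s3ObjectTextract))
            .withJobTag("DetectingText")
            .withNotificationChannel(channel);

    System.out.println("Found the document: " + document);
    System.out.println(req.toString());

    StartDocumentTextDetectionResult startDocumentTextDetectionResult = textract.startDocumentTextDetection(req);

    // Keep the job ID; it is needed to fetch the results with GetDocumentTextDetection
    startJobId = startDocumentTextDetectionResult.getJobId();
}
```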
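For the manual configuration mentioned above, two topic-side conditions may be worth ruling out. The AWS-managed AmazonTextractServiceRole policy only allows publishing to topics whose names begin with `AmazonTextract` (the generated `AmazonTextractTopic...` name satisfies this, a hand-made topic might not). The second check, that the topic is in the same region as the Textract client, is an assumption to verify rather than a documented cause. `TopicChecks`, `problems` and the placeholder ARN are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TopicChecks {
    /** Splits an ARN of the form arn:partition:service:region:account:resource. */
    static String[] parts(String arn) {
        String[] p = arn.split(":", 6);
        if (p.length != 6 || !p[0].equals("arn")) {
            throw new IllegalArgumentException("Not an ARN: " + arn);
        }
        return p;
    }

    /**
     * Lists possible problems with an SNS topic ARN used in a NotificationChannel.
     * textractRegion is the region the AmazonTextract client was built for (supplied by the caller).
     */
    static List<String> problems(String topicArn, String textractRegion) {
        String[] p = parts(topicArn);
        String region = p[3];
        String topicName = p[5]; // for SNS topics the resource field is the topic name
        List<String> out = new ArrayList<>();
        if (!topicName.startsWith("AmazonTextract")) {
            out.add("topic name does not start with AmazonTextract");
        }
        if (!region.equals(textractRegion)) {
            out.add("topic region " + region + " differs from Textract region " + textractRegion);
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder ARN for illustration only.
        String arn = "arn:aws:sns:us-east-1:123456789012:AmazonTextractTopic1600000000000";
        System.out.println(problems(arn, "us-east-1")); // []
        System.out.println(problems(arn, "eu-west-1"));
    }
}
```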

Here is my error stack trace:

```
{DocumentLocation: {S3Object: {Bucket: <bucketname>,Name: <filename>,}},JobTag: DetectingText,NotificationChannel: {SNSTopicArn: <Arn>,RoleArn: <arn>}}
Exception in thread "main" com.amazonaws.services.textract.model.InvalidParameterException: Request has invalid parameters (Service: AmazonTextract; Status Code: 400; Error Code: InvalidParameterException; Request ID: <>)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
    at com.amazonaws.services.textract.AmazonTextractClient.doInvoke(AmazonTextractClient.java:816)
    at com.amazonaws.services.textract.AmazonTextractClient.invoke(AmazonTextractClient.java:783)
    at com.amazonaws.services.textract.AmazonTextractClient.invoke(AmazonTextractClient.java:772)
    at com.amazonaws.services.textract.AmazonTextractClient.executeStartDocumentTextDetection(AmazonTextractClient.java:738)
    at com.amazonaws.services.textract.AmazonTextractClient.startDocumentTextDetection(AmazonTextractClient.java:708)
    at AnalyzeDocumentAsync.StartDocumentTextDetection(AnalyzeDocumentAsync.java:279)
    at AnalyzeDocumentAsync.ProcessDocument(AnalyzeDocumentAsync.java:158)
    at AnalyzeDocumentAsync.main(AnalyzeDocumentAsync.java:91)

Process finished with exit code 1
```