python - Lambda to automatically delete a transcription job after it completes

Question

I'm looking to edit my Lambda so that it deletes a transcription job once its status is COMPLETED. I have the following code:

    import json
    import time
    import boto3
    from urllib.request import urlopen

    def lambda_handler(event, context):
        transcribe = boto3.client("transcribe")
        s3 = boto3.client("s3")

        if event:
            file_obj = event["Records"][0]
            bucket_name = str(file_obj["s3"]["bucket"]["name"])
            file_name = str(file_obj["s3"]["object"]["key"])
            s3_uri = create_uri(bucket_name, file_name)
            file_type = file_name.split("2019.")[1]
            job_name = file_name
            transcribe.start_transcription_job(TranscriptionJobName=job_name,
                                                Media ={"MediaFileUri": s3_uri},
                                                MediaFormat = file_type,
                                                LanguageCode = "en-US",
                                                Settings={
                                                    "VocabularyName": "Custom_Vocabulary_by_Brand_Other_Brands",
                                                    "ShowSpeakerLabels": True,
                                                    "MaxSpeakerLabels": 4
                                                })


            while True:
                status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
                if status["TranscriptionJob"]["TranscriptionJobStatus"] in ["FAILED"]:
                    break
                print("It's in progress")
            while True:
                status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
                if status["TranscriptionJob"]["TranscriptionJobStatus"] in ["COMPLETED"]:
                    transcribe.delete_transcription_job(TranscriptionJobName=job_name
                )

                time.sleep(5)

            load_url = urlopen(status["TranscriptionJob"]["Transcript"]["TranscriptFileUri"])
            load_json = json.dumps(json.load(load_url))

            s3.put_object(Bucket = bucket_name, Key = "transcribeFile/{}.json".format(job_name), Body=load_json)


        # TODO implement
        return {
            'statusCode': 200,
            'body': json.dumps('Hello from Lambda!')
        }

    def create_uri(bucket_name, file_name):
        return "s3://"+bucket_name+"/"+file_name

The part that handles the job is:

    while True:
        status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        if status["TranscriptionJob"]["TranscriptionJobStatus"] in ["FAILED"]:
            break
        print("It's in progress")
    while True:
        status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        if status["TranscriptionJob"]["TranscriptionJobStatus"] in ["COMPLETED"]:
            transcribe.delete_transcription_job(TranscriptionJobName=job_name
        )

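The intent is that while the job is in progress it prints "It's in progress", and once the status is COMPLETED it deletes the job.

Any idea why my current code doesn't work? It completes the transcription job but does not delete it.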

score 2 · Accepted Answer

If you can avoid it, you should not poll for information, especially in Lambda.

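The proper way to respond to changes in transcription job status is to use CloudWatch Events. For example, you can configure a rule that routes an event to an AWS Lambda function when a transcription job completes successfully.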
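As a rough sketch of such a rule, the snippet below builds an event pattern that matches completed transcription jobs and registers it with boto3. The function names, the rule name, the target Id and the Lambda ARN argument are all hypothetical placeholders, and the Lambda function would still need a resource-based permission allowing the events service to invoke it, which is not shown here.

```python
import json


def build_transcribe_event_pattern(statuses=("COMPLETED",)):
    """Return an event pattern matching Transcribe job state changes."""
    return {
        "source": ["aws.transcribe"],
        "detail-type": ["Transcribe Job State Change"],
        "detail": {"TranscriptionJobStatus": list(statuses)},
    }


def create_rule(rule_name, lambda_arn, events_client=None):
    """Create the rule and point it at a Lambda function.

    rule_name and lambda_arn are hypothetical values supplied by the caller.
    """
    if events_client is None:
        # Imported lazily so the pattern builder can be used without boto3.
        import boto3
        events_client = boto3.client("events")

    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_transcribe_event_pattern()),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        # "delete-transcribe-job" is an arbitrary, made-up target Id.
        Targets=[{"Id": "delete-transcribe-job", "Arn": lambda_arn}],
    )
```

Passing `statuses=("COMPLETED", "FAILED")` would route failed jobs to the same function, if those should be cleaned up as well.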

When your Lambda function is invoked because of a state change in a transcription job, it receives event data such as:

{
    "version": "0",
    "id": "1a234567-1a6d-3ab4-1234-abf8b19be1234",
    "detail-type": "Transcribe Job State Change",
    "source": "aws.transcribe",
    "account": "123456789012",
    "time": "2019-11-19T10:00:05Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "TranscriptionJobName": "my-transcribe-test",
        "TranscriptionJobStatus": "COMPLETED"
    }
}

Use the TranscriptionJobName to correlate the state change back to the original job.
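A minimal sketch of a handler for that event might look like the following. It assumes the event has the shape shown above; `handle_state_change` is a hypothetical helper name, and the client is passed in so the logic can be exercised without AWS credentials.

```python
import json


def handle_state_change(event, transcribe_client):
    """Delete the job named in the event if it completed.

    Returns True when a delete call was issued, False otherwise.
    """
    detail = event.get("detail", {})
    job_name = detail.get("TranscriptionJobName")
    status = detail.get("TranscriptionJobStatus")

    if job_name and status == "COMPLETED":
        transcribe_client.delete_transcription_job(TranscriptionJobName=job_name)
        return True
    return False


def lambda_handler(event, context):
    # boto3 is imported here so the helper above stays testable on its own.
    import boto3

    deleted = handle_state_change(event, boto3.client("transcribe"))
    return {"statusCode": 200, "body": json.dumps({"deleted": deleted})}
```

With this approach the function that starts the job can return immediately, and no Lambda time is spent waiting in a loop.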

score 1 · Accepted Answer

Sorry guys, I took another look and had made a very, very silly mistake. I had the transcribe.delete_transcription_job(TranscriptionJobName=job_name part completely wrong.

Please find the correct, working code below:

import json
import time
import boto3
from urllib.request import urlopen

def lambda_handler(event, context):
    transcribe = boto3.client("transcribe")
    s3 = boto3.client("s3")

    if event:
        file_obj = event["Records"][0]
        bucket_name = str(file_obj["s3"]["bucket"]["name"])
        file_name = str(file_obj["s3"]["object"]["key"])
        s3_uri = create_uri(bucket_name, file_name)
        file_type = file_name.split("2019.")[1]
        job_name = file_name
        transcribe.start_transcription_job(TranscriptionJobName=job_name,
                                            Media ={"MediaFileUri": s3_uri},
                                            MediaFormat = file_type,
                                            LanguageCode = "en-US",
                                            Settings={
                                                "VocabularyName": "Custom_Vocabulary_by_Brand_Other_Brands",
                                                "ShowSpeakerLabels": True,
                                                "MaxSpeakerLabels": 4
                                            })


        # Poll until the job reaches a terminal state.
        while True:
            status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
            job_status = status["TranscriptionJob"]["TranscriptionJobStatus"]
            if job_status in ["COMPLETED", "FAILED"]:
                break
            print("It's in progress")

            time.sleep(5)

        # A failed job has no transcript, so only copy the output on success.
        if job_status == "COMPLETED":
            load_url = urlopen(status["TranscriptionJob"]["Transcript"]["TranscriptFileUri"])
            load_json = json.dumps(json.load(load_url))

            s3.put_object(Bucket = bucket_name, Key = "transcribeFile/{}.json".format(job_name), Body=load_json)

        # Delete the job only after its output has been handled.
        transcribe.delete_transcription_job(TranscriptionJobName=job_name)


    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

def create_uri(bucket_name, file_name):
    return "s3://"+bucket_name+"/"+file_name
