python-3.x - Getting an AWS Lambda function to start a transcription job using the file name without its extension

Question

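I have a Lambda function that starts a transcription job when an object is put into an S3 bucket. I can't get the transcription job name set to the file name without its extension. Also, for some reason the output file is not placed under the correct prefix (folder) in the S3 bucket. This is what I have: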

import json
import boto3
import time
import os
from urllib.request import urlopen

transcribe = boto3.client('transcribe')

def lambda_handler(event, context):
    if event:
        file_obj = event["Records"][0]
        bucket_name = str(file_obj['s3']['bucket']['name'])
        file_name = str(file_obj['s3']['object']['key'])
        s3_uri = create_uri(bucket_name, file_name)
        job_name = file_name

        print(os.path.splitext(file_name)[0])

        transcribe.start_transcription_job(TranscriptionJobName = job_name,
                                           Media = {'MediaFileUri': s3_uri},
                                           MediaFormat =  'mp3',
                                           LanguageCode = "en-US",
                                           OutputBucketName = "sbox-digirepo-transcribe-us-east-1",
                                           Settings={
                                            # 'VocabularyName': 'string',
                                            'ShowSpeakerLabels': True,
                                            'MaxSpeakerLabels': 2,
                                            'ChannelIdentification': False
                                        })
        while True:
            status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
            if status["TranscriptionJob"]["TranscriptionJobStatus"] in ["COMPLETED", "FAILED"]:
                break
            print("Transcription in progress")
            time.sleep(5)

        s3.put_object(Bucket = bucket_name, Key="output/{}.json".format(job_name), Body=load_)
    return {
        'statusCode': 200,
        'body': json.dumps('Transcription job created!')
    }

def create_uri(bucket_name, file_name):
    return "s3://"+bucket_name+"/"+file_name

The error I get is:

[ERROR] BadRequestException: An error occurred (BadRequestException) when calling the StartTranscriptionJob operation: 1 validation error detected: Value 'input/7800533A.mp3' at 'transcriptionJobName' failed to satisfy constraint: Member must satisfy regular expression pattern: ^[0-9a-zA-Z._-]+

So the output I want in this case is a TranscriptionJobName value of 7800533A, with the result written to the OutputBucketName under s3bucket/output. Any help is appreciated, thanks in advance.

score 0 · Accepted Answer

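The TranscriptionJobName parameter is a friendly name for your job, and the regular expression restricts it quite tightly. You are passing it the full object key, which contains the prefix input/, but / is not an allowed character in a job name. You can split off the file name part in your code: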

job_name = file_name.split('/')[-1]
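Note that the split above still leaves the extension in place (7800533A.mp3), which the pattern accepts but is not quite what the question asked for. A minimal sketch of one way to drop both the prefix and the extension follows. The helper name job_name_from_key is hypothetical, and replacing disallowed characters with an underscore is an assumption; you might prefer to raise an error instead. Keys in S3 event notifications arrive URL-encoded, so the sketch decodes them first.

```python
import posixpath
import re
from urllib.parse import unquote_plus

# Character class taken from the error message in the question,
# anchored at both ends so the whole name has to match.
JOB_NAME_PATTERN = re.compile(r"^[0-9a-zA-Z._-]+$")


def job_name_from_key(object_key):
    """Derive a Transcribe job name from an S3 object key (hypothetical helper)."""
    # S3 event notifications URL-encode the key (a space arrives as '+').
    decoded_key = unquote_plus(object_key)
    # Drop the prefix: 'input/7800533A.mp3' -> '7800533A.mp3'
    base_name = posixpath.basename(decoded_key)
    # Drop the extension: '7800533A.mp3' -> '7800533A'
    stem = posixpath.splitext(base_name)[0]
    # Assumption: swap any character the pattern rejects for an underscore.
    job_name = re.sub(r"[^0-9a-zA-Z._-]", "_", stem)
    if not JOB_NAME_PATTERN.match(job_name):
        raise ValueError("cannot derive a job name from key: " + repr(object_key))
    return job_name


print(job_name_from_key("input/7800533A.mp3"))
```

Job names also have to be unique within an AWS account, so uploading the same file twice would still fail unless you append something like a timestamp to the name.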

I put a complete example on GitHub that uploads media and starts an AWS Transcribe job, which puts all of this in context.
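For the second part of the question (getting the transcript under output/), StartTranscriptionJob also accepts an OutputKey parameter alongside OutputBucketName. The sketch below only builds the request arguments, so it can be checked without AWS credentials. The function name build_transcription_request and the bucket names in the example are hypothetical, and the output/<job name>.json layout is an assumption based on what the question asks for.

```python
def build_transcription_request(bucket_name, object_key, job_name, output_bucket):
    """Build keyword arguments for start_transcription_job (hypothetical helper)."""
    return {
        "TranscriptionJobName": job_name,
        # The media URI keeps the full key, prefix and extension included.
        "Media": {"MediaFileUri": "s3://{}/{}".format(bucket_name, object_key)},
        "MediaFormat": "mp3",
        "LanguageCode": "en-US",
        "OutputBucketName": output_bucket,
        # Assumed layout: write the transcript under the output/ prefix.
        "OutputKey": "output/{}.json".format(job_name),
    }


request = build_transcription_request(
    "my-input-bucket", "input/7800533A.mp3", "7800533A", "my-output-bucket"
)
print(request["OutputKey"])
```

Inside the handler you would pass the result on with transcribe.start_transcription_job(**request). Because Transcribe then writes the transcript to that key itself, the separate put_object call in the question should not be needed for this purpose.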
