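I have a DataFrame of sentences in different languages, with the language code in a separate column. I only want to process certain languages (en, es, fr, or de), because I know AWS Comprehend does not support 'nl' (Dutch). For some reason I keep getting an error saying 'nl' is not supported, even though it is not listed in my when condition and so should never be sent through the Comprehend UDF. Any ideas about what might be wrong?
Here is my code: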
import boto3
import pyspark.sql.functions as F

def detect_sentiment(text, language):
    # Call AWS Comprehend for one sentence and return the raw response
    comprehend = boto3.client(service_name='comprehend', region_name='us-west-2')
    sentiment_analysis = comprehend.detect_sentiment(Text=text, LanguageCode=language)
    return sentiment_analysis

detect_sentiment_udf = F.udf(detect_sentiment)

# Intended behaviour: only en/es/fr/de rows reach the UDF, all others get null
reviews_4 = reviews_3.withColumn('RAW_SENTIMENT_SCORE',
    F.when( (F.col('LANGUAGE')=='en') | (F.col('LANGUAGE')=='es') | (F.col('LANGUAGE')=='fr') | (F.col('LANGUAGE')=='de') ,
        detect_sentiment_udf('SENTENCE', 'LANGUAGE')).otherwise(None) )
reviews_4.show(50)
I get this error:
botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the DetectSentiment operation:
Value 'nl' at 'languageCode' failed to satisfy constraint: Member must satisfy enum value set: [ar, hi, ko, zh-TW, ja, zh, de, pt, en, it, fr, es]
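
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the PySpark documentation notes that user-defined functions do not support conditional expressions or short-circuiting, so the UDF may be run on every row (including 'nl') before the when condition is applied. The suggested workaround there is to move the condition into the function itself. Below is a minimal sketch of that idea in plain Python. The names SUPPORTED_LANGUAGES, make_guarded_detector and fake_comprehend are hypothetical and only stand in for the real Comprehend call so the behaviour can be checked without AWS or Spark.

```python
# Languages taken from the `when` condition in the question.
SUPPORTED_LANGUAGES = {"en", "es", "fr", "de"}


def make_guarded_detector(call_comprehend, supported=SUPPORTED_LANGUAGES):
    """Wrap a sentiment call so unsupported languages never reach it.

    `call_comprehend` is a hypothetical callable (text, language) -> result;
    in the question it would be the detect_sentiment function.
    """
    def guarded(text, language):
        # The check lives inside the function, so it still holds even if
        # Spark invokes the UDF for rows outside the `when` branch.
        if text is None or language not in supported:
            return None
        return call_comprehend(text, language)
    return guarded


# Stand-in for the real boto3 call, used here only for illustration.
def fake_comprehend(text, language):
    return {"Sentiment": "NEUTRAL", "LanguageCode": language}


guarded_detect = make_guarded_detector(fake_comprehend)
print(guarded_detect("Hello", "en"))   # reaches the (fake) Comprehend call
print(guarded_detect("Hallo", "nl"))   # skipped by the guard, returns None
```

In the Spark job, this would presumably be wired in as F.udf(make_guarded_detector(detect_sentiment)) and applied to 'SENTENCE' and 'LANGUAGE' directly, with or without the surrounding when. Filtering the unsupported rows out with a separate DataFrame filter before calling the UDF is not guaranteed to help either, since the optimizer may reorder the filter and the UDF.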