I was able to create a small Glue job that extracts data from one S3 bucket into another, but I don't understand the last few lines of the code (below).
applymapping1 = ApplyMapping.apply(frame = datasource_lk, mappings = [("row_id", "bigint", "row_id", "bigint"), ("Quantity", "long", "Quantity", "long"), ("Category", "string", "Category", "string")], transformation_ctx = "applymapping1")
selectfields2 = SelectFields.apply(frame = applymapping1, paths = ["row_id", "Quantity", "Category"], transformation_ctx = "selectfields2")
resolvechoice3 = ResolveChoice.apply(frame = selectfields2, choice = "MATCH_CATALOG", database = "mydb", table_name = "order_summary_csv", transformation_ctx = "resolvechoice3")
datasink4 = glueContext.write_dynamic_frame.from_catalog(frame = resolvechoice3, database = "mydb", table_name = "order_summary_csv", transformation_ctx = "datasink4")
job.commit()
- In the snippet above, what is 'ResolveChoice' for? Is it mandatory?
- When I run this job, it creates a new folder and a file (with a random file name) at the target (order_summary.csv) and ingests the data there, instead of writing directly into the order_summary_csv table (a single CSV file) that already resides in the S3 folder. Can Spark (Glue) ingest the data into that specific CSV file?
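For context on the second question: Spark always writes output as a directory of `part-*` files, so there is no built-in option to write straight into an existing, specifically named CSV file. A common workaround is to coalesce the data to a single partition and then rename/copy the resulting part file. Below is a rough sketch under those assumptions; the bucket name, prefixes, and the conversion via `toDF()` are illustrative, not from the original job, and this only runs inside a Glue job environment.

```python
import boto3

# Assumption: 'resolvechoice3' is the DynamicFrame from the job above.
# Convert to a Spark DataFrame and force a single output partition,
# so Spark emits exactly one part-*.csv file in the target prefix.
df = resolvechoice3.toDF()
df.coalesce(1).write.mode("overwrite").option("header", "true") \
  .csv("s3://my-bucket/tmp_output/")  # hypothetical temporary prefix

# Spark still chooses the part file's name, so copy it to the
# desired key afterwards and delete the temporary object.
s3 = boto3.client("s3")
bucket = "my-bucket"  # hypothetical bucket name
listing = s3.list_objects_v2(Bucket=bucket, Prefix="tmp_output/")
part_key = next(obj["Key"] for obj in listing["Contents"]
                if obj["Key"].endswith(".csv"))
s3.copy_object(Bucket=bucket,
               CopySource={"Bucket": bucket, "Key": part_key},
               Key="order_summary/order_summary.csv")
s3.delete_object(Bucket=bucket, Key=part_key)
```

Note that `coalesce(1)` funnels all data through one executor, so this is only reasonable for small outputs like the summary table here.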