python-3.x - Loading data from Azure Blob Storage into Delta Lake

Question

I am trying to load data from Azure Blob Storage into Delta Lake. I am using the code snippet below:

storage_account_name = "xxxxxxxxdev"
storage_account_access_key = "xxxxxxxxxxxxxxxxxxxxx"

file_location = "wasbs://bicc-hdspk-eus-qc@xxxxxxxxdev.blob.core.windows.net/FSHC/DIM/FSHC_DIM_SBU"

file_type = "csv"

spark.conf.set("fs.azure.account.key."+storage_account_name+".blob.core.windows.net",storage_account_access_key)

df = spark.read.format(file_type).option("header","true").option("inferSchema", "true").option("delimiter", '|').load(file_location)

dx = df.write.format("parquet")

Up to this step it works, and I can also load it into a Databricks table.

dx.write.format("delta").save(file_location)

Error: AttributeError: 'DataFrameWriter' object has no attribute 'write'

PS: Am I passing the file location incorrectly in the write statement? If that is the cause, what should the file path for Delta Lake be?
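On the path question: a Delta table location is written in the same `wasbs://<container>@<account>.blob.core.windows.net/<path>` form as the source, but it names a directory that will hold the table's Parquet data files plus a `_delta_log` folder. The account key is likewise set under a key of the form `fs.azure.account.key.<account>.blob.core.windows.net`. A minimal sketch that assembles both strings; the helper names and the `FSHC/DELTA/...` target folder are hypothetical, chosen only for illustration:

```python
def account_key_conf(account: str) -> str:
    """Spark config key under which the storage account access key is set."""
    return f"fs.azure.account.key.{account}.blob.core.windows.net"


def wasbs_url(container: str, account: str, path: str) -> str:
    """Build a wasbs:// URL for a path inside a Blob Storage container."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Source CSV folder from the question (account name is masked there).
source = wasbs_url("bicc-hdspk-eus-qc", "xxxxxxxxdev", "FSHC/DIM/FSHC_DIM_SBU")
# Hypothetical separate folder for the Delta table.
target = wasbs_url("bicc-hdspk-eus-qc", "xxxxxxxxdev", "FSHC/DELTA/FSHC_DIM_SBU")
print(account_key_conf("xxxxxxxxdev"))
print(source)
print(target)
```

Keeping the target separate from the source folder matters: with Spark's default save mode (`errorifexists`), saving to `file_location`, which already contains the CSV files, fails because the path exists.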

Please let me know if more information is needed.

Thanks, Abhirup

score 2 · Accepted Answer

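`dx` is already a DataFrameWriter (the object returned by `df.write`), so calling `.write` on it again makes no sense; a DataFrameWriter has no `write` attribute. Call `.write` on the DataFrame instead: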

df = spark.read.format(file_type).option("header","true").option("inferSchema", "true").option("delimiter", '|').load(file_location)

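df.write.format("parquet").save(parquet_path)  # parquet_path: placeholder for your Parquet output folder
df.write.format("delta").save(delta_path)      # delta_path: placeholder for your Delta table folder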
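The same fix, wrapped as a reusable function: a sketch that assumes a Databricks notebook (or any Spark session with Delta Lake and the Azure Blob Storage connector available) and takes the session as an argument. The function name, the guard against reusing the source folder, and the `overwrite` default are editorial assumptions, not part of the original answer:

```python
def csv_to_delta(spark, source_path: str, delta_path: str, mode: str = "overwrite"):
    """Read pipe-delimited CSV files and save them as a Delta table (sketch)."""
    if source_path.rstrip("/") == delta_path.rstrip("/"):
        # Assumption: refuse to write the Delta table into the folder that
        # holds the source CSVs; give the table its own directory instead.
        raise ValueError("delta_path must differ from source_path")

    df = (
        spark.read.format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .option("delimiter", "|")
        .load(source_path)
    )
    # df.write returns a DataFrameWriter; configure it and call save() on it
    # directly. There is no second .write step.
    df.write.format("delta").mode(mode).save(delta_path)
    return df
```

`mode="overwrite"` lets a rerun replace the table; drop it to get Spark's default `errorifexists` behaviour, which raises if the target path already exists.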
