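I am trying to write data to SQL DW from a Databricks streaming DataFrame. The process tries to delete the temp folder in Blob storage and throws the error below. In the documentation I read that the process does not automatically clean up the tempDir. Is that true? If so, why does this error occur? I am using the following query in Python: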

df1.writeStream \
  .format("com.databricks.spark.sqldw") \
  .option("url", sqlDwUrlSmall) \
  .option("tempDir", tempDir) \
  .option("forwardSparkAzureStorageCredentials", "true") \
  .option("dbTable", "SampleTable") \
  .option("checkpointLocation", "/tmp_checkpoint_location1") \
  .option("numStreamingTempDirsToKeep", -1) \
  .start()

ERROR AzureNativeFileSystemStore: Encountered Storage Exception for delete on Blob: https://savupputest1.blob.core.windows.net/container1/tempDirs/2019-12-20/21-27-29-347/adca2ed6-a705-4274-8c24-0f0e3d7c64a7/batch0, Exception Details: This operation is not permitted on a non-empty directory. Error Code: DirectoryIsNotEmpty
19/12/20 21:27:32 ERROR AzureNativeFileSystemStore: Failed while attempting to delete key tempDirs/2019-12-20/21-27-29-347/adca2ed6-a705-4274-8c24-0f0e3d7c64a7/batch0

1 Answer

Before you begin, make sure that "tempDir" is a wasbs URI. We recommend that you use a dedicated Blob storage container for SQL DW.
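As a quick sanity check before starting the stream, the "tempDir" value can be validated in plain Python. This is only a minimal sketch: the helper name is_wasbs_uri is hypothetical (it is not part of the connector), and it assumes the public Azure endpoint suffix .blob.core.windows.net, so other Azure clouds would need a different suffix.

```python
from urllib.parse import urlparse

# Assumption: public Azure cloud; sovereign clouds use a different suffix.
BLOB_SUFFIX = ".blob.core.windows.net"


def is_wasbs_uri(temp_dir: str) -> bool:
    """Return True if temp_dir looks like wasbs://<container>@<account>.blob.core.windows.net/<path>."""
    parsed = urlparse(temp_dir)
    if parsed.scheme != "wasbs":
        return False
    # The authority part must be "<container>@<account>.blob.core.windows.net".
    container, sep, host = parsed.netloc.partition("@")
    has_account = host.endswith(BLOB_SUFFIX) and len(host) > len(BLOB_SUFFIX)
    return bool(container) and sep == "@" and has_account


if __name__ == "__main__":
    print(is_wasbs_uri("wasbs://data-files@chepra.blob.core.windows.net/data-files"))
    print(is_wasbs_uri("/tmp/tempDirs"))
```

The check only inspects the shape of the string; it does not confirm that the container exists or that the storage credentials are valid.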

Here is a Python example for Structured Streaming.

# Set up the Blob storage account access key in the notebook session conf.
spark.conf.set(
  "fs.azure.account.key.chepra.blob.core.windows.net",
  "gv7nVXXXXXXXXXXXXXXXXXXXXXXXXXldlOiA==")

# Prepare streaming source; this could be Kafka, Kinesis, or a simple rate stream.
df = spark.readStream \
  .format("rate") \
  .option("rowsPerSecond", "100000") \
  .option("numPartitions", "16") \
  .load()

# Apply some transformations to the data then use
# Structured Streaming API to continuously write the data to a table in SQL DW.

df.writeStream \
  .format("com.databricks.spark.sqldw") \
  .option("url", "jdbc:sqlserver://chepra.database.windows.net:1433;database=chepradw;user=chepra@chepra;password=XXXXXXXXXXX;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;") \
  .option("tempDir", "wasbs://data-files@chepra.blob.core.windows.net/data-files") \
  .option("forwardSparkAzureStorageCredentials", "true") \
  .option("dbTable", "chepradw") \
  .option("checkpointLocation", "/tmp_checkpoint_location") \
  .start()

[Screenshot of the notebook output]

For more details, see "Azure Databricks – Azure SQL DW": https://docs.microsoft.com/en-ca/azure/databricks/data/data-sources/azure/sql-data-warehouse?toc=https%3A%2F%2Fdocs.microsoft.com%2Fen-ca%2Fazure%2Fazure-databricks%2FTOC.json&bc=https%3A%2F%2Fdocs.microsoft.com%2Fen-ca%2Fazure%2Fbread%2Ftoc.json#usage-streaming

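Hope this helps. If you have any further questions, please let us know.

Please click "Mark as Answer" and upvote the posts that helped you; this can be beneficial to other community members.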

Answered on 2019-12-24T09:42:59.193