I want to rename a column in a Databricks Delta table.
So I did the following:
// Read old table data
val old_data_DF = spark.read.format("delta")
  .load("dbfs:/mnt/main/sales")
// Create a new DF with a renamed column
val new_data_DF = old_data_DF
  .withColumnRenamed("column_a", "metric1")
  .select("*")
// Drop and recreate the Delta files location
dbutils.fs.rm("dbfs:/mnt/main/sales", true)
dbutils.fs.mkdirs("dbfs:/mnt/main/sales")
// Try to write the new DF to the location
new_data_DF.write
  .format("delta")
  .partitionBy("sale_date_partition")
  .save("dbfs:/mnt/main/sales")
The last step, the write to Delta, fails with this error:
java.io.FileNotFoundException: dbfs:/mnt/main/sales/sale_date_partition=2019-04-29/part-00000-769.c000.snappy.parquet
A file referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table `DELETE` statement
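Clearly the data has been deleted, and I have most likely missed something in the logic above. The only place that now holds the data is new_data_DF. Writing to a similar location such as dbfs:/mnt/main/sales_tmp also fails.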
What should I do to write the data from new_data_DF to the Delta location?
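For context, a Spark DataFrame is evaluated lazily: new_data_DF holds a query plan that points at the files under dbfs:/mnt/main/sales, not a copy of the rows, which is consistent with the write failing once those files were removed. The sketch below shows one way the rename could be attempted without deleting anything first, by overwriting the table in place. It assumes the original files still exist (it does not recover data that has already been removed) and that the Delta version in use supports the overwriteSchema write option. The names path and renamed are introduced here for illustration only.

```scala
import org.apache.spark.sql.SparkSession

// In a Databricks notebook `spark` is already defined; getOrCreate() returns that session.
val spark = SparkSession.builder().getOrCreate()

// Assumption: the table files at this path have NOT been deleted.
val path = "dbfs:/mnt/main/sales"

// Read the current snapshot of the table and rename the column (still lazy at this point).
val renamed = spark.read.format("delta")
  .load(path)
  .withColumnRenamed("column_a", "metric1")

// Overwrite the table in place instead of removing its directory by hand.
// overwriteSchema asks Delta to replace the table schema with the new one,
// so the renamed column does not trigger a schema mismatch error.
renamed.write
  .format("delta")
  .mode("overwrite")
  .option("overwriteSchema", "true")
  .partitionBy("sale_date_partition")
  .save(path)
```

Because the overwrite goes through the Delta transaction log, the previous files remain referenced by earlier table versions until they are vacuumed, which is safer than calling dbutils.fs.rm on the table directory.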