hive - Is there a way to alter a column in a Hive table stored as ORC?

Question

There is already a general question about this for Hive (Is there a way to alter column type in hive table?). Its answers show that the schema can be changed with the ALTER TABLE ... CHANGE command.

But is this also possible when the files are stored as ORC?

score 1 · Accepted Answer

You can load the ORC files into PySpark and rewrite them with the new column type:

Load the data into a DataFrame (in the pyspark shell, spark is already defined as the SparkSession):

df = spark.read.format("orc").load("<path-of-file-in-hdfs>")

Register a temporary view on the DataFrame:

df.createOrReplaceTempView('Table')  # returns None, so there is nothing to assign

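Create a new DataFrame with the converted column. List the other columns explicitly: combining * with an alias of the same name would produce two columns called third_column.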

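# first_column and second_column are placeholders for the table's other columns
df3 = spark.sql("select first_column, second_column, cast(third_column as float) as third_column from Table")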

Save the DataFrame to HDFS:

df3.write.format("orc").save("<hdfs-path-where-file-needs-to-be-saved>")
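For a table with many columns, typing out the select list by hand is error-prone. Below is a minimal sketch of a hypothetical helper, build_cast_query (not part of PySpark), that generates the select list from a list of column names, such as the one PySpark exposes as df.columns, casting only the columns named in a mapping and keeping every other column exactly once.

```python
def build_cast_query(view_name, columns, casts):
    """Build a SELECT that keeps every column once, casting those in `casts`.

    Hypothetical helper: `columns` would typically be df.columns, and
    `casts` maps a column name to a Spark SQL type name such as "float".
    """
    unknown = set(casts) - set(columns)
    if unknown:
        raise ValueError(f"columns not in view: {sorted(unknown)}")
    select_list = []
    for name in columns:
        quoted = f"`{name}`"  # backticks guard names that are keywords or contain spaces
        if name in casts:
            select_list.append(f"cast({quoted} as {casts[name]}) as {quoted}")
        else:
            select_list.append(quoted)
    return f"select {', '.join(select_list)} from {view_name}"


# Placeholder column names, standing in for df.columns
query = build_cast_query("Table", ["first_column", "second_column", "third_column"],
                         {"third_column": "float"})
print(query)
```

With a DataFrame loaded as in the first step, this would be called as spark.sql(build_cast_query("Table", df.columns, {"third_column": "float"})).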

score 0 · Accepted Answer

I ran a test on an ORC table. Converting a string column to a float column works:

ALTER TABLE test_orc CHANGE third_column third_column float;

This converts the column named third_column, previously a string column, into a float column. The same command can also rename the column.
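Hive's CHANGE clause always takes the old name, then the new name, then the type, so a type-only change repeats the column name. Below is a small sketch of a hypothetical helper, change_column_ddl (not a Hive or PySpark API), that makes this argument order explicit. It deliberately omits FIRST/AFTER, since the test in this answer shows reordering fails on ORC; third_float is a made-up new name.

```python
def change_column_ddl(table, old_name, new_type, new_name=None):
    """Return a Hive ALTER TABLE ... CHANGE statement (hypothetical helper).

    Passing no new_name repeats old_name, which keeps the name and changes
    only the type. Column reordering (FIRST/AFTER) is intentionally omitted.
    """
    target = new_name or old_name
    return f"ALTER TABLE {table} CHANGE {old_name} {target} {new_type};"


# Type change only, then type change plus rename (third_float is hypothetical)
print(change_column_ddl("test_orc", "third_column", "float"))
print(change_column_ddl("test_orc", "third_column", "float", new_name="third_float"))
```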

Side note: I was curious whether other changes would cause problems with ORC. When I tried to reorder the columns, I got an exception:

ALTER TABLE test_orc CHANGE third_column third_column float AFTER first_column;

The exception was: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Reordering columns is not supported for table default.test_orc. SerDe may be incompatible.
