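I loaded a CSV file into a Hive table using a SerDe. As usual, it created every column as type string. But when I try to cast the columns to their proper data types, it throws an error, specifically when casting a string column to an array type.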

describe table ted; 
comments string from deserializer
description string from deserializer
duration string from deserializer
speaker string from deserializer
occupation string from deserializer
tags string from deserializer
views string from deserializer

create table tedx as select
cast(comments as int) as comments,
cast(description as string) as desc,
cast(duration as int) as duration,
cast(speaker as string) as speaker,
cast(occupation as string) as occupation,
cast(tags as array<string>) as tags,
cast(views as int) as views,
from ted;

FAILED: ParseException line 7:13 cannot recognize input near 'array' '<' 'string' in primitive type specification

How can I convert the tags column from string type to array type?

1 Answer

To convert a string to an array, use split(string str, string pat), which splits str around pat (pat is a regular expression).

Demo:

hive> select split('1,2,3',',');
OK
["1","2","3"]
Time taken: 4.691 seconds, Fetched: 1 row(s)
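
Applied to the table from the question, this might look like the sketch below. It assumes the tags column holds plain comma-separated values (for example 'a,b,c'); the delimiter is a guess, since the question does not show the data. If the values are wrapped in brackets or quotes, those would need to be stripped first, for instance with regexp_replace.

```sql
-- Sketch only: assumes ted.tags is a comma-separated string.
create table tedx as
select
  cast(comments as int) as comments,
  description,                       -- already a string, no cast needed
  cast(duration as int) as duration,
  speaker,
  occupation,
  split(tags, ',') as tags,          -- yields array<string>
  cast(views as int) as views
from ted;
```

The description column keeps its original name here because desc is a reserved keyword in Hive and is likely to cause its own parse error when used as an unquoted alias. The trailing comma before from in the original statement is also dropped.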

Documentation is here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

Answered 2018-04-04T12:18:19.107