I need to process several months of data at once. Is it possible to point an external table at multiple folders? For example: Create external table logdata(col1 string, col2 string........) location s3://logdata/april, s3://logdata/march


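Short answer: no. The location of a Hive external table has to be unique at creation time; the metastore needs it to know where your table lives. You can get the same effect with partitions, though, because each partition can have its own location, which fits since you are splitting by month. So create the table like this: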



create external table logdata(col1 string, col2 string) partitioned by (month string) location 's3://logdata'

Then add each month's data as a partition:

alter table logdata add partition(month='april') location 's3://logdata/april'

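Do this for every month. You can then query the table and specify whichever partitions you want, and Hive will only look at the directories whose data you actually need (for example, if you only process April and June, Hive will not load May).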
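
Adding a partition by hand every month gets repetitive, so the statements can be generated instead. Below is a minimal sketch in Python, assuming the `s3://logdata/<month>` layout from the question. The helper names `add_partition_statements` and `query_for_months` are made up for illustration, and the script only prints HiveQL text for you to run in Hive; it does not talk to Hive itself.

```python
def add_partition_statements(table, base_location, months):
    """Return one ALTER TABLE ... ADD PARTITION statement per month."""
    base = base_location.rstrip("/")
    return [
        f"ALTER TABLE {table} ADD PARTITION (month='{m}') LOCATION '{base}/{m}';"
        for m in months
    ]


def query_for_months(table, months):
    """Return a SELECT that filters on the partition column only."""
    month_list = ", ".join(f"'{m}'" for m in months)
    return f"SELECT * FROM {table} WHERE month IN ({month_list});"


if __name__ == "__main__":
    # Table name and bucket are the ones used in the question.
    for stmt in add_partition_statements("logdata", "s3://logdata", ["march", "april"]):
        print(stmt)
    print(query_for_months("logdata", ["april", "june"]))
```

Because the filter in the generated query is on the partition column `month`, Hive can restrict the scan to the matching partition directories.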

Answered 2013-06-03T16:06:36.763


hive> create external table xxx (uid int, name string, dept string) row format delimited fields terminated by '\t' stored as textfile;
hive> load data inpath '/input/tmp/user_bckt' into table xxx;
hive> load data inpath '/input/user_bckt' into table xxx;
hive> select count(*) from xxx;
hive> select * from xxx;
1   ankur   abinitio
2   lokesh  cloud
3   yadav   network
4   sahu    td
5   ankit   data
1   ankur   abinitio
2   lokesh  cloud
3   yadav   network
4   sahu    td
5   ankit   data



hduser@hadoopnn:~$ hls /input/tmp
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

14/10/05 14:47:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   1 hduser hadoop         93 2014-10-04 18:54 /input/tmp/dept_bckt
-rw-r--r--   1 hduser hadoop         71 2014-10-04 18:54 /input/tmp/user_bckt
hduser@hadoopnn:~$ hcp /input/tmp/user_bckt /input/user_bckt
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

14/10/05 14:47:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hduser@hadoopnn:~$ logout
Connection to nn closed.
hduser@hadoopdn2:~$ hls /input/tmp/
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

14/10/05 15:05:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   1 hduser hadoop         93 2014-10-04 18:54 /input/tmp/dept_bckt
hduser@hadoopdn2:~$ hls /hive/wh/xxx
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

14/10/05 15:21:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   1 hduser hadoop         71 2014-10-04 18:54 /hive/wh/xxx/user_bckt
-rw-r--r--   1 hduser hadoop         71 2014-10-05 14:47 /hive/wh/xxx/user_bckt_copy_1


Answered 2014-10-05T09:33:54.103


Example: 1. Change the table's location as follows. I entered two HDFS directories separated by ':' (I also tried ',' and ';'). The statement succeeded.

hive> alter table ext set location 'hdfs:///solytr:/ext';
Time taken: 0.086 seconds
  2. However, querying the table afterwards fails.

OK
Failed with exception java.io.IOException:java.lang.IllegalArgumentException: Pathname /solytr:/ext from hdfs:/solytr:/ext is not a valid DFS filename.
Time taken: 0.057 seconds

Answered 2014-10-15T17:29:32.383

Take a look at SymlinkTextInputFormat / https://issues.apache.org/jira/browse/HIVE-1272. I think it can solve your problem. You just need to maintain a separate text file that lists all the locations!
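
To make the idea a bit more concrete: with SymlinkTextInputFormat, the table's location holds small text files that list the paths of the real data, one per line, and Hive reads those paths as the input. Here is a rough sketch in Python that builds such a listing and a matching DDL statement. The manifest directory `s3://logdata/symlinks` and both helper names are hypothetical, the columns are the ones from the question, and I have not run this against a Hive installation, so check the class names and whether directory entries work on your version and file system.

```python
def symlink_manifest(paths):
    """Build the text of a symlink file: one data path per line."""
    return "\n".join(paths) + "\n"


def symlink_table_ddl(table, columns, manifest_dir):
    """Build DDL for an external table whose location holds symlink files."""
    cols = ", ".join(f"{name} {type_}" for name, type_ in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        "STORED AS\n"
        "  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'\n"
        "  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'\n"
        f"LOCATION '{manifest_dir}';"
    )


if __name__ == "__main__":
    # Save this text as a file inside the (hypothetical) manifest directory.
    print(symlink_manifest(["s3://logdata/march", "s3://logdata/april"]), end="")
    print(symlink_table_ddl(
        "logdata",
        [("col1", "string"), ("col2", "string")],
        "s3://logdata/symlinks",  # assumed location, not from the question
    ))
```

Adding another month then means appending one line to the listing file rather than changing the table definition.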


Answered 2013-06-19T06:37:17.660