45

I run a Hive query from Java code. For example:

"SELECT * FROM table WHERE id > 100"

How can I export the result to a file in HDFS?

4

11 Answers

67

The following query writes the results directly to HDFS:

INSERT OVERWRITE DIRECTORY '/path/to/output/dir' SELECT * FROM table WHERE id > 100;
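
Since the question runs the query from Java, the same statement can be sent over JDBC. This is a minimal sketch, assuming a reachable HiveServer2 endpoint and the Hive JDBC driver on the classpath. The class name HiveExportToHdfs, the helper buildExportSql, and the example URL in the comment are illustrative and not part of the original answer. Without a JDBC URL argument the program only prints the statement it would run.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveExportToHdfs {

    // Wraps a SELECT in INSERT OVERWRITE DIRECTORY so that Hive writes the rows to HDFS.
    static String buildExportSql(String hdfsDir, String selectQuery) {
        if (hdfsDir.contains("'")) {
            // Keep the sketch simple: refuse paths that would break the quoted literal.
            throw new IllegalArgumentException("directory must not contain a single quote");
        }
        return "INSERT OVERWRITE DIRECTORY '" + hdfsDir + "' " + selectQuery;
    }

    public static void main(String[] args) throws SQLException {
        // "table" is the placeholder table name used in the question.
        String sql = buildExportSql("/path/to/output/dir",
                "SELECT * FROM table WHERE id > 100");
        if (args.length == 0) {
            // No JDBC URL given: only show the statement that would be executed.
            System.out.println(sql);
            return;
        }
        // args[0] is a HiveServer2 JDBC URL, for example
        // jdbc:hive2://localhost:10000/default (an assumed host, port and database).
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement stmt = conn.createStatement()) {
            stmt.execute(sql);
        }
    }
}
```

The existing contents of the target directory are replaced, so it is safer to point the statement at a directory used only for this export.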
answered 2013-01-12T03:25:39.263
39

This command redirects the output to a text file of your choice:

$ hive -e "select * from table where id > 10" > ~/sample_output.txt
answered 2014-04-08T23:46:12.687
26

This puts the results in tab-delimited files under a directory on the local filesystem:

INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/YourTableDir'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
SELECT * FROM table WHERE id > 100;
answered 2015-04-30T02:54:19.940
2

I agree with tnguyen80's answer. Note that when the query contains a string literal, it is best to wrap the entire query in double quotes.

For example:

$ hive -e "select * from table where city = 'London' and id >=100" > "/home/user/outputdirectory/city details.csv"
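
When the same command is launched from Java rather than from a shell, each argument is handed to hive as-is, so the query needs no outer quoting and an output path containing a space needs no escaping. A rough sketch using ProcessBuilder, assuming the hive executable is on the PATH; the class and method names are illustrative. Like the shell redirect, this writes to a local file, not to HDFS.

```java
import java.io.File;
import java.io.IOException;

public class HiveCliToFile {

    // Builds (but does not start) a process that runs `hive -e <query>`
    // with its standard output redirected to outputFile.
    static ProcessBuilder buildHiveProcess(String query, File outputFile) {
        ProcessBuilder pb = new ProcessBuilder("hive", "-e", query);
        pb.redirectOutput(outputFile);                      // plays the role of "> file" in a shell
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);  // keep Hive's log output on the console
        return pb;
    }

    public static void main(String[] args) throws InterruptedException {
        File out = new File("/home/user/outputdirectory/city details.csv");
        ProcessBuilder pb = buildHiveProcess(
                "select * from table where city = 'London' and id >= 100", out);
        try {
            int exitCode = pb.start().waitFor();
            System.out.println("hive exited with code " + exitCode);
        } catch (IOException e) {
            // Raised when hive is not installed or the output file cannot be opened.
            System.err.println("could not run hive: " + e.getMessage());
        }
    }
}
```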
answered 2015-07-23T10:25:00.570
2

@sarath How do I overwrite the file if I want to run another select * command on a different table and write to the same file?

INSERT OVERWRITE LOCAL DIRECTORY '/home/training/mydata/outputs' SELECT expl , count(expl) as total
FROM ( SELECT explode(splits) as expl FROM ( SELECT split(words,' ') as splits FROM wordcount ) t2 ) t3 GROUP BY expl ;

This is an example for sarath's question.

The above is a word-count job whose output is stored in a file under a local directory :)

answered 2017-05-04T10:58:16.307
2

The ideal approach is to use "INSERT OVERWRITE DIRECTORY '/pathtofile' select * from temp where id > 100" rather than "hive -e 'select * from...' > /filepath.txt".

answered 2017-04-23T18:06:56.627
1
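  1. Create an external table
  2. Insert data into the table
  3. Optionally drop the table later; this does not delete the file, because it is an external table

Example:

Create an external table to store the query results in '/user/myName/projectA_additionaData/'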

CREATE EXTERNAL TABLE additionaData
(
     ID INT,
     latitude STRING,
     longitude STRING
)
COMMENT 'Additional Data gathered by joining of the identified cities with latitude and longitude data' 
ROW FORMAT DELIMITED FIELDS
TERMINATED BY ',' STORED AS TEXTFILE location '/user/myName/projectA_additionaData/';

Insert the query results into the temporary table

 insert into additionaData 
     Select T.ID, C.latitude, C.longitude 
     from TWITER T 
     join CITY C on (T.location_name = C.location);

Drop the temporary table

drop table additionaData;
answered 2019-03-19T20:19:36.487
1

There are two ways to store HQL query results:

  1. Save to an HDFS location
INSERT OVERWRITE DIRECTORY "HDFS Path" ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
SELECT * FROM XXXX LIMIT 10;
  2. Save to a local file
$ hive -e "select * from table_Name" > ~/sample_output.txt
$ hive -e "select * from table where city = 'London' and id >=100" > "/home/user/outputdirectory/city details.csv"
answered 2020-02-15T15:46:22.507
1

To save the file directly in HDFS, use the following command:

hive> insert overwrite  directory '/user/cloudera/Sample' row format delimited fields terminated by '\t' stored as textfile select * from table where id >100;

This puts the contents in the folder /user/cloudera/Sample in HDFS.

answered 2017-09-12T09:08:05.847
0

Enter this line in the Hive command-line interface:

insert overwrite directory '/data/test' row format delimited fields terminated by '\t' stored as textfile select * from testViewQuery;

testViewQuery - some specific view

answered 2017-11-22T14:13:49.903
0

To set the output directory, the output file format, and so on, try the following:

INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format] 
SELECT ... FROM ...

Example:

INSERT OVERWRITE DIRECTORY '/path/to/output/dir'
ROW FORMAT DELIMITED
STORED AS PARQUET
SELECT * FROM table WHERE id > 100;
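
When the statement is assembled in Java, the optional clauses of the template above can be added conditionally. The following is only a sketch: the class name HiveDirectoryExport and the method build are hypothetical, and the clause text is passed through unchanged, so the caller is responsible for supplying valid Hive syntax.

```java
public class HiveDirectoryExport {

    // Assembles INSERT OVERWRITE [LOCAL] DIRECTORY '<dir>' [ROW FORMAT ...] [STORED AS ...] <select>.
    // rowFormat and fileFormat may be null, in which case the clause is left out.
    static String build(boolean local, String dir, String rowFormat,
                        String fileFormat, String select) {
        StringBuilder sql = new StringBuilder("INSERT OVERWRITE ");
        if (local) {
            sql.append("LOCAL ");
        }
        sql.append("DIRECTORY '").append(dir).append("'");
        if (rowFormat != null) {
            sql.append(" ROW FORMAT ").append(rowFormat);
        }
        if (fileFormat != null) {
            sql.append(" STORED AS ").append(fileFormat);
        }
        return sql.append(' ').append(select).toString();
    }

    public static void main(String[] args) {
        // Parquet output on HDFS, without a ROW FORMAT clause.
        System.out.println(build(false, "/path/to/output/dir", null, "PARQUET",
                "SELECT * FROM table WHERE id > 100"));
        // Tab-delimited text files in a local directory.
        System.out.println(build(true, "/home/hadoop/YourTableDir",
                "DELIMITED FIELDS TERMINATED BY '\\t'", "TEXTFILE",
                "SELECT * FROM table WHERE id > 100"));
    }
}
```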
answered 2019-04-25T20:49:27.663