
I am trying to use Hive to read fixed-width text files stored in the Cassandra File System (CFS). I can query the files when I run from the Hive client, but when I try to run through the Hadoop Hive JDBC driver, it says the table is not available, or reports a connection error. Below are the steps I followed.

Input file (employees.dat):

21736Ambalavanar              Thirugnanam              BOY-EAG       2005-05-091992-11-18
21737Anand                    Jeyamani                 BOY-AST       2005-05-091985-02-12
31123Muthukumar               Rajendran                BOY-EES       2009-08-121983-02-23

Start the Hive client

bash-3.2# dse hive;
Logging initialized using configuration in file:/etc/dse/hive/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201209250900_157600446.txt
hive> use HiveDB;
OK
Time taken: 1.149 seconds

Create a Hive external table pointing to the fixed-width text file

hive> CREATE EXTERNAL TABLE employees (empid STRING, firstname STRING, lastname STRING, dept STRING, dateofjoining STRING, dateofbirth STRING)
    > ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    > WITH SERDEPROPERTIES ("input.regex" = "(.{5})(.{25})(.{25})(.{15})(.{10})(.{10}).*" )
    > LOCATION 'cfs://hostname:9160/folder/';
OK
Time taken: 0.524 seconds
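As a sanity check on the column widths (5 + 25 + 25 + 15 + 10 + 10, matching the capture groups in `input.regex`), the same fixed-width split can be reproduced locally with `cut`. This is only a local sketch; the sample values are taken from employees.dat above:

```shell
# Build a sample line with the same fixed widths as employees.dat
# (5 + 25 + 25 + 15 + 10 + 10 characters; values from the question)
line=$(printf '%-5s%-25s%-25s%-15s%-10s%-10s' \
  21736 Ambalavanar Thirugnanam BOY-EAG 2005-05-09 1992-11-18)

# cut -c mirrors the SerDe capture groups (.{5})(.{25})... by column range;
# sed strips the right-padding spaces that RegexSerDe would also capture
empid=$(printf '%s' "$line" | cut -c1-5)
firstname=$(printf '%s' "$line" | cut -c6-30 | sed 's/ *$//')
dateofbirth=$(printf '%s' "$line" | cut -c81-90)
echo "$empid|$firstname|$dateofbirth"
```

If the echoed fields come out shifted, the widths in the regex do not match the actual file layout.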

Select * from the table.

hive> select * from employees;
OK
21736    Ambalavanar                     Thirugnanam                     BOY-EAG        2005-05-09      1992-11-18
21737    Anand                           Jeyamani                        BOY-AST        2005-05-09      1985-02-12
31123    Muthukumar                      Rajendran                       BOY-EES        2009-08-12      1983-02-23
Time taken: 0.698 seconds

Selecting specific fields from the Hive table throws a permission error (first issue)

hive> select empid, firstname from employees;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
java.io.IOException: The ownership/permissions on the staging directory cfs:/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by root and permissions are rwxrwxrwx. The directory must be owned by the submitter root or by root and permissions must be rwx------
        at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:108)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452)
        at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Job Submission failed with exception 'java.io.IOException(The ownership/permissions on the staging directory cfs:/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by root and permissions are rwxrwxrwx. The directory must be owned by the submitter root or by root and permissions must be rwx------)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask

The second issue is that when I try to run a select * query from the JDBC Hive driver (outside the dse/cassandra node), it says the employees table is not available. The external table I created behaves like a temporary table and is not persisted; when I run 'hive> show tables;', the employees table is not listed. Can anyone help me figure out the problem?

1 Answer

I don't have a direct answer for the first issue, but the second one looks like it is due to a known issue.
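That said, going purely by the error text, the staging directory was created world-writable (rwxrwxrwx) while the JobClient insists on rwx------. One thing that might be worth trying (this is an assumption, not something I have verified: it presumes CFS honors Hadoop's permission commands, and the path must match the one in your error message) is tightening the directory to the expected mode before resubmitting:

```shell
# Hypothetical workaround: restrict the staging dir to rwx------ (700).
# The path below is copied from the error message; adjust it if yours differs.
dse hadoop fs -chmod 700 cfs:/tmp/hadoop-root/mapred/staging/root/.staging
```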

There is a bug in DSE 2.1 that drops external tables created from CFS files from the metastore when you run show tables. Only the table metadata is removed; the data remains in CFS, so if you re-create the table definition you don't have to reload the data. Tables backed by Cassandra ColumnFamilies are not affected by this bug. This is fixed in the 2.2 release of DSE, which is due out shortly.

I'm not familiar with the Hive JDBC driver, but if it issues a Show Tables command at any point, it could be triggering this bug.
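If that is what's happening, a possible stopgap until 2.2 might be to re-issue the DDL before querying over JDBC. Since the table is EXTERNAL, only the metadata is re-created and no data is reloaded; the statement below is simply the one from the question, wrapped in `dse hive -e`:

```shell
# Re-create the table metadata dropped by the bug; the underlying CFS data
# is untouched, so this is cheap. DDL copied from the question above.
dse hive -e "USE HiveDB;
CREATE EXTERNAL TABLE IF NOT EXISTS employees (empid STRING, firstname STRING, lastname STRING, dept STRING, dateofjoining STRING, dateofbirth STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (\"input.regex\" = \"(.{5})(.{25})(.{25})(.{15})(.{10})(.{10}).*\")
LOCATION 'cfs://hostname:9160/folder/';"
```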

Answered 2012-09-25T15:30:56.317