I have this very basic test (run right after installing Hadoop 2.7 and Pig 0.14).
The file exists on HDFS:
hdfs://master:50070/user/raghav/family<r 2> 32
hdfs://master:50070/user/raghav/nsedata <dir>
However, when I run the following commands,
A = LOAD 'family';
dump A;
I get the following error message:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.7.0 0.14.0 raghav 2015-05-19 21:38:35 2015-05-19 21:38:41 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1432066972596_0002 A MAP_ONLY Message: Job failed! hdfs://master:50070/tmp/temp-1977333348/tmp-1065056833,
Input(s):
Failed to read data from "hdfs://master:50070/user/raghav/family"
Output(s):
Failed to produce result in "hdfs://master:50070/tmp/temp-1977333348/tmp-1065056833"
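Further investigation revealed a bit more. As mentioned, I can see the file on HDFS, both from within Pig (via the ls command) and from the shell prompt using hadoop fs commands. However, neither Pig nor Hive is able to read the file on HDFS.
I also tried changing the namenode port (I tried the values 8020, 9000, and 50070), but the behavior stays the same. I also looked through the namenode and datanode logs, but could not find anything more...
Need serious help!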
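For context on the port values tried above, here is a minimal shell sketch. It assumes stock Hadoop 2.x conventions: 50070 is the default NameNode web UI (HTTP) port, while 8020 and 9000 are commonly used for NameNode RPC. The helpers port_of and describe_port are hypothetical names introduced only for illustration. This does not diagnose the failure; since ls works through the same URI, the namenode evidently answers on the configured port in this setup.

```shell
#!/bin/sh
# Hypothetical helper: print the port part of an hdfs:// URI (empty if none).
port_of() {
  hostport=${1#*://}        # drop the scheme, e.g. "hdfs://"
  hostport=${hostport%%/*}  # drop the path, keeping "host[:port]"
  case "$hostport" in
    *:*) printf '%s\n' "${hostport##*:}" ;;
    *)   printf '\n' ;;
  esac
}

# Hypothetical helper: describe a port by stock Hadoop 2.x convention.
# A cluster can be configured differently, so this is only a hint.
describe_port() {
  case "$1" in
    50070)     echo "NameNode web UI (HTTP) default in Hadoop 2.x" ;;
    8020|9000) echo "commonly used NameNode RPC port" ;;
    *)         echo "no stock convention known" ;;
  esac
}

uri="hdfs://master:50070/user/raghav/family"
port=$(port_of "$uri")
echo "$port: $(describe_port "$port")"

# If the Hadoop client is installed, print the values it actually resolves.
if command -v hdfs >/dev/null 2>&1; then
  hdfs getconf -confKey fs.defaultFS
  hdfs getconf -confKey dfs.namenode.http-address
fi
```

Comparing the two getconf values shows whether the filesystem URI and the web UI address share a port on this particular cluster.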
Answers to some questions:
myhost raghav$ hdfs dfs -ls /user/raghav/family
15/05/20 08:03:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r-- 2 raghav supergroup 32 2015-05-15 01:01 /user/raghav/family
myhost raghav$ hdfs dfs -ls /user/raghav/
15/05/20 08:04:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 2 raghav supergroup 32 2015-05-15 01:01 /user/raghav/family
drwxr-xr-x - raghav supergroup 0 2015-05-15 00:25 /user/raghav/nsedata
myhost raghav$ hadoop fs -ls /
15/05/20 08:04:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x - raghav supergroup 0 2015-05-19 23:06 /tmp
drwxr-xr-x - raghav supergroup 0 2015-05-20 07:30 /user
myhost raghav$
Further testing shows that Hive is able to use HDFS, but Pig still cannot. In Hive, I can create an external table that successfully points at the sample file "family":
create external table xfamily(name STRING, age INT)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> LOCATION '/user/raghav';
OK
Time taken: 0.023 seconds
hive> select * from xfamily;
xxxxxx - expected data shows up.