hadoop - Copy a sample of a file from HDFS to the local fs?

Question

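OK,

a very silly question...

I have a large file in HDFS

/user/input/foo.txt

and I want to copy the first 100 lines from this location to the local file system...

The data is also very sensitive, so I am a bit hesitant to experiment.

What is the right way to copy sample data from HDFS to the local fs?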

score 4 · Accepted Answer

If the file is not compressed:

bin/hadoop fs -cat /path/to/file | head -n 100 > /path/to/local/file

If the file is compressed:

bin/hadoop fs -text /path/to/file | head -n 100 > /path/to/local/file
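
Since the data is sensitive, the same pipeline can be wrapped so that the local sample is only readable by its owner. The following is a minimal sketch, not an official Hadoop tool: the function name sample_hdfs_head and its argument order are made up for illustration, and it assumes the hdfs command is on the PATH and the file is uncompressed (swap -cat for -text otherwise).

```shell
#!/usr/bin/env bash
# sample_hdfs_head is a hypothetical helper, not part of Hadoop.
# Usage: sample_hdfs_head <hdfs-path> <local-path> [line-count]
sample_hdfs_head() {
    local src="$1" dest="$2" lines="${3:-100}"
    (
        # Run in a subshell so the restrictive umask does not leak out;
        # a newly created sample file gets no group/other permissions.
        umask 077
        # head exits after $lines lines, which closes the pipe, so the
        # rest of the HDFS file is not streamed to the local machine.
        hdfs dfs -cat "$src" | head -n "$lines" > "$dest"
    )
}
```

For example, sample_hdfs_head /user/input/foo.txt /path/to/local/file 100 would write the first 100 lines. The HDFS client may print a broken-pipe style message when head closes the pipe early; the local sample is still complete.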

score 2 · Accepted Answer

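Here is a simple way that is sure to work. Stream the file with -cat rather than -copyToLocal, which copies the whole file and prints nothing to standard output, so there is nothing for head to truncate:

hdfs dfs -cat /user/input/foo.txt | head -n 100 > /path/to/local/file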

score 1 · Accepted Answer

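You can use the head program to extract a few lines from the beginning of a file, for example:

$ head -n 100 /user/input/foo.txt

(where -n sets the number of lines to extract), and redirect the output to a file of your choice:

$ head -n 100 /user/input/foo.txt > /path/to/your/output/file

Note that head reads from the local file system, so this only works if the HDFS path is reachable as a local path (for example, HDFS mounted through the NFS gateway or FUSE). Otherwise, stream the file with hadoop fs -cat and pipe it to head.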
