java - When copying a file to HDFS, how to control what nodes that file will reside on?

Question

I'm dealing with kind of a bizarre use case where I need to make sure that File A is local to Machine A, File B is local to Machine B, etc. When copying a file to HDFS, is there a way to control which machines that file will reside on? I know that any given file will be replicated across three machines, but I need to be able to say "File A will DEFINITELY exist on Machine A". I don't really care about the other two machines -- they could be any machines on my cluster.

Thank you.

score 0 · Accepted Answer

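I don't think so, because in general, once a file is larger than 64 MB (the block size), the primary replicas of its blocks will reside on multiple servers.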

score 0 · Accepted Answer

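HDFS is a distributed file system, and it is specific to a cluster (one machine or many machines). Once a file is in HDFS, you lose the notion of the underlying machine or machines. That abstraction is exactly what makes it so useful. If the file is larger than the block size, it is cut into block-sized pieces and, depending on the replication factor, those blocks are replicated to other machines in the cluster. These block movements are based on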

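In your case, if you have a 3-node cluster (+1 master NameNode), a source file of 1 MB, a block size of 64 MB, and a replication factor of 3, then you will have 3 copies of a single block, one on each of the 3 nodes, each holding your 1 MB file. From the HDFS perspective, however, you still have only 1 file. Once the file is copied to HDFS, you don't really think in terms of machines, because at the machine level there are no files, only file blocks.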
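The arithmetic in this example can be sketched in a few lines of plain Java. This is only an illustration with no Hadoop dependency: the class name `HdfsBlockMath` and its methods are hypothetical, and it assumes a file is split by simple ceiling division with every block replicated the same number of times.

```java
// Illustrative sketch only: plain arithmetic, no Hadoop classes involved.
// HdfsBlockMath, blockCount and replicaCount are hypothetical names.
final class HdfsBlockMath {

    /** Number of blocks needed for a file, using ceiling division. */
    static long blockCount(long fileBytes, long blockBytes) {
        if (fileBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        if (fileBytes == 0) {
            return 0; // an empty file has no blocks
        }
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    /** Total block replicas stored in the cluster for one file. */
    static long replicaCount(long fileBytes, long blockBytes, int replication) {
        if (replication < 1) {
            throw new IllegalArgumentException("replication must be at least 1");
        }
        return blockCount(fileBytes, blockBytes) * replication;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // The case from the answer: 1 MB file, 64 MB block size, replication 3.
        System.out.println(blockCount(1 * mb, 64 * mb));      // 1 block
        System.out.println(replicaCount(1 * mb, 64 * mb, 3)); // 3 replicas
        // A larger file spans several blocks, so it spreads over more nodes.
        System.out.println(blockCount(200 * mb, 64 * mb));    // 4 blocks
    }
}
```

A file smaller than the block size occupies exactly one block, so with replication 3 on a 3-node cluster every node ends up with a copy. A file larger than the block size is a different story, since each block is placed independently.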

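If you really want to be sure for some reason, you can set the replication factor to 1 and run a 1-node cluster, which would guarantee your bizarre requirement.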

Finally, you can always use the FSimage viewer tool in your Hadoop cluster to see where the file blocks are located. More details are located here.

score 0 · Accepted Answer

I recently found this, which may address what you are trying to do: Controlling HDFS Block Placement

answered 2013-04-11T12:18:45.693
