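Can anyone help me understand disaster recovery in Hadoop?
Should I use distcp to copy data from my cluster to another cluster as a backup? Or can I use copyToLocal to copy my data to my local machine?
Does anyone know?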
A disaster recovery plan (DRP) goes beyond the technology alone, and your requirements can greatly affect the solution.
For instance, if you can't afford to lose any data, you'd want an active/active setup that sends data to two Hadoop clusters simultaneously. At the other end of the spectrum, Hadoop's own replication (three copies by default, but you can change that) combined with rack awareness can give you a copy on a second rack within the same cluster. In between, you can use tools like the distcp you mention to copy data from one cluster to another.
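As a rough sketch of the "in between" option, the Python snippet below only builds the argument list for a periodic `hadoop distcp` run from a production cluster to a backup cluster. It does not run anything. The NameNode addresses (`prod-nn`, `dr-nn`, port 8020), the `/data` path and the helper name `build_distcp_command` are hypothetical. The flags come from DistCp: `-update` copies only files that are missing or differ at the target, `-delete` removes target files no longer present at the source and is only honoured together with `-update` or `-overwrite`, `-p` preserves file attributes, and `-m` caps the number of map tasks.

```python
import shlex
from typing import List, Optional


def build_distcp_command(src: str, dst: str, *, update: bool = True,
                         delete: bool = False, preserve: Optional[str] = None,
                         max_maps: Optional[int] = None) -> List[str]:
    """Build (but do not run) the argv for a cluster-to-cluster DistCp copy.

    Hypothetical helper for illustration only.
    """
    if delete and not update:
        # DistCp ignores -delete unless -update (or -overwrite) is also given
        raise ValueError("-delete requires -update")
    cmd = ["hadoop", "distcp"]
    if update:
        cmd.append("-update")   # copy only files missing or changed at dst
    if delete:
        cmd.append("-delete")   # mirror deletions from src to dst
    if preserve:
        cmd.append("-p" + preserve)  # e.g. "ugp" = user, group, permissions
    if max_maps is not None:
        cmd += ["-m", str(max_maps)]  # limit parallel copy tasks
    cmd += [src, dst]
    return cmd


# Usage: hypothetical production and DR NameNodes
cmd = build_distcp_command("hdfs://prod-nn:8020/data",
                           "hdfs://dr-nn:8020/data",
                           delete=True, preserve="ugp", max_maps=20)
print(shlex.join(cmd))
# hadoop distcp -update -delete -pugp -m 20 hdfs://prod-nn:8020/data hdfs://dr-nn:8020/data
```

To actually run the copy you could pass the list to `subprocess.run(cmd, check=True)` from a scheduler such as cron. Keep in mind that with `-update` the contents of the source directory are copied into the target directory, not the directory itself. Note also that anything written between two runs is lost if the primary cluster fails, which is the gap an active/active setup closes.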
Additionally, you might want to follow project Falcon, a new initiative for Hadoop data life-cycle management.