
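I see that the Configuration class in Hadoop is Writable: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/conf/Configuration.html. However, I don't see any method it exposes for adding a Writable object (I see plenty of methods to set and get primitive types such as int and long). Say I have my own Writable object and I want to add it to the Configuration so that all my mappers and reducers can use it. How do I do that?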

Thanks,

Venkat


2 Answers


The Configuration is not really meant for passing entire objects. It is better used for the simple parameters needed to set up the Mappers/Reducers. Think of the conf as the place where you set variables at the beginning of the job. If you change the configuration in the middle of a run, the change most likely won't be there at the end, since it isn't meant for passing data dynamically.

What you are looking for, if you want to pass entire objects between nodes, is the Distributed Cache. Technically speaking its entries are files, but you can use standard object serialization to write your objects into them. See "About the Distributed Cache".

*Apologies for linking to different Hadoop versions; their pages are a bit muddled, and it is sometimes hard to find what you need.
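
A minimal sketch of the serialization half of that approach, using only the JDK. The `Lookup` class and its fields are made up for illustration, and the Distributed Cache registration itself is only described in comments, since the exact call depends on your Hadoop version (for example `Job.addCacheFile(URI)` in the newer API, `DistributedCache.addCacheFile(URI, Configuration)` in the older one).

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

final class CacheFileSketch {
    // Hypothetical payload; any Serializable class would do.
    static final class Lookup implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int threshold;

        Lookup(String name, int threshold) {
            this.name = name;
            this.threshold = threshold;
        }
    }

    // Driver side: write the object to a file. In a real job this file would
    // then be copied to HDFS and registered with the Distributed Cache.
    static void save(Lookup obj, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(obj);
        }
    }

    // Task side: read the object back from the local copy of the cached file,
    // typically once per task (e.g. in the mapper's setup method).
    static Lookup load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Lookup) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("lookup", ".ser");
        save(new Lookup("stopwords", 7), file);
        Lookup back = load(file);
        System.out.println(back.name + " " + back.threshold);
        Files.delete(file);
    }
}
```

Loading the object once per task rather than once per record keeps the deserialization cost negligible.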

answered 2013-05-20T03:34:56.410

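You can check the HBase source (as of HBase 0.94.6): the MultiTableInputFormat.setConf() method and the corresponding TableMapReduceUtil code (e.g. .initTableMapperJob()). They pass Scan objects through the configuration. The earlier TableInputFormat.setConf() uses a very similar mechanism. Usually only minimal properties are passed through the configuration, but this may be closer to your case.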
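
The general idea, as I understand it, is to serialize the object to bytes and store those bytes in the configuration as an encoded string. Here is a rough sketch of that round trip using only the JDK. Everything in it is an assumption for illustration: `Range` is a made-up type whose `write`/`readFields` methods mirror the shape of Hadoop's `Writable`, `java.util.Properties` stands in for `Configuration` (with Hadoop you would call `conf.set` and `conf.get` instead), and the key `example.range` is hypothetical. It is not HBase's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.Properties;

final class ConfWritableSketch {
    // Hypothetical value type with Writable-shaped write/readFields methods.
    static final class Range {
        String start = "";
        long limit;

        void write(DataOutput out) throws IOException {
            out.writeUTF(start);
            out.writeLong(limit);
        }

        void readFields(DataInput in) throws IOException {
            start = in.readUTF();
            limit = in.readLong();
        }
    }

    // Serialize the object to bytes, then Base64-encode them so the result
    // is safe to store as a plain string property.
    static String encode(Range range) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            range.write(out);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse of encode: Base64-decode, then let the object read its fields.
    static Range decode(String encoded) throws IOException {
        Range range = new Range();
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            range.readFields(in);
        }
        return range;
    }

    public static void main(String[] args) throws IOException {
        Properties conf = new Properties(); // stand-in for Hadoop's Configuration

        Range range = new Range();
        range.start = "row-100";
        range.limit = 500L;
        conf.setProperty("example.range", encode(range)); // driver side

        Range back = decode(conf.getProperty("example.range")); // task side
        System.out.println(back.start + " " + back.limit);
    }
}
```

This works best for small objects, because the encoded string is shipped with the job configuration to every task.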

Hope this helps.

answered 2013-05-25T14:42:47.853