
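I'm looking at production code built on the Hadoop framework that doesn't make sense to me. Why are we using transient, and why can't I make the utility method static (my lead told me not to make isThinger static)? I looked up the transient keyword and it has to do with serialization. Is serialization really being used here?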

//extending from MapReduceBase is a requirement of hadoop
public static class MyMapper extends MapReduceBase {

    // why the use of transient keyword here?
    transient Utility utility;

    public void configure(JobConf job) {

        String test = job.get("key");

        // seems silly that we have to create Utility instance.
        // can't we use a static method instead?
        utility = new Utility();

        boolean res = utility.isThinger(test);

        foo (res);
    }

    void foo (boolean a) { }
}


public class Utility {
   final String stringToSearchFor = "ineverchange";

   // it seems we could make this static.  Why can't we?
   public boolean isThinger(String word) {
      boolean val = false;
      if (word.indexOf(stringToSearchFor) > 0) {
           val = true;
      }
      return val;
   }
}

2 Answers


The issue in your code comes down to the difference between local mode (typically used for development and test cases) and distributed mode.

In local mode everything runs inside a single JVM, so if you change a static variable (or call a static method that shares some state, in your case stringToSearchFor), the change is visible while computing every chunk of input.

In distributed mode, every chunk is processed in its own JVM. So if you change that state (e.g. stringToSearchFor), the change won't be visible to the other processes running on other hosts/JVMs/tasks.

This is an inconsistency that leads to the following design principles when writing map/reduce functions:

  1. Be as stateless as possible.
  2. If you need state (mutable classes, for example), never declare the references to it static in the map/reduce classes (otherwise the code will behave differently in testing/development than in production).
  3. Immutable constants (for example configuration keys as String) should be defined static and final.
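
To make the three rules concrete, here is a minimal plain-Java sketch with no Hadoop dependency. CountingTask, CONFIG_KEY, sharedCount, ownCount and process are hypothetical names invented for illustration. Two task objects in one JVM stand in for local mode: they see each other's updates to the static counter, whereas tasks in separate JVMs would each start from their own copy.

```java
// Hypothetical task class, not a Hadoop API; it only illustrates the rules above.
class CountingTask {
    // Rule 3: an immutable constant is fine as static final.
    static final String CONFIG_KEY = "key";

    // Anti-pattern (rule 2): mutable static state, shared by every
    // instance in the same JVM but NOT across separate JVMs.
    static int sharedCount = 0;

    // Per-instance state behaves the same in local and distributed mode.
    int ownCount = 0;

    void process(String record) {
        sharedCount++;
        ownCount++;
    }
}

class StaticStateDemo {
    public static void main(String[] args) {
        CountingTask a = new CountingTask();
        CountingTask b = new CountingTask();
        a.process("r1");
        b.process("r2");
        // Same JVM: both updates land in the one static counter.
        System.out.println(CountingTask.sharedCount); // 2
        // Instance counters stay independent.
        System.out.println(a.ownCount);               // 1
    }
}
```

If a and b ran in two different JVMs, each JVM would report a sharedCount of 1, which is exactly the local-versus-distributed inconsistency described above.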

transient is pretty much useless in Hadoop: Hadoop does not serialize anything in the user-code (Mapper/Reducer) class or object. It only matters if you are doing something with Java serialization that we can't see here.

In your case, if Utility really is just a utility and stringToSearchFor is an immutable constant (that is, it never changes), you can safely declare isThinger static. And please remove that transient if you don't do any Java serialization with your MapReduceBase subclass.
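
A sketch of what the static version could look like, assuming stringToSearchFor really never changes. The private constructor, the upper-case constant name and StaticUtilityDemo are my additions. Note that the original check uses `> 0`, so a match at index 0 returns false; I kept that behavior as-is, but if a plain "contains" test was intended, `word.contains(...)` would be the usual choice.

```java
// Sketch only: Utility reworked so that isThinger is static.
final class Utility {
    // Immutable constant, so sharing it statically is safe.
    private static final String STRING_TO_SEARCH_FOR = "ineverchange";

    // No instances are needed once the method is static.
    private Utility() { }

    // Same check as the original: true only when the constant occurs
    // at an index greater than 0 (a match at index 0 yields false).
    static boolean isThinger(String word) {
        return word.indexOf(STRING_TO_SEARCH_FOR) > 0;
    }
}

class StaticUtilityDemo {
    public static void main(String[] args) {
        System.out.println(Utility.isThinger("xineverchange")); // true
        System.out.println(Utility.isThinger("ineverchange"));  // false: match at index 0
    }
}
```

With this in place, configure() could call Utility.isThinger(test) directly and the utility field would no longer be needed.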

Answered 2013-02-07T08:16:55.787

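Unless there is something not shown here, I suspect the question of making Utility's method static largely comes down to style. In particular, if you are not injecting the Utility instance but instantiating it on demand inside the class, then keeping it an instance method is rather pointless. As written, it cannot be overridden, nor is it any easier to test than a static method.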
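
For contrast, here is a minimal sketch of the case where an instance method does pay off: the Utility is injected, so a test can substitute its own implementation. ConfigChecker, check and InjectionDemo are hypothetical names, and this is plain Java rather than a real mapper, so treat it purely as an illustration of the idea.

```java
// Utility as in the question, with a non-static isThinger.
class Utility {
    private final String stringToSearchFor = "ineverchange";

    public boolean isThinger(String word) {
        return word.indexOf(stringToSearchFor) > 0;
    }
}

// Hypothetical collaborator that receives its Utility from outside
// instead of calling "new Utility()" itself.
class ConfigChecker {
    private final Utility utility;

    ConfigChecker(Utility utility) {
        this.utility = utility;
    }

    boolean check(String value) {
        return utility.isThinger(value);
    }
}

class InjectionDemo {
    public static void main(String[] args) {
        // A test double that overrides the instance method.
        Utility alwaysTrue = new Utility() {
            @Override
            public boolean isThinger(String word) {
                return true;
            }
        };
        System.out.println(new ConfigChecker(alwaysTrue).check("anything"));    // true
        System.out.println(new ConfigChecker(new Utility()).check("anything")); // false
    }
}
```

The code in the question does none of this: it creates the Utility itself inside configure(), so nothing can be swapped in and the instance method buys nothing over a static one.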

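As for transient, you are right that it is unnecessary. My guess is that the original developer was using serialization somewhere in the inheritance or implementation chain and dealt with the non-serializable instance variable by marking it transient.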
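
If it helps to see what transient actually does, here is a small self-contained sketch using standard Java serialization (nothing Hadoop-specific). Holder, kept, helper and TransientDemo are hypothetical names. The transient field is skipped when the object is written and comes back as null when it is read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical serializable class holding one non-serializable field.
class Holder implements Serializable {
    private static final long serialVersionUID = 1L;

    String kept = "kept";

    // java.lang.Object is not Serializable; without transient,
    // writeObject would throw NotSerializableException.
    transient Object helper = new Object();
}

class TransientDemo {
    // Serializes the holder to bytes in memory and reads it back.
    static Holder roundTrip(Holder original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Holder) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Holder copy = roundTrip(new Holder());
        System.out.println(copy.kept);   // kept
        System.out.println(copy.helper); // null
    }
}
```

Since nothing in the posted mapper is ever written through an ObjectOutputStream, the keyword has no effect there.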

Answered 2013-02-07T04:39:11.760