I am reading an Apache Crunch example. It is mostly Java, and I am new to both Crunch and Java (I know .NET). Here is the sample code:
DoFn<String, Pair<String, Long>> extractIPResponseSize = new DoFn<String, Pair<String, Long>>() {
    transient Pattern pattern;

    public void initialize() {
        pattern = Pattern.compile(logRegex);
    }

    public void process(String line, Emitter<Pair<String, Long>> emitter) {
        Matcher matcher = pattern.matcher(line);
        if (matcher.matches()) {
            try {
                Long responseSize = Long.parseLong(matcher.group(7));
                String remoteAddr = matcher.group(1);
                emitter.emit(Pair.of(remoteAddr, responseSize));
            } catch (NumberFormatException e) {
                // corrupt line, we should increment a counter
            }
        }
    }
};
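The first line confuses me and I cannot make sense of it. Could you explain it piece by piece? Note: DoFn is a class in Apache Crunch; here is its documentation:
http://crunch.apache.org/apidocs/0.3.0/org/apache/crunch/DoFn.html
I also did some Googling, and it looks like Pair here comes from Apache Commons Lang:
http://commons.apache.org/proper/commons-lang/javadocs/api-release/org/apache/commons/lang3/tuple/Pair.html
Maybe what I need to learn about is Java generics?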
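To make the question concrete, here is a minimal sketch of how I currently read that first line, if I am reading it right: a generic type with two type parameters, plus an anonymous subclass that is declared and instantiated in one expression. MiniDoFn and MiniEmitter are hypothetical stand-ins I made up so the snippet compiles without Crunch; they are not the real Crunch classes and only mimic the shape of DoFn and Emitter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AnonymousGenericSketch {
    // Hypothetical stand-in for Crunch's Emitter<T>: receives values of type T.
    interface MiniEmitter<T> {
        void emit(T value);
    }

    // Hypothetical stand-in for Crunch's DoFn<S, T>: S is the input type, T the output type.
    static abstract class MiniDoFn<S, T> {
        abstract void process(S input, MiniEmitter<T> emitter);
    }

    static List<Integer> lineLengths(List<String> lines) {
        // Left side: a variable whose type is MiniDoFn with S = String, T = Integer.
        // Right side: "new MiniDoFn<String, Integer>() { ... }" declares an unnamed
        // subclass of MiniDoFn and creates one instance of it on the spot.
        MiniDoFn<String, Integer> lengthFn = new MiniDoFn<String, Integer>() {
            @Override
            void process(String line, MiniEmitter<Integer> emitter) {
                emitter.emit(line.length());
            }
        };

        List<Integer> out = new ArrayList<>();
        // The emitter here simply appends each emitted value to the list.
        MiniEmitter<Integer> collector = out::add;
        for (String line : lines) {
            lengthFn.process(line, collector);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [5, 11]
        System.out.println(lineLengths(Arrays.asList("GET /", "POST /login")));
    }
}
```

Coming from .NET, the closest thing I can think of is implementing a generic abstract class, except that Java lets the subclass stay unnamed and be defined inline at the point where it is instantiated.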