我有一个PTable<String, Pair<Entity1, Entity2>>
在我正在运行转换作业的程序的中间阶段生成的。
示例 PTable 条目:
["0067b4c054d14fe2-ACC8D37",
[{
"unique_id": "0067b4c054d14fe2-ACC8D37",
"user_id": "ACC8D37",
"campaign_id": "3URL6GC2",
"seller_id": "0067b4c"
}, {
"unique_id": "0067b4c054d14fe2-ACC8D37",
"user_id": "ACC8D37",
"seller_id": "0067b4c"
}]]
我需要找到一个PCollection<String>
地方Entity2 is null
变压器自由度
private PCollection<String> transformPairTable(PCollection<Pair<Entity1, Entity2>> pairPCollection) {
return pairPCollection.parallelDo(new DoFn<Pair<Entity1, Entity2>, String>() {
@Override
public void process(Pair<Entity1, Entity2> entityPair, Emitter<String> emitter) {
Entity1 entity1 = entityPair.first();
Entity2 entity2 = entityPair.second();
if (entity2 == null) {
String campaignId = entity1.getCampaignId();
emitter.emit(campaignId);
}
}
}, Avros.strings());
}
Entity1 和 Entity2 都是从 Avro 模式生成的类。
但是当我运行作业时,它会引发运行时异常
Exception in thread "main" org.apache.crunch.CrunchRuntimeException: java.io.NotSerializableException: com.xxx.bb.data.vv.jobs.GenerateCount
at org.apache.crunch.impl.mr.MRPipeline.plan(MRPipeline.java:140)
at org.apache.crunch.impl.mr.MRPipeline.runAsync(MRPipeline.java:159)
at org.apache.crunch.impl.mr.MRPipeline.run(MRPipeline.java:147)
at org.apache.crunch.materialize.MaterializableIterable.iterator(MaterializableIterable.java:94)
at org.apache.crunch.materialize.pobject.FirstElementPObject.process(FirstElementPObject.java:44)
at org.apache.crunch.materialize.pobject.PObjectImpl.getValue(PObjectImpl.java:71)
at com.xxxx.bb.xxxx.xxx.jobs.GenerateNewCount.getNewCustomersPCollection(GenerateNewCount.java:239)
at com.xxx.x.data.x.jobs.GenerateNewCount.runJob(GenerateNewCount.java:78)
at com.xxx.x.data.xx.jobs.BaseJob.run(BaseJob.java:87)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
我试过 Writables.strings() 但它给出了同样的例外。
作业中使用的 PTable 具有 Entity2 等于 null 的条目。
我尝试以多种方式转换 PTable,但它不起作用。我无法弄清楚其背后的主要原因。
当我使用
Entity1 p1= pairPCollection.first().getValue().first();
它抛出下面提到的异常:
ERROR materialize.MaterializableIterable: Could not materialize: Avro(/tmp/crunch-1893789994/p7)
java.io.IOException: No files found to materialize at: /tmp/crunch-1893789994/p7
at org.apache.crunch.io.CompositePathIterable.create(CompositePathIterable.java:49)
at org.apache.crunch.io.impl.FileSourceImpl.read(FileSourceImpl.java:134)
at org.apache.crunch.io.avro.AvroFileSource.read(AvroFileSource.java:89)