哎呀。有时left-join比cogroup更好。这创造了一袋袋子:
final_output = cogroup final_output by nodeid, new_edges by nodeid1;
final_output = foreach final_output generate flatten($1.hash_id) as hash_id, new_edges;
final_output = cogroup nodes_per_entity by hash_id, final_output by hash_id;
所以我将其更改为:
final_output = join final_output by nodeid left, new_edges by nodeid1;
final_output = group final_output by hash_id;
final_output = foreach final_output generate group as hash_id,
final_output.(nodeid1, nodeid2, edge_pit, asset_id1, asset_id2) as new_edges;
现在它是一个普通的元组包(尽管其中一些由于左连接而具有空值,但这更容易处理)。