java - EdgeValue is not the same after calling Vertex.getEdgeValue() twice

Question

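I am trying to implement the Spinner graph partitioning algorithm in giraph. In the first step, my program adds edges to the given input graph so that it becomes undirected, and every vertex chooses a random partition. (This partition integer is stored in the VertexValue.) At the end of this initialization step, every vertex sends a message along all of its outgoing edges containing the vertex ID (a LongWritable) and the partition the vertex chose.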

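All of this works fine. Now, in the step where I have the problem, each vertex iterates over the received messages and saves the received partition in the EdgeValue of the corresponding edge. (VertexValue is the V in Vertex<I,V,E>, and EdgeValue is the E in Edge<I,E>.)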
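The two steps described above can be sketched in plain Java, without any Giraph types, to make the intended data flow concrete. This is only a model under stated assumptions: the class `TwoStepSketch`, its method names, and the use of maps in place of vertices, edges, and messages are all hypothetical, and the input adjacency is assumed to be undirected already.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical plain-Java model of the two steps; not Giraph code.
class TwoStepSketch {
    // Step 1: every vertex picks a random partition in [0, numPartitions).
    static Map<Long, Integer> choosePartitions(Map<Long, List<Long>> adjacency,
                                               int numPartitions, Random random) {
        Map<Long, Integer> partitionOf = new HashMap<>();
        for (Long vertexId : adjacency.keySet()) {
            partitionOf.put(vertexId, random.nextInt(numPartitions));
        }
        return partitionOf;
    }

    // Step 2: each vertex records, per outgoing edge, the partition
    // that the neighbor at the other end announced in step 1.
    static Map<Long, Map<Long, Integer>> recordNeighborPartitions(
            Map<Long, List<Long>> adjacency, Map<Long, Integer> partitionOf) {
        Map<Long, Map<Long, Integer>> edgePartition = new HashMap<>();
        for (Map.Entry<Long, List<Long>> entry : adjacency.entrySet()) {
            Map<Long, Integer> perEdge = new HashMap<>();
            for (Long neighbor : entry.getValue()) {
                perEdge.put(neighbor, partitionOf.get(neighbor));
            }
            edgePartition.put(entry.getKey(), perEdge);
        }
        return edgePartition;
    }
}
```

After both steps, the partition stored on the edge from vertex 1 to vertex 2 should equal the partition vertex 2 chose; that is the state the failing step is expected to reach.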

Here are the important parts of my code:

The wrapper classes:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class EdgeValue implements Writable {
    private int weight;
    private int partition;

    // Getters and setters for weight and partition

    public EdgeValue() {
        this.weight = -2;
        this.partition = -1;
    }

    // Constructors taking 1 and 2 ints and setting weight/partition to the given value

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields are read in the same order in which write() emits them.
        this.weight = in.readInt();
        this.partition = in.readInt();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(this.weight);
        out.writeInt(this.partition);
    }
}

public class SpinnerMessage implements Writable, Configurable {
    private long senderId;
    private int updatePartition;

    public SpinnerMessage() {
        this.senderId = -1;
        this.updatePartition = -1;
    }

    // Constructors taking int and/or LongWritable and setting the fields
    // Getters and setters for senderId and updatePartition

    @Override
    public void readFields(DataInput in) throws IOException {
        this.senderId = in.readLong();
        this.updatePartition = in.readInt();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(this.senderId);
        out.writeInt(this.updatePartition);
    }
}

The compute method of the previous step (ran is a Random object):

public void compute(Vertex<LongWritable, VertexValue, EdgeValue> vertex, Iterable<LongWritable> messages) {
    int initialPartition = this.ran.nextInt(GlobalInformation.numberOfPartitions);
    vertex.getValue().setPartition(initialPartition);
    sendMessageToAllEdges(vertex, new SpinnerMessage(vertex.getId(),initialPartition));
}

The compute method of the step in which the error occurs:

public void compute(Vertex<LongWritable, VertexValue, EdgeValue> vertex, Iterable<SpinnerMessage> messages) throws IOException {
    for (SpinnerMessage m : messages) {
        vertex.getEdgeValue(new LongWritable(m.getSenderWritable().get())).setPartition(m.getUpdatePartition());
    }
    // ... some other code, e.g. initializing the amountOfNeighbors array.
    // Here I get an ArrayIndexOutOfBoundsException since the partition is -1:
    for (Edge<LongWritable, EdgeValue> edge : vertex.getEdges()) {
        EdgeValue curValue = edge.getValue();
        amountOfNeighbors[curValue.getPartition()] += curValue.getWeight();
    }
}

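However, when I iterate over the edges using, for example,

for(Edge<LongWritable, EdgeValue> e : vertex.getEdges())

or access one via

vertex.getEdgeValue(someVertex)

the returned EdgeValue has weight -2 and partition -1 (the default values from the default constructor).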

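My ideas about what might cause the error:

getEdgeValue(new LongWritable(someLong)) might not work because new LongWritable(someLong) is a different object from another LongWritable holding the same value. However, I have seen it used this way in giraph code, so that does not seem to be the problem; only the long stored inside the LongWritable appears to matter.
(Most likely cause) Hadoop serialization and deserialization somehow changes my EdgeValue object. Since Hadoop is meant for very large graphs, they may not fit in RAM. For this reason, VertexValue and EdgeValue must implement Writable. However, after checking some giraph code online, I implemented readFields() and write() in a way that seems correct to me (the important fields are written and read in the same order). (I suspect this is related to the problem because the EdgeValue returned by the second call has the field values set by the default constructor.)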
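The second hypothesis can be checked in isolation. Below is a minimal sketch that round-trips a stand-in for EdgeValue through `java.io` streams; it assumes only the two int fields shown in the question. The class name `EdgeValueRoundTrip` and the `roundTrip` helper are hypothetical, and the Hadoop `Writable` interface is left out so the snippet runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for EdgeValue with the same write/readFields bodies.
class EdgeValueRoundTrip {
    int weight = -2;
    int partition = -1;

    void write(DataOutput out) throws IOException {
        out.writeInt(weight);
        out.writeInt(partition);
    }

    void readFields(DataInput in) throws IOException {
        weight = in.readInt();
        partition = in.readInt();
    }

    // Serializes the value to bytes and reads it back into a fresh instance.
    static EdgeValueRoundTrip roundTrip(EdgeValueRoundTrip original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            EdgeValueRoundTrip copy = new EdgeValueRoundTrip();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the copy carries the same weight and partition as the original, the write/readFields pair is symmetric, which suggests the default values come from somewhere other than the serialization code itself.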

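I have also read some of the documentation:

E getEdgeValue(I targetVertexId) Return the value of the first edge with the given target vertex id, or null if there is no such edge. Note: edge value objects returned by this method may be invalidated by the next call. Thus, keeping a reference to an edge value almost always leads to undesired behavior.

However, this does not apply to me, since I only have one EdgeValue variable, right?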
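For readers unfamiliar with that warning, here is a small sketch of the general object-reuse pattern it describes. It is not Giraph's implementation: `ReusedValueSketch`, `Value`, and both methods are hypothetical names, and the point is only that an iterator which overwrites one shared object makes every kept reference show the last value.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration of why kept references can be "invalidated".
class ReusedValueSketch {
    static final class Value {
        int partition;
    }

    // Exposes the partitions through ONE shared Value that is overwritten on each next().
    static Iterable<Value> reusingView(final int[] partitions) {
        final Value shared = new Value();
        return () -> new Iterator<Value>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < partitions.length;
            }

            @Override
            public Value next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.partition = partitions[index++];
                return shared;
            }
        };
    }

    // Keeps a reference from every iteration; all entries are the same object.
    static List<Value> collectReferences(int[] partitions) {
        List<Value> kept = new ArrayList<>();
        for (Value v : reusingView(partitions)) {
            kept.add(v);
        }
        return kept;
    }
}
```

Calling `collectReferences` on three different partitions yields a list of three references to a single object, so every entry reports the last partition seen.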

Thanks in advance to everyone who takes the time to help me. (I am using hadoop 1.2.1 and giraph 1.2.0.)

score 0 · Accepted Answer

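After looking at more giraph code examples, I found the solution: the Vertex.getEdgeValue() method basically creates a copy of the vertex's EdgeValue. If you change the object it returns, those changes are not written back to disk. To save information in the EdgeValue or VertexValue, you have to use setVertexValue() or setEdgeValue().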
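The behavior described here can be reproduced with a small model. The sketch below is not Giraph's code: `SerializedEdgeStore` and its nested `Value` class are hypothetical, and it simply assumes that edge values are kept in serialized form, so that every read deserializes a fresh copy and only an explicit write changes what is stored.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of an edge store that keeps values as bytes.
class SerializedEdgeStore {
    static final class Value {
        int weight;
        int partition;

        Value(int weight, int partition) {
            this.weight = weight;
            this.partition = partition;
        }
    }

    private final Map<Long, byte[]> bytesByTarget = new HashMap<>();

    // Serializes the value and stores the bytes for the given target vertex.
    void setEdgeValue(long targetId, Value value) {
        bytesByTarget.put(targetId,
                ByteBuffer.allocate(8).putInt(value.weight).putInt(value.partition).array());
    }

    // Deserializes a fresh copy on every call; returns null if there is no such edge.
    Value getEdgeValue(long targetId) {
        byte[] bytes = bytesByTarget.get(targetId);
        if (bytes == null) {
            return null;
        }
        ByteBuffer in = ByteBuffer.wrap(bytes);
        int weight = in.getInt();
        int partition = in.getInt();
        return new Value(weight, partition);
    }
}
```

In this model, `store.getEdgeValue(id).partition = 3` changes only the temporary copy, and the next `getEdgeValue(id)` returns the stored defaults again. Fetching the value, modifying it, and passing it to `setEdgeValue(id, value)` makes the change stick. Applied to the loop in the question, that would presumably mean calling setPartition() on the fetched EdgeValue and then handing it to vertex.setEdgeValue() with the sender's ID; this is an untested reading of the answer.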
