
I would like to have an ArrayList that holds references to objects inside the reduce function.

@Override
public void reduce( final Text pKey,
                    final Iterable<BSONWritable> pValues,
                    final Context pContext )
        throws IOException, InterruptedException{
    final ArrayList<BSONWritable> bsonObjects = new ArrayList<BSONWritable>();

    for ( final BSONWritable value : pValues ){
        bsonObjects.add(value);
        //do some calculations.
    }
    for ( final BSONWritable value : bsonObjects ){
        //do something else.
    }
}

The problem is that bsonObjects.size() returns the correct number of elements, but all the elements of the list are equal to the last inserted element. For example, if the elements

{id:1}

{id:2}

{id:3}

are inserted, bsonObjects will hold 3 items, but all of them will be {id:3}. Is there a problem with this approach? Any idea why this happens? I tried changing the List to a Map, but then only one element was added to the map. I also tried making the declaration of bsonObjects global, but the same behavior happens.


1 Answer


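This is documented behavior. The pValues iterator reuses a single BSONWritable instance, so each time its contents change during the loop, every reference in the bsonObjects ArrayList reflects the change. When you call add() on bsonObjects, you are storing a reference, not a copy. This approach lets Hadoop save memory.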

In the first loop, you should instantiate a new BSONWritable that is equal to value (a deep copy), and then add that new object to bsonObjects.
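
The effect can be reproduced without Hadoop. The following is a minimal sketch, not Hadoop code: `Holder`, `reusingIterable`, `idsStoredByReference`, and `idsStoredByCopy` are hypothetical names invented for this illustration, with `Holder` standing in for BSONWritable and the iterator mimicking an iterator that overwrites one shared instance.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseDemo {
    /** Hypothetical mutable value type standing in for BSONWritable. */
    static final class Holder {
        int id;
        Holder(int id) { this.id = id; }
        Holder copy() { return new Holder(id); }
    }

    /** Returns an Iterable whose iterator hands out the SAME Holder on every call to next(). */
    static Iterable<Holder> reusingIterable(final int[] ids) {
        return new Iterable<Holder>() {
            @Override
            public Iterator<Holder> iterator() {
                return new Iterator<Holder>() {
                    private final Holder shared = new Holder(0);
                    private int next = 0;

                    @Override
                    public boolean hasNext() { return next < ids.length; }

                    @Override
                    public Holder next() {
                        if (!hasNext()) throw new NoSuchElementException();
                        shared.id = ids[next++]; // overwrite the shared instance in place
                        return shared;
                    }

                    @Override
                    public void remove() { throw new UnsupportedOperationException(); }
                };
            }
        };
    }

    /** Stores the references as-is, like bsonObjects.add(value) in the question. */
    static List<Integer> idsStoredByReference(int[] ids) {
        List<Holder> kept = new ArrayList<Holder>();
        for (Holder h : reusingIterable(ids)) kept.add(h);
        List<Integer> out = new ArrayList<Integer>();
        for (Holder h : kept) out.add(h.id);
        return out;
    }

    /** Stores a fresh copy of each element instead of the shared reference. */
    static List<Integer> idsStoredByCopy(int[] ids) {
        List<Holder> kept = new ArrayList<Holder>();
        for (Holder h : reusingIterable(ids)) kept.add(h.copy());
        List<Integer> out = new ArrayList<Integer>();
        for (Holder h : kept) out.add(h.id);
        return out;
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3};
        System.out.println(idsStoredByReference(ids)); // prints [3, 3, 3]
        System.out.println(idsStoredByCopy(ids));      // prints [1, 2, 3]
    }
}
```

Storing the reference yields three entries that all point at the one shared object, which matches the symptom in the question. It would also explain the Map attempt if the values were used as keys: every put would use the same key object.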

Try this:

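// Requires: import org.apache.hadoop.io.WritableUtils;
for ( final BSONWritable value : pValues ){
    // Hadoop reuses `value` between iterations, so store a deep copy.
    // WritableUtils.clone copies the object through its Writable serialization.
    final BSONWritable v = WritableUtils.clone(value, pContext.getConfiguration());
    bsonObjects.add(v);
    //do some calculations.
}
for ( final BSONWritable value : bsonObjects ){
    //do something else.
}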

You will then be able to iterate over bsonObjects in the second loop and retrieve each distinct value.

However, you should also be careful: if you make deep copies, all the values for a key in this reducer need to fit in memory.

answered 2012-06-12T22:01:09.397