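I have a Groovy script that uses the Mongo Java driver (mongo-java-driver-2.8.0.jar) to visit every record in a single collection and update any record that does not match the expected structure. The script works like a champ, but I can't figure out why it processes more records than the collection actually contains. More precisely, the dbCursor.hasNext() loop iterates over more records than the collection holds. This happens only when the script finds something to update; if it runs without performing any updates, the total processed is correct.
Does hasNext() "start over", or do records move within the iteration when they are updated?
Here's the code...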
    import com.mongodb.BasicDBObject
    import com.mongodb.DBCollection
    import com.mongodb.DBCursor
    import com.mongodb.DBObject
    import com.mongodb.WriteConcern
    import com.mongodb.WriteResult

    // printErr, calcElapsed and startTime are defined elsewhere in the script (not shown).
    static def doIt( mongo, normalizer, isDryRun ) {
        def ttlProcessed = 0
        def ttlCandidates = 0
        def ttlUpdated = 0
        def lapCount = 0
        def lapStartTime = System.currentTimeMillis()
        def db = mongo.getDB( "devices" )
        DBCollection dbCollection = db.getCollection( "profiles" )
        // Unfiltered find(): iterate over every document in the collection.
        DBCursor dbCursor = dbCollection.find()
        while ( dbCursor.hasNext() ) {
            DBObject source = dbCursor.next()
            DBObject normalized = normalizer.normalize( source )
            // Only update if changed...
            if ( ! ( source.equals( normalized ) ) ) {
                ttlCandidates++
                if ( !isDryRun ) {
                    // Replace the whole document, matched by _id (no upsert, no multi-update).
                    BasicDBObject searchQuery = new BasicDBObject( "_id", normalized.get( "_id" ) )
                    WriteResult result = dbCollection.update( searchQuery, normalized, false, false, WriteConcern.SAFE )
                    ttlUpdated++
                }
            }
            ttlProcessed++
            // Print a progress line every 10,000 documents.
            if ( ttlProcessed % 10000 == 0 ) {
                printErr "split: ${lapCount}, splitElapsed: ${calcElapsed( lapStartTime )} ms, elapsed: ${calcElapsed( startTime )} ms, processed: ${ttlProcessed}, candidates: ${ttlCandidates}, updated: ${ttlUpdated}"
                lapCount++
                lapStartTime = System.currentTimeMillis()
            }
        }
        // Final summary line.
        printErr "split: ${lapCount}, splitElapsed: ${calcElapsed( lapStartTime )} ms, elapsed: ${calcElapsed( startTime )} ms, processed: ${ttlProcessed}, candidates: ${ttlCandidates}, updated: ${ttlUpdated}"
    }
If a run updates any records, how can ttlProcessed end up higher than the count of the collection being processed?
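
To make the second hypothesis concrete (records moving within the iteration when they are updated), here is a toy model in plain Java that does not touch MongoDB at all. It assumes, without confirmation from anything shown in the question, that an update which grows a document causes the storage layer to relocate it to the end of the collection, and that the cursor simply walks storage order. The class `CursorRevisitSketch`, the `Doc` type and the `scan` method are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CursorRevisitSketch {

    // Hypothetical stand-in for a stored document.
    static final class Doc {
        final int id;
        boolean normalized;

        Doc(int id, boolean normalized) {
            this.id = id;
            this.normalized = normalized;
        }
    }

    // Walks a toy "collection" in storage order and returns how many
    // documents the loop visited (the analogue of ttlProcessed).
    static int scan(int size, boolean isDryRun) {
        List<Doc> storage = new ArrayList<>();
        for (int id = 0; id < size; id++) {
            // In this model, documents with an even id still need normalizing.
            storage.add(new Doc(id, id % 2 != 0));
        }

        int processed = 0;
        int position = 0; // plays the role of the cursor
        while (position < storage.size()) {
            Doc doc = storage.get(position);
            processed++;
            if (!doc.normalized && !isDryRun) {
                // Assumption: the update makes the document larger, so the
                // storage layer moves it to the end of the collection.
                storage.remove(position);
                doc.normalized = true;
                storage.add(doc);
                // The next document slid into this slot, so do not advance.
            } else {
                position++;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println("dry run:      " + scan(5, true));
        System.out.println("with updates: " + scan(5, false));
    }
}
```

In this model a dry run visits each of the 5 documents once, while a run with updates visits 8: the 5 originals plus the 3 relocated ones. That matches the symptom of ttlProcessed exceeding the collection count only when something is updated. Whether the real collection behaves this way is an open question; if it does, the 2.x driver's DBCursor.snapshot() or tracking already-seen _id values might be worth checking against the driver documentation.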