I have a collection of vectors. The size of the collection:

print vectors.count()

102020

When I iterate over the field:

import time

start = time.time()
for v in vectors.find({},{'vector' : 1, '_id' : 0}):
    pass
end = time.time()  # stop the clock once the cursor is exhausted
print "total time:" , end-start

total time: 5.05100011826

But when I run the same query with explain(), I see that it takes substantially less time:

print vectors.find({},{'vector' : 1, '_id' : 0}).explain()

{u'nYields': 0, u'allPlans': [{u'cursor': u'BasicCursor', u'indexBounds': {}}], u'nChunkSkips': 0, u'millis': 23, u'n': 102020, u'cursor': u'BasicCursor', u'indexBounds': {}, u'nscannedObjects': 102020, u'isMultiKey': False, u'indexOnly': False, u'nscanned': 102020}

Why is there such a huge time difference? Is there any way to speed this up? I loaded all of the vectors into a text field in a SQL database, and the same query took less than one second. Thanks.

3 Answers

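My guess is that the second one only shows how quickly MongoDB can actually execute the "find", while the first also includes fetching every record to the console and processing it.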
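
One way to check this guess is to time the arrival of the first document separately from the time needed to drain the whole cursor. The following is a minimal sketch for Python 3, not a definitive benchmark. The helper name time_iteration is hypothetical, and it accepts any iterable, so a PyMongo cursor such as vectors.find({}, {'vector': 1, '_id': 0}) can be passed in directly.

```python
import time

def time_iteration(cursor):
    """Return (seconds_to_first_item, total_seconds, item_count) for any iterable.

    time_iteration is a hypothetical helper, not part of PyMongo.
    """
    start = time.perf_counter()
    first = None
    count = 0
    for _ in cursor:
        if first is None:
            # Time until the first item reached the client.
            first = time.perf_counter() - start
        count += 1
    # Time until the iterable was fully consumed.
    total = time.perf_counter() - start
    return first, total, count
```

If the first document arrives quickly and most of the total is spent afterwards, the time is probably going into transferring and decoding documents in the client rather than into the server-side scan that explain() reports.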

answered 2012-08-01T10:42:52.553

You might want to experiment with batch_size to improve speed and reduce the number of network round trips when iterating through the results.

import time

start = time.time()
for v in vectors.find({},{'vector' : 1, '_id' : 0}).batch_size(1000):
    pass
end = time.time()  # stop the clock once the cursor is exhausted
print "total time:" , end-start
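
As a rough illustration of how batch size relates to round trips, the sketch below (Python 3) computes the minimum number of batches needed when each batch carries at most batch_size documents. The helper name min_batches is hypothetical, and the server may also cap a batch by its size in bytes, so the real number of fetches can be higher.

```python
def min_batches(n_docs, batch_size):
    """Minimum number of batches if each batch holds at most batch_size documents.

    min_batches is a hypothetical helper used only for this estimate.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Ceiling division using integers only.
    return (n_docs + batch_size - 1) // batch_size

# The 102020 documents from the question, fetched 1000 at a time.
print(min_batches(102020, 1000))  # 103
```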
answered 2012-08-01T10:32:25.243

You can create an index on the field you are querying, which in your case is "vector":

vectors.createIndex({"vector":1},{sparse:true})

Then you can check the query time again.

answered 2016-07-04T10:34:22.483