To summarize the follow-up from the chat room: the issue was actually the find() query, which was scanning all ~500k documents to find 15:
db.tweet_data.find({
$or:
[
{ in_reply_to_screen_name: /^kunalnayyar$/i, handle: /^kaleycuoco$/i, id: { $gt: 0 } },
{ in_reply_to_screen_name: /^kaleycuoco$/i, handle: /^kunalnayyar$/i, id: { $gt: 0 } }
],
in_reply_to_status_id_str: { $ne: null }
} ).explain()
{
"cursor" : "BtreeCursor id_1",
"nscanned" : 523248,
"nscannedObjects" : 523248,
"n" : 15,
"millis" : 23682,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"id" : [
[
0,
1.7976931348623157e+308
]
]
}
}
This query uses case-insensitive regular expressions, which can't make efficient use of an index (though in this case no index was actually defined on those fields; the only index used was the one on id).
Suggested approach:
create lowercase handle_lc and inreply_lc fields for search purposes
add a compound index on those:
db.tweet_data.ensureIndex({handle_lc:1, inreply_lc:1})
the order of the compound index allows an efficient find of all tweets either by handle or by (handle, in_reply_to)
search by exact match instead of regex:
db.tweet_data.find({
$or:
[
{ handle_lc:'kaleycuoco', inreply_lc:'kunalnayyar', id: { $gt: 0 } },
{ handle_lc:'kunalnayyar', inreply_lc:'kaleycuoco', id: { $gt: 0 } }
]
})
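The approach above assumes existing documents already have the lowercase fields. As a rough sketch of how they might be derived, here is a small helper; the names `lower` and `lowercaseFields` are made up for illustration and are not part of the original discussion, and the commented shell loop is an untested assumption about how a one-off backfill could look:

```javascript
// Hypothetical helper: lowercase a string, or return null when the
// value is missing (e.g. a tweet that is not a reply).
function lower(value) {
  return typeof value === "string" ? value.toLowerCase() : null;
}

// Hypothetical helper: build the two search fields suggested above
// from a tweet document's original fields.
function lowercaseFields(doc) {
  return {
    handle_lc: lower(doc.handle),
    inreply_lc: lower(doc.in_reply_to_screen_name)
  };
}

// Assumed one-off backfill in the mongo shell (not run against real data):
//
//   db.tweet_data.find().forEach(function (doc) {
//     db.tweet_data.update({ _id: doc._id }, { $set: lowercaseFields(doc) });
//   });

var sample = { handle: "KaleyCuoco", in_reply_to_screen_name: "KunalNayyar", id: 1 };
console.log(lowercaseFields(sample));
```

New tweets would need the same two fields set at insert time, otherwise the exact-match query will miss them.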