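I am using MongoDB 2.6.1.
I have a collection that stores emails by project. The documents look like this (the raw email text key, "Raw_Text", is omitted for readability):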
{
"_id" : ObjectId("540d4ae7eea013be22f1f0d6"),
"Project_Id" : "E11593",
"Project_Name" : "National Hearing Care- Novo",
"Email_Id" : "E11593.monitor@lntinfotech.com",
"Date" : "Mon Sep 08 05:05:35 IST 2014",
"To" : "manisha.bhopate@infostretch.com; ",
"From" : "Shubhangi Thorat",
"CC" : "NO VALUES",
"Subject" : "RE: pics",
"Unique_Id" : "Mon-Sep-08-11:51:20-IST-2014"
}
{
"_id" : ObjectId("540d4ae7eea013be22f1f0d7"),
"Project_Id" : "E11593",
"Project_Name" : "National Hearing Care- Novo",
"Email_Id" : "E11593.monitor@lntinfotech.com",
"Date" : "Mon Sep 08 05:02:38 IST 2014",
"To" : "manisha.bhopate@infostretch.com; ",
"From" : "Shubhangi Thorat",
"CC" : "NO VALUES",
"Subject" : "FW: pics",
"Unique_Id" : "Mon-Sep-08-11:51:20-IST-2014"
}
{
"_id" : ObjectId("540d4ae7eea013be22f1f0d8"),
"Project_Id" : "E11593",
"Project_Name" : "National Hearing Care- Novo",
"Email_Id" : "E11593.monitor@lntinfotech.com",
"Date" : "Mon Sep 08 04:37:47 IST 2014",
"To" : "Prachi Sutrawe; ",
"From" : "Mahindra Shambharkar",
"CC" : "NO VALUES",
"Subject" : "Accepted: Show and tell -Sale",
"Unique_Id" : "Mon-Sep-08-11:51:20-IST-2014"
}
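I have the following ideas for choosing a shard key:
- A compound index on { Project_Id, _id }, since Project_Id has low cardinality but _id has high cardinality
- A hashed index on 'Date' / 'Unique_Id', both of which are timestamps
- A hashed index on the 'From' field, though its cardinality depends on the number of people involved in the project
- 'To' and 'CC' are multi-valued keys, and 'Subject' is highly random, so I am not sure whether these keys can be used
- Although it is not listed in the output, "Raw_Text" will be read extensively by different applications, but I am not sure whether I should build an index on it, or even whether it should be part of the shard key at all!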
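To make the cardinality reasoning in the list above concrete, here is a minimal sketch in plain JavaScript (runnable with Node.js, no MongoDB server needed) that counts distinct values for each candidate key over the three sample documents shown earlier. The `cardinality` helper and the `report` object are hypothetical names introduced only for illustration, the ObjectId values are kept as plain strings, and three documents are far too few to be representative of the real collection.

```javascript
// Three sample documents from above, reduced to the candidate shard key fields.
// ObjectId values are stored as plain strings (assumption made for this sketch).
const docs = [
  { _id: "540d4ae7eea013be22f1f0d6", Project_Id: "E11593",
    Date: "Mon Sep 08 05:05:35 IST 2014", From: "Shubhangi Thorat",
    Unique_Id: "Mon-Sep-08-11:51:20-IST-2014" },
  { _id: "540d4ae7eea013be22f1f0d7", Project_Id: "E11593",
    Date: "Mon Sep 08 05:02:38 IST 2014", From: "Shubhangi Thorat",
    Unique_Id: "Mon-Sep-08-11:51:20-IST-2014" },
  { _id: "540d4ae7eea013be22f1f0d8", Project_Id: "E11593",
    Date: "Mon Sep 08 04:37:47 IST 2014", From: "Mahindra Shambharkar",
    Unique_Id: "Mon-Sep-08-11:51:20-IST-2014" },
];

// Hypothetical helper: number of distinct values of a (possibly compound) key.
function cardinality(documents, fields) {
  const seen = new Set();
  for (const doc of documents) {
    // Serialize the tuple of field values so compound keys compare by value.
    seen.add(JSON.stringify(fields.map((field) => doc[field])));
  }
  return seen.size;
}

// Candidate keys taken from the list above.
const candidates = [
  ["Project_Id"],
  ["_id"],
  ["Project_Id", "_id"],
  ["Date"],
  ["Unique_Id"],
  ["From"],
];

// Map each candidate (fields joined with "+") to its distinct-value count.
const report = {};
for (const fields of candidates) {
  report[fields.join("+")] = cardinality(docs, fields);
}
console.log(report);
```

On the real data, the same kind of check could be done in the mongo shell with something like `db.emails.distinct("Project_Id").length`, where `emails` is a placeholder for the actual collection name.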
What would be the best shard key in this scenario?