
I am using MongoDB 2.6.1.

I have a collection that stores emails by project. The documents look like this (the key holding the raw email text is omitted for readability):

{
        "_id" : ObjectId("540d4ae7eea013be22f1f0d6"),
        "Project_Id" : "E11593",
        "Project_Name" : "National Hearing Care- Novo",
        "Email_Id" : "E11593.monitor@lntinfotech.com",
        "Date" : "Mon Sep 08 05:05:35 IST 2014",
        "To" : "manisha.bhopate@infostretch.com; ",
        "From" : "Shubhangi Thorat",
        "CC" : "NO VALUES",
        "Subject" : "RE: pics",
        "Unique_Id" : "Mon-Sep-08-11:51:20-IST-2014"
}
{
        "_id" : ObjectId("540d4ae7eea013be22f1f0d7"),
        "Project_Id" : "E11593",
        "Project_Name" : "National Hearing Care- Novo",
        "Email_Id" : "E11593.monitor@lntinfotech.com",
        "Date" : "Mon Sep 08 05:02:38 IST 2014",
        "To" : "manisha.bhopate@infostretch.com; ",
        "From" : "Shubhangi Thorat",
        "CC" : "NO VALUES",
        "Subject" : "FW: pics",
        "Unique_Id" : "Mon-Sep-08-11:51:20-IST-2014"
}
{
        "_id" : ObjectId("540d4ae7eea013be22f1f0d8"),
        "Project_Id" : "E11593",
        "Project_Name" : "National Hearing Care- Novo",
        "Email_Id" : "E11593.monitor@lntinfotech.com",
        "Date" : "Mon Sep 08 04:37:47 IST 2014",
        "To" : "Prachi Sutrawe; ",
        "From" : "Mahindra Shambharkar",
        "CC" : "NO VALUES",
        "Subject" : "Accepted: Show and tell -Sale",
        "Unique_Id" : "Mon-Sep-08-11:51:20-IST-2014"
}

While choosing a shard key, I have the following ideas:

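  1. A compound index on { Project_Id, _id }, since Project_Id has low cardinality but _id has high cardinality
  2. A hashed index on 'Date' / 'Unique_Id', both of which are timestamps
  3. A hashed index on the 'From' field, though its cardinality depends on the number of people involved in the project
  4. 'To' and 'CC' are multi-valued keys, and 'Subject' is highly random, so I am not sure whether these keys can be used
  5. Although not listed in the output above, 'Raw_Text' will be read extensively by different applications, but I am not sure whether I should build an index on it, or even whether it should be part of the shard key at all!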
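
To make options 1 to 3 concrete, here is a rough sketch in plain JavaScript. It counts distinct values per field over the three sample documents above (a very crude cardinality check, not a real measurement) and builds the `shardCollection` command document for each candidate key. The namespace `mydb.emails` and the helper names `distinctCount` and `shardCommand` are my own placeholders, not anything from the real deployment.

```javascript
// The three sample documents above, trimmed to the fields under discussion.
// _id is kept as a plain string here so the sketch runs outside the mongo shell.
const samples = [
  { _id: "540d4ae7eea013be22f1f0d6", Project_Id: "E11593", From: "Shubhangi Thorat",
    Date: "Mon Sep 08 05:05:35 IST 2014", Unique_Id: "Mon-Sep-08-11:51:20-IST-2014" },
  { _id: "540d4ae7eea013be22f1f0d7", Project_Id: "E11593", From: "Shubhangi Thorat",
    Date: "Mon Sep 08 05:02:38 IST 2014", Unique_Id: "Mon-Sep-08-11:51:20-IST-2014" },
  { _id: "540d4ae7eea013be22f1f0d8", Project_Id: "E11593", From: "Mahindra Shambharkar",
    Date: "Mon Sep 08 04:37:47 IST 2014", Unique_Id: "Mon-Sep-08-11:51:20-IST-2014" },
];

// Hypothetical helper: number of distinct values a field takes in `docs`.
function distinctCount(docs, field) {
  return new Set(docs.map((d) => d[field])).size;
}

const cardinality = {};
for (const field of ["Project_Id", "_id", "Date", "Unique_Id", "From"]) {
  cardinality[field] = distinctCount(samples, field);
}

// Candidate shard key patterns for options 1 to 3.
// A hashed shard key covers a single field, so each hashed option is separate.
const candidates = {
  compound: { Project_Id: 1, _id: 1 },
  hashedDate: { Date: "hashed" },
  hashedUniqueId: { Unique_Id: "hashed" },
  hashedFrom: { From: "hashed" },
};

// Hypothetical helper: build the admin command document for a given key.
function shardCommand(namespace, key) {
  return { shardCollection: namespace, key: key };
}

console.log(cardinality);
for (const name of Object.keys(candidates)) {
  console.log(name, JSON.stringify(shardCommand("mydb.emails", candidates[name])));
}
```

In the mongo shell connected to a mongos, one of these command documents could be passed to `db.adminCommand(...)` after `sh.enableSharding("mydb")`. Note that in this small sample 'Unique_Id' takes the same value in all three documents, so it may be less unique than its name suggests.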

What would be the best shard key in this scenario?
