
I want to design a model for interactions between profiles, for example A <-> interaction <-> B, where the interaction contains fields common to A and B.
Let's say I have a collection called Interactions. I have a few ideas in mind and I am looking for the best-practice solution.

  1. split the interaction into two documents, one for each profile

    {
      pid: "A ID",
      commonField1: "",
      commonField2: "",
      ..
    }
    {
      pid: "B ID",
      commonField1: "",
      commonField2: "",
      ..
    }
    

    pros: fast reads
    cons: every update to a common field must be applied to both documents

  2. maintain one document for the interaction

    {
      pids: ["A ID", "B ID"],
      commonField1: "",
      commonField2: "",
      ..
    }
    

    pros: each common field is updated only once
    cons: reads are trickier, since the profile ID has to be matched inside an array

The thing is, there are a lot of reads but also a lot of updates, and this collection has to be designed for many millions of documents.

Common queries in my scenario:

  • retrieve a profile's interactions
  • update a specific interaction of a profile

I am leaning toward the second option, where I would rely on a multikey index on pids for fast document lookup and benefit from a single update for each of the frequent changes.
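To make the second option concrete, here is a minimal sketch of the filters and update it implies. The helper names `byProfile` and `updateCommonField` are hypothetical, and the collection name `interactions` is assumed; the shell calls in the trailing comments are only an indication of how the objects might be used.

```javascript
// Hypothetical helpers for option 2 (one document per interaction).

// Filter matching every interaction that involves the given profile.
// Matching an array field against a scalar matches any element of the array,
// which is the kind of query a multikey index on `pids` is meant to serve.
function byProfile(pid) {
  return { pids: pid };
}

// Filter and update for changing one common field on the A <-> B interaction.
// $all matches documents whose `pids` array contains both IDs.
function updateCommonField(pidA, pidB, field, value) {
  return {
    filter: { pids: { $all: [pidA, pidB] } },
    update: { $set: { [field]: value } },
  };
}

// In the mongo shell these might be used roughly like this (assumed usage):
//   db.interactions.createIndex({ pids: 1 })
//   db.interactions.find(byProfile('A ID'))
//   var u = updateCommonField('A ID', 'B ID', 'commonField1', 'x')
//   db.interactions.updateOne(u.filter, u.update)
```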

I have no experience with sharded collections, but I have noticed that a multikey index is not supported as a shard key. Should that be a showstopper for the second option?
Will reads be fast enough with that kind of index? And are there any other options for my use case?

Your answer is highly appreciated.


1 Answer


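I think the latter format makes more sense, since it avoids duplicate updates.

For the interaction pair you should use a compound key rather than an array. A compound key is valid both for _id and as a shard key (an array is valid for neither).

So a document might look like: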

    {
        _id: { pid1: 'A', pid2: 'B' },
        commonField1: '',
        commonField2: '',
    }

If you want to avoid duplicate pairs, you can sort your IDs into a predictable order. For example, pid1 might always be the lesser of the two values.
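A minimal sketch of that ordering rule, assuming both IDs are of the same comparable type (for example, both strings). The function name `makeInteractionId` is hypothetical.

```javascript
// Build a canonical _id so that (A, B) and (B, A) map to the same document.
// pid1 always holds the lesser of the two values. The key order is kept
// fixed (pid1 first) because field order matters when embedded documents
// are compared for equality.
function makeInteractionId(a, b) {
  return a <= b ? { pid1: a, pid2: b } : { pid1: b, pid2: a };
}
```

With this in place, every read and write for a pair would go through `makeInteractionId` first, so callers never need to know which profile was stored as pid1.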

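The default _id index will let you efficiently look up an exact (pid1, pid2) pair. To find all interactions of a single profile, you will probably want a secondary index on {'_id.pid2': 1}, and likely one on {'_id.pid1': 1} as well, because an index on an embedded document only serves queries that match the whole document, not queries on one of its fields.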
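The two lookups can be sketched as plain filter objects. The helper names `exactPair` and `interactionsOf` are hypothetical, the collection name `interactions` is assumed, and which secondary indexes are really needed is worth verifying with explain() on your own deployment.

```javascript
// Filter for one specific interaction. The caller is assumed to pass the IDs
// already in canonical order (pid1 is the lesser value), because an embedded
// document only matches when its fields and their order are identical.
function exactPair(pid1, pid2) {
  return { _id: { pid1: pid1, pid2: pid2 } };
}

// Filter for every interaction of one profile, whichever side it is stored on.
function interactionsOf(pid) {
  return { $or: [{ '_id.pid1': pid }, { '_id.pid2': pid }] };
}

// Assumed usage in the mongo shell:
//   db.interactions.createIndex({ '_id.pid2': 1 })
//   db.interactions.find(exactPair('A', 'B'))
//   db.interactions.find(interactionsOf('A')).explain()
```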

answered 2014-03-24T02:50:37.850