
What is the best design for letting a single user have thousands of relationships?

(Working on a social network application. If you know of any "generic" social network designs, please point them out... that would help.)

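Look at this figure: status updates are represented as a linked list, and interests represent miscellaneous interests. For a single user, these interests could really explode into thousands of nodes. Wouldn't that lead to some kind of supernode problem?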

**FIGURE 1** [image]

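Would it be a better design to put a category or "header" node in front of these interests and have the interests belong to that category node? I'm thinking it might be more efficient when you initially deal with the user node and a few relationship/header nodes, rather than potentially thousands of nodes directly related to the user node.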

Example:

**FIGURE 2**
User
|
+ interests+
           +----- interest
           +----- interest
           +----- etc...

And shouldn't interests also have "sub-header" category nodes, such as "books", "movies", "products", like this:

**FIGURE 3**
User
|
+ interests+ 
           + books+
           |      + interest
           |      + interest
           |      + interest
           + movies+
                   + interest
                   + interest
                   + interest

(Obviously I'm a n00b with neo)

Here are my questions:

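  1. Which model is best for a high-performance, scalable, Facebook-like system: the one without categories, or the one with categories? Keep performance in mind.

  2. Interests may not always blow up to thousands of nodes; it could be a dozen or a hundred. Would the design with categories add too much overhead? Consider trying to find friends who like the same things you do: would the added categories add too much overhead?

  3. The later figures, the ones with category and sub-category nodes: do they just look nicer but do nothing for performance, organization, etc.?

  4. Instead of category nodes, should there just be a category property describing which category an interest is in? Is adding nodes with a category property to an index as good as having category nodes?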

  5. In regards to question 4, would adding nodes with categories on an index be a better solution?

  6. What are the disadvantages of this type of structure? Are there any real advantages?


1 Answer


I think the interest categories are a good idea once your interests blow up to hundreds of thousands or millions of connections; if it is only a few thousand, it should still work well enough without them. Perhaps that's even something you can evolve your user nodes to when you actually need it (like the different handling of superstars on Twitter).

It also all depends on your use cases: what kinds of queries do you want to answer with the model? Would those be limited to single categories, or would they always query across all categories to the interests below?
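To make that trade-off concrete, here is a rough sketch in plain Python (not Neo4j code) that counts how many relationships a query would have to look at in each model. The data, the category names, and the helper functions `touched_flat` and `touched_categorized` are all made up for illustration, and the counting ignores anything the database might do to optimize traversal.

```python
from collections import defaultdict

# Hypothetical data: (interest name, category) pairs for one user.
interests = [(f"book-{i}", "books") for i in range(50)]
interests += [(f"movie-{i}", "movies") for i in range(3000)]

# Flat model: the user node points directly at every interest.
flat = list(interests)

# Category model: user -> category node -> interest.
categorized = defaultdict(list)
for name, category in interests:
    categorized[category].append(name)

def touched_flat(user_interests, wanted):
    """Every direct relationship is inspected to check its category."""
    return len(user_interests)

def touched_categorized(by_category, wanted):
    """Inspect the category relationships, then only the wanted branch."""
    return len(by_category) + len(by_category.get(wanted, []))

# Query limited to one category ("books").
flat_cost = touched_flat(flat, "books")
categorized_cost = touched_categorized(categorized, "books")

# Query across all categories in the category model: one extra hop per category.
all_cost = len(categorized) + sum(len(v) for v in categorized.values())
```

Under these assumptions the per-category query touches far fewer relationships in the category model, while a query across all interests pays a small extra cost for the additional hop.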

Something you always have to take into account is that the number of relationships touched grows exponentially with each step you traverse out into the graph. So be aware that if you query from a user to all of their friends, or friends of friends, and all their interests, the number of elements touched grows very quickly. Make sure your server has enough memory to keep large enough portions of the graph in memory to answer your requests quickly.
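As a back-of-the-envelope illustration of that growth, the sketch below multiplies assumed fan-outs per traversal step. The numbers (200 friends per user, 1000 interests per user) are invented, and the estimate ignores overlap between friend circles, so treat it as an upper bound rather than a measurement.

```python
def relationships_touched(fanouts):
    """Sum the relationships followed at each step of a traversal.

    `fanouts[i]` is the assumed number of outgoing relationships per node
    at step i; the nodes reached at each step multiply with every hop.
    """
    total = 0
    frontier = 1  # start from a single user node
    for fanout in fanouts:
        frontier *= fanout  # relationships followed at this step
        total += frontier
    return total

# user -> friends only
one_hop = relationships_touched([200])
# user -> friends -> friends of friends -> all their interests
three_hops = relationships_touched([200, 200, 1000])
```

With these made-up fan-outs, going from one hop to three hops takes the count from 200 to roughly 40 million.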

And make sure to do performance and load tests early on (e.g. with a data-generator).

Btw., to filter eagerly it might even be sensible to have a distinct relationship type per interest, so that you can filter early on without actually following the rels you are not interested in.
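The idea can be sketched with a toy in-memory structure in which each node groups its relationships by type. This is only a model of the concept, not a description of how Neo4j stores relationships, and the `Node` class and the type names `LIKES_BOOK` and `LIKES_MOVIE` are hypothetical.

```python
from collections import defaultdict

class Node:
    """Toy graph node whose relationships are grouped by type."""

    def __init__(self, name):
        self.name = name
        self.rels = defaultdict(list)  # relationship type -> target nodes

    def connect(self, rel_type, other):
        self.rels[rel_type].append(other)

    def neighbours(self, rel_type):
        # Only the bucket for the requested type is read; relationships
        # of other types are never followed.
        return self.rels.get(rel_type, [])

user = Node("user")
user.connect("LIKES_BOOK", Node("Dune"))
for i in range(3000):
    user.connect("LIKES_MOVIE", Node(f"movie-{i}"))

book_names = [node.name for node in user.neighbours("LIKES_BOOK")]
```

Here the lookup for books returns one node without iterating over the 3000 movie relationships.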

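Indexes usually help with global categories. You could index your categories by name and user ID, but then the number of user-times-category index entries can also grow quickly.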
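A small sketch of what such a composite index amounts to, using a Python dict as a stand-in for the index. The sizes and the `category-node:...` strings are placeholders, not anything Neo4j produces.

```python
# Hypothetical sizes, purely for illustration.
user_ids = range(1000)
category_names = ["books", "movies", "products"]

# One index entry per (user ID, category name) pair, pointing at that
# user's category node (represented here by a plain string).
category_index = {
    (user_id, name): f"category-node:{user_id}:{name}"
    for user_id in user_ids
    for name in category_names
}
entry_count = len(category_index)

# Lookup goes straight to one user's category node.
books_for_user_42 = category_index[(42, "books")]
```

The entry count is the product of users and categories per user, so it scales with both.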

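I think the category approach should scale well if your use cases really are per category rather than across all interests (and especially not across all users and all interests).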

answered 2013-03-29T19:08:17.287