2

我在一篇文章中看到了以下内容,这家互联网技术公司谈到了他们如何将社交功能融入到他们的应用程序中:

Apache Thrift、Krati Data Store、JavaEWAH Compressed Bitmaps 和 JRuby 构成了我们远程服务的一部分,它以高性能持久压缩位图格式存储我们的社交图。

我试图理解这一点。到目前为止,我已经弄清楚了 Apache Thift 的含义(以及为什么要使用它)、JavaEWAH、位集、社交图和 GUI 分析。Krati 数据源本身似乎没有一个好的 wiki/教程。此外,我无法理解设置,关于如何使用位集和上述技术存储和处理社交图。

如果你能解释一下并引导我找到相关资源。或者,如果您可以提出更好的替代方案来替代所描述的堆栈。

4

1 回答 1

1

Ok, let's put some basics upfront:

I guess, your article is that one: http://www.nextbigwhat.com/technology-implementation-for-social-features-297/

http://en.wikipedia.org/wiki/Social_graph 'The social graph in the Internet context is a graph that depicts personal relations of internet users'

http://thrift.apache.org/ combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages.

https://github.com/krati/krati Krati is a simple persistent data store with very low latency and high throughput. It is designed for easy integration with read-write-intensive applications with little effort in tuning configuration, performance and JVM garbage collection.

http://code.google.com/p/javaewah/ The bit array data structure is implemented in Java as the BitSet class.... JavaEWAH is a word-aligned compressed variant of the Java bitset class.

http://jruby.org/apidocs/serialized-form.html ....

----- Here's are my interpretations:

The context of the article is technology impementation. So they listed everything. In this context I guess we can ignore apache Thrift for now, as this is just the glue, which they use to attach technologies to each other. Also jrubi forms goes somewhat out of scope for the social graph considerations. Yes a social graph needs input and output, but forms addresses the topic which level of details comes from there.

The interesting part is krati and javaewah. Well reading the article makes obvious, that they implement their social graphs via memberships. This can be about groups or roles or something similar. Memberships can be implemented as bitmap: Have a group with a bitmap with one bit per each user. Each Bit can be addressed to check if the user is member or not. As simple as that. The Bitmaps are made up by Krati and than stored in/managed by JavaEWAH. The cons is: The more users, the bigger does the bitmap go. The Pro: It is FAST.

In relational databases each relation would be implemented as foreign key 2 foreign key pair (which causes some index overhead >eg. 2 ints for the keys and then 2*2+x ints for the double index, whereby the x debends on the database). Especially with lots of memberships per group this can get a disk space utilization challenge. So I guess in such cases the compressed BitMap is implementation is even better in terms of storage utilization.

UPDATE---

One could write books on the whole topic. I guess I need to make a point here. However good starting points from here are:

http://www.slideshare.net/lemire/all-about-bitmap-indexes-and-sorting-them

https://github.com/jingwei/krati/commit/ab1432003e59a07269d23c1cb307625b0e8c5be2

http://en.wikipedia.org/wiki/Data_store http://en.wikipedia.org/wiki/Key-value_store (to get an idea about different database concepts than just the relative one)

http://dev.mysql.com/doc/refman/5.0/en/innodb-physical-record.html (to get some indication what about the costs of a foreign key 2 foreign key relation)

于 2013-10-17T15:28:48.753 回答