
I'm trying to make a database of users for a website that will store correlation values between all the users. What I mean by this is that for every pair of users, there is a stored value of correlation between the two users.

The correlation values will be calculated by PHP using a correlation algorithm. My question is what is the most correct way to store them in a MySQL database? I realize I could make a table like this:

         ---------------------------------
        | user1 | user2 | user3 | etc... |
 -----------------------------------------
| user1 | #val  | #val  | #val  | #val   |
 -----------------------------------------
| user2 | #val  | #val  | #val  | #val   |
 -----------------------------------------
| user3 | #val  | #val  | #val  | #val   |

etcetera. But I don't like this method because

  • It stores every value twice; for example the correlation between user1 and user3 is stored in row 1 column 3 as well as row 3 column 1.
  • I use prepared statements, which means I can't select columns named after user IDs unless I concatenate the user ID into the SQL statement, which is obviously not ideal.

What are my alternatives? If this can be done in MySQL well, how do I go about it?

If this can't be done well in MySQL, are there any other database types I should try to learn? For example, I realize a graph database system may work well for this, but I don't want to spend time learning how to use a graph database if this can be done in MySQL.


3 Answers


Great question.

Given users A, B, C, D and E, your dataset is triangular:

  A B C D E
A   
B *  
C * *
D * * *
E * * * *
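  1. In the matrix above, AA, BB, CC, DD and EE are meaningless.
  2. To avoid duplication, AB is the same as BA, CD is the same as DC, and so on.

You can store the triangular dataset in a table-oriented SQL database like this: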

id usr usr c
------------
0  A   B   1
1  A   C   5
2  A   D   3
3  A   E   4
4  B   C   3

ETC
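
As a rough sketch of this triangular layout, the following uses Python's built-in sqlite3 module as a stand-in for MySQL; the SQL should carry over with minor changes. The table name `user_correlation`, its column names, and the helper functions are hypothetical and not part of the original answer. Each pair is stored once by always putting the smaller user ID first, and the IDs are bound as statement parameters instead of being concatenated into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_correlation (        -- hypothetical table name
        user_lo INTEGER NOT NULL,
        user_hi INTEGER NOT NULL,
        c       REAL    NOT NULL,
        PRIMARY KEY (user_lo, user_hi),
        CHECK (user_lo < user_hi)          -- one row per pair, no AA/BB diagonal
    )
""")

def ordered(a, b):
    """Return the pair with the smaller ID first (the canonical order)."""
    return (a, b) if a < b else (b, a)

def set_correlation(a, b, value):
    lo, hi = ordered(a, b)
    # INSERT OR REPLACE is SQLite syntax; in MySQL an equivalent would be
    # INSERT ... ON DUPLICATE KEY UPDATE.
    conn.execute(
        "INSERT OR REPLACE INTO user_correlation (user_lo, user_hi, c) "
        "VALUES (?, ?, ?)",
        (lo, hi, value),
    )

def get_correlation(a, b):
    lo, hi = ordered(a, b)
    row = conn.execute(
        "SELECT c FROM user_correlation WHERE user_lo = ? AND user_hi = ?",
        (lo, hi),
    ).fetchone()
    return row[0] if row else None

set_correlation(3, 1, 0.5)       # stored as the pair (1, 3)
print(get_correlation(1, 3))     # same row either way round
print(get_correlation(3, 1))
```

Both lookups print 0.5, because (3, 1) and (1, 3) resolve to the same row. Depending on the MySQL version, the CHECK constraint may not be enforced, in which case the ordering has to be guaranteed by the application code alone.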

answered 2013-05-02T00:38:08.813

In my opinion, the best solution is to have two tables: users and user relations.

User relations:

====================================
User1Field | User2Field | ValueField
====================================
#User      | #User      | #val
------------------------------------
#User      | #User      | #val
------------------------------------
#User      | #User      | #val
------------------------------------
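
One way the two-table layout might look, sketched with Python's sqlite3 module in place of MySQL. The column names (`user_id`, `name`, `user1`, `user2`, `value`) and the `add_relation` helper are assumptions made for illustration. The foreign keys tie every relation row to existing users, so a value cannot be stored for a user that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on;
# MySQL's InnoDB engine enforces them without it.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE user_relations (
        user1 INTEGER NOT NULL REFERENCES users(user_id),
        user2 INTEGER NOT NULL REFERENCES users(user_id),
        value REAL    NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO users (user_id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)

def add_relation(user1, user2, value):
    """Insert one relation row; return False if a user ID is unknown."""
    try:
        conn.execute(
            "INSERT INTO user_relations (user1, user2, value) VALUES (?, ?, ?)",
            (user1, user2, value),
        )
        return True
    except sqlite3.IntegrityError:
        return False

known_ok = add_relation(1, 2, 0.8)     # both users exist
unknown_ok = add_relation(1, 99, 0.3)  # user 99 does not exist
print(known_ok, unknown_ok)
```
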
answered 2013-05-02T00:10:26.473

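Typically you would do something like this in a join table. Say you have a users table with a user_id field and whatever other fields you need. You would then build a table called user_relations (or similar) that has just two user_id foreign key fields, which relate the users in some way.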

user_id_1  |   user_id_2
------------------------
1          |   2
1          |   3
2          |   1
3          |   1
...        |   ...

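You would then put a compound primary key across both columns to enforce uniqueness. Note that I am assuming the #val you mention in your question is just some sort of flag (1/0) indicating that a relationship exists. If you actually need the value to say something about the relationship (i.e. parent/child or some other meaningful value), then you could add a third column to this table to store the value associated with the relationship.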
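
A small sketch of what the compound key does in practice, using Python's sqlite3 module as a stand-in for MySQL. The third column name `value` is an assumption. A second row for the same (user_id_1, user_id_2) pair is rejected, but the reversed pair counts as a different key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_relations (
        user_id_1 INTEGER NOT NULL,
        user_id_2 INTEGER NOT NULL,
        value     REAL    NOT NULL,          -- the optional third column
        PRIMARY KEY (user_id_1, user_id_2)   -- compound key enforces uniqueness
    )
""")
insert_sql = (
    "INSERT INTO user_relations (user_id_1, user_id_2, value) VALUES (?, ?, ?)"
)
conn.execute(insert_sql, (1, 2, 0.25))

# A second row for the same ordered pair violates the primary key.
try:
    conn.execute(insert_sql, (1, 2, 0.75))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# (2, 1) is a different key, so the compound key alone does not treat
# the pair as unordered.
conn.execute(insert_sql, (2, 1, 0.25))

rows = conn.execute(
    "SELECT user_id_1, user_id_2, value FROM user_relations "
    "ORDER BY user_id_1, user_id_2"
).fetchall()
print(duplicate_rejected, rows)
```
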

When you need to query across the relationship, you would do something like this:

SELECT u1.*, u2.*
FROM
  users AS u1
  INNER JOIN user_relations AS ur
    ON u1.user_id = ur.user_id_1
  INNER JOIN users AS u2
    ON ur.user_id_2 = u2.user_id
WHERE u1.user_id = ? /* or whatever filter you may need to apply */

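Note that depending on the type of relationship you are trying to represent (i.e. a two-way relationship), you may need two rows in the table for each relationship. That way you can always use the first column to look up all the related users in the second column. My sample rows above show this: user 1's relationships appear in both directions.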
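
The two-rows-per-relationship idea can be sketched as follows, again with Python's sqlite3 module standing in for MySQL. The `name` column and the `relate` and `related_names` helpers are hypothetical. Both directions are written in one transaction so the table never holds only half of a relationship, and the lookup filters on the first column only, using the same join as the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE user_relations (
        user_id_1 INTEGER NOT NULL REFERENCES users(user_id),
        user_id_2 INTEGER NOT NULL REFERENCES users(user_id),
        PRIMARY KEY (user_id_1, user_id_2)
    );
""")
conn.executemany(
    "INSERT INTO users (user_id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)

def relate(a, b):
    """Store the relationship in both directions as one transaction."""
    with conn:  # commits on success, rolls back if either insert fails
        conn.executemany(
            "INSERT INTO user_relations (user_id_1, user_id_2) VALUES (?, ?)",
            [(a, b), (b, a)],
        )

def related_names(user_id):
    """Names of all users related to user_id, found via the first column."""
    rows = conn.execute(
        """
        SELECT u2.name
        FROM users AS u1
        INNER JOIN user_relations AS ur ON u1.user_id = ur.user_id_1
        INNER JOIN users AS u2 ON ur.user_id_2 = u2.user_id
        WHERE u1.user_id = ?
        ORDER BY u2.name
        """,
        (user_id,),
    ).fetchall()
    return [name for (name,) in rows]

relate(1, 2)
relate(3, 1)
print(related_names(1))
print(related_names(3))
```

The trade-off is that every value is stored twice, which is the duplication the question wanted to avoid, in exchange for simpler single-column lookups.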

answered 2013-05-02T00:10:35.377