I have a data table AC in SQL Server with the following structure:
+----------+------------+-------+
| AuthorID | CoAuthorID | Year |
+----------+------------+-------+
| 677 | 901706 | 2005 |
| 677 | 901706 | 2005 |
| 677 | 901706 | 2005 |
| 1359 | 133112 | 2005 |
| 1359 | 133112 | 2005 |
| 1359 | 133112 | 2005 |
| 1359 | 266386 | 2005 |
| 1359 | 454557 | 2005 |
| 1359 | 454557 | 2005 |
| 1359 | 454557 | 2005 |
| 1359 | 534423 | 2005 |
| 1359 | 534423 | 2005 |
| 1359 | 534423 | 2005 |
| 1359 | 578338 | 2005 |
| 1359 | 721615 | 2005 |
| 1359 | 1016805 | 2005 |
| 1359 | 1016805 | 2005 |
| 1359 | 1016805 | 2005 |
| 1359 | 1361047 | 2005 |
| 1359 | 1361047 | 2005 |
| 1359 | 1361047 | 2005 |
| 1359 | 1361320 | 2005 |
| 1359 | 1361320 | 2005 |
| 1359 | 1361320 | 2005 |
| 1359 | 1395982 | 2005 |
| 1359 | 1395982 | 2005 |
| 1359 | 1395982 | 2005 |
| 1359 | 1412785 | 2005 |
| 1359 | 1412785 | 2005 |
| 1359 | 1412785 | 2005 |
| 1359 | 1412785 | 2005 |
| ... | | |
| ... | | |
+----------+------------+-------+
For a given year, here 2005, I have to calculate the conditional probability of AuthorID given CoAuthorID:
P(AuthorID|CoAuthorID) = P(AuthorID ∩ CoAuthorID) / P(CoAuthorID)
where ∩ denotes the intersection operation.
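A minimal sketch of the formula, under an assumption the question does not state: if P(AuthorID ∩ CoAuthorID) and P(CoAuthorID) are both estimated as relative frequencies over the same N rows, N cancels and the conditional probability is a ratio of row counts. The function name `conditional_probability` is hypothetical, and the counts in the usage line come only from the 31 sample rows visible above (3 rows for the pair 677/901706, 3 rows containing CoAuthorID 901706).

```python
from fractions import Fraction


def conditional_probability(joint_count: int, given_count: int, total: int) -> Fraction:
    """Return P(A | B) = P(A ∩ B) / P(B), each estimated as count / total.

    Assumes joint_count and given_count were counted over the same `total` rows.
    """
    if given_count == 0:
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    p_joint = Fraction(joint_count, total)  # estimate of P(A ∩ B)
    p_given = Fraction(given_count, total)  # estimate of P(B)
    return p_joint / p_given  # total cancels, leaving joint_count / given_count


# Counts from the visible sample only: 3 joint rows, 3 rows with CoAuthorID 901706, 31 rows.
print(conditional_probability(3, 3, 31))  # 1
```

Using `Fraction` keeps results such as 1/390 exact instead of rounding them to floats.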
To start with, take for example AuthorID = 677, CoAuthorID = 901706 and Year = 2005.
I tried the following.
For P(AuthorID):
SELECT COUNT(DISTINCT AuthorID) FROM AC WHERE Year = 2005
It returns 390, so P(AuthorID) = 1/390.
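A minimal sketch that runs the same COUNT(DISTINCT ...) query with Python's built-in sqlite3 module, used here as a stand-in for SQL Server (an assumption; this query's syntax is the same in both). `SAMPLE` and `build_sample_db` are hypothetical names, and the table holds only the 31 sample rows shown above (the "..." rows are not reproduced), so the query returns 2 rather than 390.

```python
import sqlite3

# (AuthorID, CoAuthorID, number of identical rows) for the visible 2005 sample only.
SAMPLE = [
    (677, 901706, 3), (1359, 133112, 3), (1359, 266386, 1), (1359, 454557, 3),
    (1359, 534423, 3), (1359, 578338, 1), (1359, 721615, 1), (1359, 1016805, 3),
    (1359, 1361047, 3), (1359, 1361320, 3), (1359, 1395982, 3), (1359, 1412785, 4),
]


def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory AC table holding the sample rows, duplicates included."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE AC (AuthorID INTEGER, CoAuthorID INTEGER, Year INTEGER)")
    rows = [(a, c, 2005) for a, c, n in SAMPLE for _ in range(n)]
    conn.executemany("INSERT INTO AC VALUES (?, ?, ?)", rows)
    return conn


conn = build_sample_db()
(distinct_authors,) = conn.execute(
    "SELECT COUNT(DISTINCT AuthorID) FROM AC WHERE Year = 2005"
).fetchone()
print(distinct_authors)  # 2 for this sample (390 on the full table, per the question)
```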
For P(CoAuthorID):
SELECT COUNT(DISTINCT CoAuthorID) FROM AC WHERE AuthorID = 677 AND Year = 2005
It returns 1, so P(CoAuthorID) = 1/1.
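The same per-author query can be parameterized so it is easy to repeat for other authors or years. This is a sketch with sqlite3 standing in for SQL Server (an assumption); `SAMPLE`, `build_sample_db` and `distinct_coauthors` are hypothetical names, and the data is limited to the visible sample rows.

```python
import sqlite3

# (AuthorID, CoAuthorID, number of identical rows) for the visible 2005 sample only.
SAMPLE = [
    (677, 901706, 3), (1359, 133112, 3), (1359, 266386, 1), (1359, 454557, 3),
    (1359, 534423, 3), (1359, 578338, 1), (1359, 721615, 1), (1359, 1016805, 3),
    (1359, 1361047, 3), (1359, 1361320, 3), (1359, 1395982, 3), (1359, 1412785, 4),
]


def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory AC table holding the sample rows, duplicates included."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE AC (AuthorID INTEGER, CoAuthorID INTEGER, Year INTEGER)")
    rows = [(a, c, 2005) for a, c, n in SAMPLE for _ in range(n)]
    conn.executemany("INSERT INTO AC VALUES (?, ?, ?)", rows)
    return conn


def distinct_coauthors(conn: sqlite3.Connection, author_id: int, year: int) -> int:
    """Count the distinct co-authors of one author in one year (0 if none)."""
    (n,) = conn.execute(
        "SELECT COUNT(DISTINCT CoAuthorID) FROM AC WHERE AuthorID = ? AND Year = ?",
        (author_id, year),
    ).fetchone()
    return n


conn = build_sample_db()
print(distinct_coauthors(conn, 677, 2005))   # 1
print(distinct_coauthors(conn, 1359, 2005))  # 11 in the visible sample
```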
For P(AuthorID ∩ CoAuthorID):
SELECT * FROM AC WHERE AuthorID = 677 AND Year = 2005
INTERSECT
SELECT * FROM AC WHERE CoAuthorID = 901706 AND Year = 2005
It returns 1 row:
AuthorID CoAuthorID Year
----------------------------
677 901706 2005
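INTERSECT uses set semantics: in both SQL Server and SQLite it returns only distinct rows, so the three identical (677, 901706, 2005) rows collapse into one. The sketch below (sqlite3 as a stand-in for SQL Server, an assumption; `SAMPLE` and `build_sample_db` are hypothetical names holding only the visible sample rows) runs the INTERSECT query from the question and, for comparison, a plain COUNT(*) over rows that satisfy both conditions at once.

```python
import sqlite3

# (AuthorID, CoAuthorID, number of identical rows) for the visible 2005 sample only.
SAMPLE = [
    (677, 901706, 3), (1359, 133112, 3), (1359, 266386, 1), (1359, 454557, 3),
    (1359, 534423, 3), (1359, 578338, 1), (1359, 721615, 1), (1359, 1016805, 3),
    (1359, 1361047, 3), (1359, 1361320, 3), (1359, 1395982, 3), (1359, 1412785, 4),
]


def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory AC table holding the sample rows, duplicates included."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE AC (AuthorID INTEGER, CoAuthorID INTEGER, Year INTEGER)")
    rows = [(a, c, 2005) for a, c, n in SAMPLE for _ in range(n)]
    conn.executemany("INSERT INTO AC VALUES (?, ?, ?)", rows)
    return conn


conn = build_sample_db()

# The query from the question: INTERSECT removes duplicate rows.
intersect_rows = conn.execute(
    """
    SELECT * FROM AC WHERE AuthorID = 677 AND Year = 2005
    INTERSECT
    SELECT * FROM AC WHERE CoAuthorID = 901706 AND Year = 2005
    """
).fetchall()
print(intersect_rows)  # [(677, 901706, 2005)]

# Counting rows that match both conditions keeps the duplicates.
(matching_rows,) = conn.execute(
    "SELECT COUNT(*) FROM AC WHERE AuthorID = 677 AND CoAuthorID = 901706 AND Year = 2005"
).fetchone()
print(matching_rows)  # 3
```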
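However, the data actually contains 3 such rows, which means that this AuthorID and CoAuthorID co-occur 3 times in 2005, i.e., the two authors collaborated 3 times in 2005. So:
- What should the value of P(AuthorID ∩ CoAuthorID) be? Should it be 1 or 1/3?
- Are the other calculations correct?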
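To see how often every (AuthorID, CoAuthorID) pair co-occurs in a year, rather than checking one pair at a time, a GROUP BY with COUNT(*) can be used. This is a sketch with sqlite3 standing in for SQL Server (an assumption); `SAMPLE` and `build_sample_db` are hypothetical names, and only the visible sample rows are loaded.

```python
import sqlite3

# (AuthorID, CoAuthorID, number of identical rows) for the visible 2005 sample only.
SAMPLE = [
    (677, 901706, 3), (1359, 133112, 3), (1359, 266386, 1), (1359, 454557, 3),
    (1359, 534423, 3), (1359, 578338, 1), (1359, 721615, 1), (1359, 1016805, 3),
    (1359, 1361047, 3), (1359, 1361320, 3), (1359, 1395982, 3), (1359, 1412785, 4),
]


def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory AC table holding the sample rows, duplicates included."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE AC (AuthorID INTEGER, CoAuthorID INTEGER, Year INTEGER)")
    rows = [(a, c, 2005) for a, c, n in SAMPLE for _ in range(n)]
    conn.executemany("INSERT INTO AC VALUES (?, ?, ?)", rows)
    return conn


conn = build_sample_db()

# One output row per pair, with the number of table rows in which the pair appears.
pair_counts = conn.execute(
    """
    SELECT AuthorID, CoAuthorID, COUNT(*) AS Occurrences
    FROM AC
    WHERE Year = 2005
    GROUP BY AuthorID, CoAuthorID
    ORDER BY AuthorID, CoAuthorID
    """
).fetchall()
for author, coauthor, occurrences in pair_counts:
    print(author, coauthor, occurrences)  # first line: 677 901706 3
```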
Thanks!