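I am trying to build a 2x2 contingency table as described in the following link:
Ad hoc 2x2 contingency tables SQL Server 2008 (I tried to understand the code but could not)
A loop constructs the pairs, e.g. C1,C1 C1,C2 C2,C1 C2,C2 (the Cartesian product).
These pairs are supplied as parameters to the SQL code. For this example, I have given the SQL code the pair --> C1,C1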
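The pair-building loop could look roughly like the sketch below. Python is used only for illustration (the post does not say which language drives the loop), and the names `concepts` and `pairs` are hypothetical.

```python
from itertools import product

# Assumed input: the distinct concept URIs from the sample data.
concepts = ["C1", "C2"]

# Cartesian product of the concept list with itself, in the order
# C1,C1  C1,C2  C2,C1  C2,C2.
pairs = list(product(concepts, repeat=2))

for x, y in pairs:
    # Each (x, y) pair would be passed to the SQL code as parameters.
    print(x, y)
```

Note that the product includes the same-concept pairs C1,C1 and C2,C2.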
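When the table is built for pairs of different concepts, such as C1,C2 or C2,C1, the results are correct (after some modifications explained below). When it is built for a same-concept pair, C1,C1 or C2,C2, it produces a wrong contingency table.
For example (the table name is alpha_occurence):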
id  concept_uri  document_uri
1   C1           D1
2   C2           D1
From the table above, the 2x2 contingency table for the pair C1,C1 should be:
         C1   not C1
C1       1    0
not C1   0    -
Instead it gives (after some modifications):
         C1   not C1
C1       0    1
not C1   1    -
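Note that I have used - for the "not C1, not C1" cell, because that value is calculated with a different method.
This SQL code is used to retrieve the values: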
SELECT count(*) AS total FROM
    (SELECT document_uri, count(DISTINCT concept_uri) AS count_conc
     FROM mydb.alpha_occurence
     WHERE concept_uri IN ('C1','C1')
     GROUP BY document_uri
     HAVING count_conc >= 2)
    AS amount_of_concept_co_occurence  # value of both X and Y
UNION ALL
SELECT count(*) AS total FROM
    (SELECT concept_uri, document_uri
     FROM mydb.alpha_occurence
     WHERE concept_uri IN ('C1'))
    AS only_concept_A  # value of Only X not Y
UNION ALL
SELECT count(*) AS total FROM
    (SELECT concept_uri, document_uri
     FROM mydb.alpha_occurence
     WHERE concept_uri IN ('C1'))
    AS only_concept_B  # value of Not X only Y
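To try the three counts against the sample data without a MySQL server, here is a minimal sketch using Python's built-in `sqlite3` module. This is a stand-in under stated assumptions: SQLite replaces MySQL, the `mydb.` prefix and `#` comments are dropped, the `HAVING` clause repeats the aggregate instead of using the alias, and the function name `raw_counts` is hypothetical.

```python
import sqlite3

def raw_counts(conn, x, y):
    """Return (co_occurrence, count_x, count_y) for the concept pair (x, y)."""
    # Documents in which at least two distinct concepts of the pair occur.
    co = conn.execute(
        """
        SELECT count(*) FROM (
            SELECT document_uri
            FROM alpha_occurence
            WHERE concept_uri IN (?, ?)
            GROUP BY document_uri
            HAVING count(DISTINCT concept_uri) >= 2
        ) AS amount_of_concept_co_occurence
        """,
        (x, y),
    ).fetchone()[0]
    # Raw number of rows for each concept (not yet corrected).
    single = "SELECT count(*) FROM alpha_occurence WHERE concept_uri = ?"
    count_x = conn.execute(single, (x,)).fetchone()[0]
    count_y = conn.execute(single, (y,)).fetchone()[0]
    return co, count_x, count_y

# In-memory copy of the sample table from the post.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alpha_occurence "
    "(id INTEGER PRIMARY KEY, concept_uri TEXT, document_uri TEXT)"
)
conn.executemany(
    "INSERT INTO alpha_occurence VALUES (?, ?, ?)",
    [(1, "C1", "D1"), (2, "C2", "D1")],
)

print(raw_counts(conn, "C1", "C2"))
print(raw_counts(conn, "C1", "C1"))
```

In this sketch the pair C1,C1 yields a co-occurrence count of 0, because only one distinct concept can match the `IN` list, so the `>= 2` condition is never met.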
After the values are retrieved, a small script runs over them to correct them. It does the following:
To get Only X and not Y = only_concept_A - amount_of_concept_co_occurence
To get Not X and Only Y = only_concept_B - amount_of_concept_co_occurence
To get the value of neither X nor Y = total # of documents - (amount_of_concept_co_occurence + Only X and not Y + Not X and Only Y). The total # of documents is not given here, as the sample data only records which concept occurs in which document.
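The correction step can be sketched as follows. This is an illustration of the three formulas, not the actual script: the function name `correct_counts`, its parameters, and the dictionary keys are hypothetical, and `total_docs` is optional because the sample data does not include it.

```python
def correct_counts(co_occurrence, only_concept_a, only_concept_b, total_docs=None):
    """Turn the three raw query counts into contingency-table cells."""
    only_x = only_concept_a - co_occurrence   # Only X and not Y
    only_y = only_concept_b - co_occurrence   # Not X and Only Y
    neither = None
    if total_docs is not None:
        # Neither X nor Y: everything not covered by the other three cells.
        neither = total_docs - (co_occurrence + only_x + only_y)
    return {
        "both": co_occurrence,
        "only_x": only_x,
        "only_y": only_y,
        "neither": neither,
    }

# Illustrative raw counts for a pair of different concepts (C1,C2),
# assuming a single document in total.
print(correct_counts(1, 1, 1, total_docs=1))

# Illustrative raw counts that lead to the reported C1,C1 table.
print(correct_counts(0, 1, 1))
```

With raw counts of 0, 1 and 1 the function returns both = 0, only_x = 1 and only_y = 1, which is the table reported above for C1,C1.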