I have two tables in MySQL that I am comparing, with the following attributes:

tbl_fac : facility_id, chemical_id, criteria
             10      , 25         , 50
             10      , 26         , 60
             10      , 27         , 60
             11      , 25         , 30
             11      , 27         , 31 
              etc...

tbl_samp: sample_id, chemical_id, result
            5     ,    25         , 51
            5     ,    26         , 61
            6     ,    25         , 51
            6     ,    26         , 61
            6     ,    27         , 500

              etc.... 

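The tables are joined on chemical_id (many-to-many, ugh), and there are several thousand facility_ids, each with several hundred chemical_ids. There are also several thousand sample_ids, each with several hundred chemical_ids. All told, tbl_fac has roughly 500,000 records and tbl_samp has over 1,000,000.

I am trying to extract three groups of sample_ids from this data set:

Group 1: any sample_id where tbl_samp.result > tbl_fac.criteria (i.e., the result exceeds the criteria)

Group 2: any sample_id where tbl_samp.result < tbl_fac.criteria and every tbl_fac.chemical_id is present for that sample_id (i.e., the result is below the criteria, and everything is there)

Group 3: any sample_id where tbl_samp.result < tbl_fac.criteria but one or more tbl_fac.chemical_id values are missing from that sample_id (i.e., the result is below the criteria, but something is missing)

Here is the question: how can I efficiently get all three groups in one query?

I have tried: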

select * 
from tbl_fac 
left join tbl_samp 
    on tbl_fac.chemical_id = tbl_samp.chemical_id

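but this only yields the values that are missing from the entire data set (rather than from individual samples). I have a hackish query working that uses a third table to join tbl_fac and tbl_samp, but it is so ugly that I am actually embarrassed to post it...

As always, many thanks in advance for your thoughts on this one!

Cheers,

Josh

EDIT: Ideally, I would like the sample_id and the group returned, with only one group per sample ID (my knowledge of the data suggests that each sample will always fall into one of the three categories above).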

2 Answers


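This answer assumes that there is a unique constraint on facility_id and chemical_id in tbl_fac, and a unique constraint on sample_id and chemical_id in tbl_samp. What I did was build up the query one step at a time. Whether this is efficient remains to be seen.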
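To make that assumption concrete, here is a minimal sketch of the two unique constraints. It runs in SQLite through Python's sqlite3 module purely so it can be tried without a MySQL server; the column types, the NOT NULL markers and the helper name insert_is_rejected are my own assumptions, not something stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_fac (
        facility_id INTEGER NOT NULL,
        chemical_id INTEGER NOT NULL,
        criteria    REAL    NOT NULL,
        UNIQUE (facility_id, chemical_id)   -- one criteria per facility and chemical
    );
    CREATE TABLE tbl_samp (
        sample_id   INTEGER NOT NULL,
        chemical_id INTEGER NOT NULL,
        result      REAL    NOT NULL,
        UNIQUE (sample_id, chemical_id)     -- one result per sample and chemical
    );
    INSERT INTO tbl_fac VALUES (10, 25, 50), (10, 26, 60);
""")


def insert_is_rejected(sql):
    """Return True if the statement violates a constraint (hypothetical helper)."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A second criteria row for facility 10 / chemical 25 breaks the assumption.
duplicate_rejected = insert_is_rejected("INSERT INTO tbl_fac VALUES (10, 25, 99)")
# The same chemical under a different facility is allowed.
other_facility_rejected = insert_is_rejected("INSERT INTO tbl_fac VALUES (11, 25, 30)")
```

In MySQL the same constraint could be added with something along the lines of ALTER TABLE tbl_fac ADD UNIQUE (facility_id, chemical_id), and likewise for tbl_samp on (sample_id, chemical_id).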

Group 1: any sample_id where tbl_samp.result > tbl_fac.criteria (i.e., the result exceeds the criteria)

SELECT tbl_samp.sample_id,
       'ResultsGreaterThanCriteria' AS samplegroup
FROM   tbl_fac
       INNER JOIN tbl_samp
         ON tbl_fac.chemical_id = tbl_samp.chemical_id
WHERE  tbl_samp.result > tbl_fac.criteria
GROUP  BY tbl_samp.sample_id

Group 2: any sample_id where tbl_samp.result < tbl_fac.criteria and every tbl_fac.chemical_id is present for that sample_id (i.e., the result is below the criteria, and everything is there)

SELECT tbl_samp.sample_id,
       'ResultLessThanCriteriaAndAllChems' AS samplegroup
FROM   tbl_fac
       INNER JOIN tbl_samp
         ON tbl_fac.chemical_id = tbl_samp.chemical_id
WHERE  tbl_samp.result < tbl_fac.criteria
       AND NOT EXISTS (SELECT *
                       FROM   tbl_fac tf
                              LEFT JOIN tbl_samp ts
                                ON tf.chemical_id = ts.chemical_id
                                   -- correlate in ON; ts.sample_id is NULL on unmatched rows
                                   AND ts.sample_id = tbl_samp.sample_id
                       WHERE  ts.chemical_id IS NULL)
GROUP  BY tbl_samp.sample_id

Group 3: any sample_id where tbl_samp.result < tbl_fac.criteria but one or more tbl_fac.chemical_id values are missing from that sample_id (i.e., the result is below the criteria, but something is missing)

SELECT tbl_samp.sample_id,
       'ResultsLessThanCriteriaWithMissingChems' AS samplegroup
FROM   tbl_fac
       INNER JOIN tbl_samp
         ON tbl_fac.chemical_id = tbl_samp.chemical_id
WHERE  tbl_samp.result < tbl_fac.criteria
       AND EXISTS (SELECT *
                   FROM   tbl_fac tf
                          LEFT JOIN tbl_samp ts
                            ON tf.chemical_id = ts.chemical_id
                               -- correlate in ON; ts.sample_id is NULL on unmatched rows
                               AND ts.sample_id = tbl_samp.sample_id
                   WHERE  ts.chemical_id IS NULL)
GROUP  BY tbl_samp.sample_id 

Finally, UNION all three queries together to get:

SELECT tbl_samp.sample_id,
       'ResultsGreaterThanCriteria' AS samplegroup
FROM   tbl_fac
       INNER JOIN tbl_samp
         ON tbl_fac.chemical_id = tbl_samp.chemical_id
WHERE  tbl_samp.result > tbl_fac.criteria
GROUP  BY tbl_samp.sample_id
UNION ALL
SELECT tbl_samp.sample_id,
       'ResultLessThanCriteriaAndAllChems' AS samplegroup
FROM   tbl_fac
       INNER JOIN tbl_samp
         ON tbl_fac.chemical_id = tbl_samp.chemical_id
WHERE  tbl_samp.result < tbl_fac.criteria
       AND NOT EXISTS (SELECT *
                       FROM   tbl_fac tf
                              LEFT JOIN tbl_samp ts
                                ON tf.chemical_id = ts.chemical_id
                                   -- correlate in ON; ts.sample_id is NULL on unmatched rows
                                   AND ts.sample_id = tbl_samp.sample_id
                       WHERE  ts.chemical_id IS NULL)
GROUP  BY tbl_samp.sample_id
UNION ALL
SELECT tbl_samp.sample_id,
       'ResultsLessThanCriteriaWithMissingChems' AS samplegroup
FROM   tbl_fac
       INNER JOIN tbl_samp
         ON tbl_fac.chemical_id = tbl_samp.chemical_id
WHERE  tbl_samp.result < tbl_fac.criteria
       AND EXISTS (SELECT *
                   FROM   tbl_fac tf
                          LEFT JOIN tbl_samp ts
                            ON tf.chemical_id = ts.chemical_id
                               -- correlate in ON; ts.sample_id is NULL on unmatched rows
                               AND ts.sample_id = tbl_samp.sample_id
                   WHERE  ts.chemical_id IS NULL)
GROUP  BY tbl_samp.sample_id 
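
The "missing chemicals" test is the part that is easiest to get wrong, so here is a small sketch of it on its own, run against the sample rows from the question. It uses SQLite through Python's sqlite3 module only to be runnable without a MySQL server, and it reads "all tbl_fac.chemical_id" as every distinct chemical across all facilities, which is my assumption since the question shows no link between samples and facilities. The names missing_sql and samples_missing_chems are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_fac  (facility_id INTEGER, chemical_id INTEGER, criteria REAL);
    CREATE TABLE tbl_samp (sample_id INTEGER, chemical_id INTEGER, result REAL);
    -- Rows copied from the question.
    INSERT INTO tbl_fac VALUES
        (10, 25, 50), (10, 26, 60), (10, 27, 60), (11, 25, 30), (11, 27, 31);
    INSERT INTO tbl_samp VALUES
        (5, 25, 51), (5, 26, 61), (6, 25, 51), (6, 26, 61), (6, 27, 500);
""")

# For each sample, look for a tbl_fac chemical that has no row for THAT sample.
# The sample_id correlation sits in the ON clause so that unmatched chemicals
# survive the LEFT JOIN and can be detected with IS NULL.
missing_sql = """
    SELECT DISTINCT s.sample_id
    FROM   tbl_samp s
    WHERE  EXISTS (SELECT *
                   FROM   tbl_fac tf
                          LEFT JOIN tbl_samp ts
                            ON  ts.chemical_id = tf.chemical_id
                            AND ts.sample_id   = s.sample_id
                   WHERE  ts.chemical_id IS NULL)
    ORDER  BY s.sample_id
"""
samples_missing_chems = [row[0] for row in conn.execute(missing_sql)]
print(samples_missing_chems)
```

With the question's rows, sample 5 has no result for chemical 27 while sample 6 has all three chemicals, so only sample 5 is reported. Putting the sample_id comparison in WHERE instead would never match, because ts.sample_id is NULL on exactly the rows where ts.chemical_id IS NULL.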
answered 2012-03-06T07:00:51.713

SELECT
    sample_id,
    IF(result = criteria, -1,  /* unspecified behavior */
     IF(result > criteria, 1,
      IF(nb_chemicals = total_nb_chemicals, 2, 3))) AS grp

FROM (
    SELECT s.result, s.sample_id, f.criteria, f.chemical_id,
        COUNT(DISTINCT f.chemical_id) AS nb_chemicals
    FROM tbl_fac f JOIN tbl_samp s
        ON f.chemical_id = s.chemical_id
    GROUP BY s.sample_id
) t 

CROSS JOIN (
    SELECT COUNT(DISTINCT chemical_id) AS total_nb_chemicals
    FROM tbl_fac
) u

New solution:

SELECT
    s.sample_id,
    IF(s.result = f.criteria, -1,  /* unspecified behavior */
     IF(s.result > f.criteria, 1,
      IF(sample_nb_chemicals = total_nb_chemicals, 2, 3))) AS grp

FROM
    tbl_fac f JOIN tbl_samp s
    ON f.chemical_id = s.chemical_id

    JOIN (
        SELECT s.sample_id, 
               COUNT(DISTINCT f.chemical_id) AS sample_nb_chemicals
        FROM tbl_fac f JOIN tbl_samp s
             ON f.chemical_id = s.chemical_id
        GROUP BY s.sample_id
    ) u
       ON s.sample_id = u.sample_id

    CROSS JOIN (
        SELECT COUNT(DISTINCT chemical_id) AS total_nb_chemicals
        FROM tbl_fac
    ) v

GROUP BY s.sample_id, grp  /* qualified: sample_id exists in both s and u */
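
Since the edit to the question asks for exactly one group per sample_id, here is a sketch of the same counting idea written as a single aggregation per sample. It is a variation, not the query above: any result above its criteria puts the sample in group 1, and otherwise the sample's distinct chemical count is compared with the total in tbl_fac. It runs in SQLite through Python's sqlite3 module so it can be tried without a MySQL server. Samples 7 and 8 are hypothetical rows I added to exercise groups 2 and 3, and treating result = criteria as "not exceeding" is my assumption, since the question leaves that case unspecified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_fac  (facility_id INTEGER, chemical_id INTEGER, criteria REAL);
    CREATE TABLE tbl_samp (sample_id INTEGER, chemical_id INTEGER, result REAL);
    INSERT INTO tbl_fac VALUES
        (10, 25, 50), (10, 26, 60), (10, 27, 60), (11, 25, 30), (11, 27, 31);
    INSERT INTO tbl_samp VALUES
        (5, 25, 51), (5, 26, 61), (6, 25, 51), (6, 26, 61), (6, 27, 500),
        -- Hypothetical rows: sample 7 is complete and below every criteria,
        -- sample 8 is below the criteria but only covers chemical 25.
        (7, 25, 10), (7, 26, 10), (7, 27, 10),
        (8, 25, 10);
""")

classify_sql = """
    SELECT s.sample_id,
           CASE
             WHEN MAX(CASE WHEN s.result > f.criteria THEN 1 ELSE 0 END) = 1
               THEN 1   -- at least one result exceeds its criteria
             WHEN COUNT(DISTINCT s.chemical_id) = t.total_nb_chemicals
               THEN 2   -- nothing exceeds, every chemical present
             ELSE 3     -- nothing exceeds, some chemical missing
           END AS grp
    FROM   tbl_samp s
           JOIN tbl_fac f
             ON f.chemical_id = s.chemical_id
           CROSS JOIN (SELECT COUNT(DISTINCT chemical_id) AS total_nb_chemicals
                       FROM   tbl_fac) t
    GROUP  BY s.sample_id, t.total_nb_chemicals
    ORDER  BY s.sample_id
"""
groups = dict(conn.execute(classify_sql).fetchall())
print(groups)
```

Grouping only by sample_id is what guarantees one row per sample; whether this performs acceptably on 500,000 and 1,000,000 rows would need to be checked with an index on chemical_id in both tables.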
answered 2012-03-06T08:51:04.167