6

I have the following database tables with information about people, diseases, and drugs:

PERSON_T              DISEASE_T               DRUG_T
=========             ==========              ========
PERSON_ID             DISEASE_ID              DRUG_ID
GENDER                PERSON_ID               PERSON_ID
NAME                  DISEASE_START_DATE      DRUG_START_DATE
                      DISEASE_END_DATE        DRUG_END_DATE

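From these tables I run some statistics about which individuals took which drugs and had which diseases. From those I can work out which patterns are interesting enough to investigate further. For instance, below is a simplified example of the boolean pattern I might find for disease 52: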

( (Drug 234 = false AND Drug 474 = true AND Drug 26 = false) OR 
  (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )

Edit: Here is another example:

( (Drug 234 = true AND Drug 474 = true AND Drug 26 = false) OR 
      (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )

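Now I want to convert this pattern into a SQL query and find all the people who match it.
For example, I want to find all of the people in PERSON_T who had the disease and ((did not take drugs 234 and 26 before exhibiting symptoms, but did take drug 474 before exhibiting symptoms) or (took drug 371 before exhibiting symptoms, but did not take drugs 791 and 395 before exhibiting symptoms)).

How would I go about translating this pattern back into a query against the original tables?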

Here is my first attempt, but I got stuck on the first term:

SELECT * FROM PERSON_T, DRUG_T, DISEASE_T 
  WHERE DISEASE_ID = 52 AND 
    PERSON_T.PERSON_ID = DISEASE_T.PERSON_ID AND 
    PERSON_T.PERSON_ID = DRUG_T.PERSON_ID  AND 
    (DRUG_T.DRUG_ID=234 AND (DRUG_T.DRUG_START_DATE>DISEASE_T.END_DATE || ???)

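I need this to work in PostgreSQL, but I assume that any given answer can be translated from a given database to PostgreSQL.

Responses to comments

  1. I fixed the formatting of the database tables. Thank you.
  2. I need to be able to take an arbitrary boolean statement and translate it into SQL. The boolean statements we actually create are far longer than the example I gave. Any new tables I create will be in a new database and need to have the same schema as the original tables. That way, end users can run their same code on the new tables and it will work the same as if it ran on the original tables. This is a requirement from the customer. I am hoping I can create a view that is just a query against the original tables. If we can't get that to work, I may create a copy of the tables and filter the data as I copy it to the new tables. We are not using neural networks for the analysis. We are using our own custom algorithms, which scale far better than neural networks.
  3. Disease_Start_Date is the date the person got the disease, which is likely when the symptoms started appearing. Disease_End_Date is when the person recovered, which is likely when the symptoms disappeared.
  4. Drug_start_date is when the person started taking the drug. Drug_end_date is when the person stopped taking the drug.

Edit: I added my own answer. Can anyone come up with a simpler answer?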

4

10 Answers

4

To me, the straightforward (if ugly) solution is to use EXISTS and NOT EXISTS clauses:

SELECT *
FROM PERSON_T INNER JOIN DISEASE_T
     USING (PERSON_ID)
WHERE DISEASE_ID = 52
  AND EXISTS (SELECT 1 FROM DRUG_T
              WHERE DRUG_T.PERSON_ID = PERSON_T.PERSON_ID
                AND DRUG_ID = 474
                AND [time condition])
  AND NOT EXISTS (SELECT 1 FROM DRUG_T
              WHERE DRUG_T.PERSON_ID = PERSON_T.PERSON_ID
                AND DRUG_ID = 234
                AND [time condition])

...and so on. In this example, we are asking for people who took drug 474 but did not take drug 234. Obviously, you can group the clauses with AND and OR as needed.
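
One way to handle arbitrary patterns with this approach is to generate the clauses mechanically. The following is only a minimal sketch under stated assumptions: the pattern is supplied in disjunctive normal form, "taken" is read as drug_start_date < disease_start_date (a guess at the [time condition] above), and Python with an in-memory SQLite database and invented sample rows is used purely so the example runs standalone. The helper name pattern_to_sql is hypothetical. The generated SQL uses only EXISTS, NOT EXISTS and plain joins, so it should also run on PostgreSQL.

```python
import sqlite3

def pattern_to_sql(disease_id, pattern):
    """Turn a pattern in disjunctive normal form into one SQL query.

    pattern: non-empty list of terms that are ORed together; each term is a
    non-empty dict {drug_id: True/False} whose entries are ANDed together.
    True  -> the person started the drug before the disease started.
    False -> the person has no record of starting it before that date.
    """
    def clause(drug_id, taken):
        keyword = "EXISTS" if taken else "NOT EXISTS"
        return (f"{keyword} (SELECT 1 FROM drug_t dr "
                f"WHERE dr.person_id = ds.person_id "
                f"AND dr.drug_id = {int(drug_id)} "
                f"AND dr.drug_start_date < ds.disease_start_date)")

    terms = ["(" + " AND ".join(clause(d, t) for d, t in term.items()) + ")"
             for term in pattern]
    return ("SELECT DISTINCT p.person_id "
            "FROM person_t p JOIN disease_t ds ON ds.person_id = p.person_id "
            f"WHERE ds.disease_id = {int(disease_id)} "
            "AND (" + " OR ".join(terms) + ") "
            "ORDER BY p.person_id")

# Invented sample data, only to exercise the generated query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_t  (person_id INTEGER, gender TEXT, name TEXT);
    CREATE TABLE disease_t (disease_id INTEGER, person_id INTEGER,
                            disease_start_date TEXT, disease_end_date TEXT);
    CREATE TABLE drug_t    (drug_id INTEGER, person_id INTEGER,
                            drug_start_date TEXT, drug_end_date TEXT);
""")
conn.executemany("INSERT INTO person_t VALUES (?, 'F', ?)",
                 [(i, f"person {i}") for i in range(1, 7)])
# Everyone falls ill on 2010-03-01; person 6 has a different disease.
conn.executemany("INSERT INTO disease_t VALUES (?, ?, '2010-03-01', NULL)",
                 [(52, 1), (52, 2), (52, 3), (52, 4), (52, 5), (99, 6)])
conn.executemany("INSERT INTO drug_t VALUES (?, ?, ?, NULL)", [
    (234, 1, "2010-01-01"), (474, 1, "2010-02-01"),                          # first term holds
    (234, 2, "2010-01-01"), (474, 2, "2010-01-15"), (26, 2, "2010-02-01"),   # drug 26 taken before
    (371, 3, "2010-01-10"),                                                  # second term holds
    (371, 4, "2010-01-10"), (791, 4, "2010-02-10"),                          # drug 791 taken before
    (234, 5, "2010-01-01"), (474, 5, "2010-01-05"), (26, 5, "2010-03-15"),   # drug 26 only afterwards
    (234, 6, "2010-01-01"), (474, 6, "2010-01-05"),                          # wrong disease
])

# The second example pattern from the question.
pattern = [{234: True, 474: True, 26: False},
           {395: False, 791: False, 371: True}]
matches = [row[0] for row in conn.execute(pattern_to_sql(52, pattern))]
print(matches)  # [1, 3, 5]
```

The drug and disease IDs are passed through int() before being placed in the SQL text, so only numeric IDs can reach the query. For very long patterns the statement grows by one subquery per drug, which is worth measuring on real data before relying on it.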

Aside: I find all-uppercase text hard to read. I generally use uppercase SQL keywords and lowercase table and column names.

answered 2010-07-09T18:54:47.073
1

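I don't know how this will perform on large tables (I'd imagine poorly, since date comparisons are typically quite expensive), but here is an approach that should work. It is relatively verbose, but it is easy to modify for different boolean cases.

Example 1: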

-- Assumes at most one drug_t row per person and drug
SELECT dis.*
FROM disease_t dis
LEFT JOIN drug_t d1 ON d1.person_id = dis.person_id AND d1.drug_id = 234
LEFT JOIN drug_t d2 ON d2.person_id = dis.person_id AND d2.drug_id = 474
LEFT JOIN drug_t d3 ON d3.person_id = dis.person_id AND d3.drug_id = 26
LEFT JOIN drug_t d4 ON d4.person_id = dis.person_id AND d4.drug_id = 395
LEFT JOIN drug_t d5 ON d5.person_id = dis.person_id AND d5.drug_id = 791
LEFT JOIN drug_t d6 ON d6.person_id = dis.person_id AND d6.drug_id = 371
WHERE dis.disease_id = 52
AND (((d1.person_id IS NULL OR dis.disease_start_date < d1.drug_start_date) AND
      (d2.person_id IS NOT NULL AND d2.drug_start_date < dis.disease_start_date) AND
      (d3.person_id IS NULL OR dis.disease_start_date < d3.drug_start_date))
     OR
     ((d4.person_id IS NULL OR dis.disease_start_date < d4.drug_start_date) AND
      (d5.person_id IS NULL OR dis.disease_start_date < d5.drug_start_date) AND
      (d6.person_id IS NOT NULL AND d6.drug_start_date < dis.disease_start_date)))

Example 2:

-- Assumes at most one drug_t row per person and drug
SELECT dis.*
FROM disease_t dis
LEFT JOIN drug_t d1 ON d1.person_id = dis.person_id AND d1.drug_id = 234
LEFT JOIN drug_t d2 ON d2.person_id = dis.person_id AND d2.drug_id = 474
LEFT JOIN drug_t d3 ON d3.person_id = dis.person_id AND d3.drug_id = 26
LEFT JOIN drug_t d4 ON d4.person_id = dis.person_id AND d4.drug_id = 395
LEFT JOIN drug_t d5 ON d5.person_id = dis.person_id AND d5.drug_id = 791
LEFT JOIN drug_t d6 ON d6.person_id = dis.person_id AND d6.drug_id = 371
WHERE dis.disease_id = 52
AND (((d1.person_id IS NOT NULL AND d1.drug_start_date < dis.disease_start_date) AND
      (d2.person_id IS NOT NULL AND d2.drug_start_date < dis.disease_start_date) AND
      (d3.person_id IS NULL OR dis.disease_start_date < d3.drug_start_date))
     OR
     ((d4.person_id IS NULL OR dis.disease_start_date < d4.drug_start_date) AND
      (d5.person_id IS NULL OR dis.disease_start_date < d5.drug_start_date) AND
      (d6.person_id IS NOT NULL AND d6.drug_start_date < dis.disease_start_date)))

answered 2010-07-23T18:57:12.023
1

Here is a query that handles ( (Drug 234 = true AND Drug 474 = true AND Drug 26 = false) OR (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) ), as you posted.

/*
-- AS DEFINED BY JOINS
-- All "person_id"'s match
-- Drug 1 is not Drug 2
-- Drug 1 is not Drug 3
-- Drug 2 is not Drug 3
-- All Drugs are optional as far as the SELECT statement is concerned (left join)
   -- Drug IDs will be defined in the WHERE clause
-- All Diseases for "person_id"

-- AS DEFINED IN WHERE STATEMENT
-- Disease IS 52
-- AND ONE OF THE FOLLOWING:
--   1) Disease started AFTER Drug 1
--      Disease started AFTER Drug 2
--      Drug 1 IS 234
--      Drug 2 IS 474
--      Drug 3 IS NOT 26 (AND NOT 234 or 474, as defined in JOINs)
--   2) Disease started AFTER Drug 3
--      Drug 1 IS NOT 395
--      Drug 2 IS NOT 791
--      Drug 3 IS 371
*/

SELECT p.person_id, p.gender FROM person_t as p
LEFT JOIN drug_t    AS dr1 ON (p.person_id = dr1.person_id)
LEFT JOIN drug_t    AS dr2 ON (p.person_id = dr2.person_id AND dr1.drug_id != dr2.drug_id)
LEFT JOIN drug_t    AS dr3 ON (p.person_id = dr3.person_id AND dr1.drug_id != dr3.drug_id AND dr2.drug_id != dr3.drug_id)
JOIN      disease_t AS ds  ON (p.person_id = ds.person_id)
WHERE ds.disease_id = 52
AND (   (    (dr1.drug_start_date < ds.disease_start_date AND dr2.drug_start_date < ds.disease_start_date)
        AND (dr1.drug_id = 234 AND dr2.drug_id = 474 AND dr3.drug_id != 26)
        )
    OR
        (    (dr3.drug_start_date < ds.disease_start_date)
        AND (dr1.drug_id != 395 AND dr2.drug_id != 791 AND dr3.drug_id = 371)
        )
    )
answered 2010-07-23T19:37:53.990
0

Please forgive any errors, but I think something like this would work (in T-SQL):

SELECT col1, col2, col3...
FROM PERSON_T AS P, DRUG_T AS DR, DISEASE_T AS DI
WHERE disease_id = 52
AND P.person_id = DI.person_id
AND P.person_id = DR.person_id
AND drug_id NOT IN(234, 26)
AND drug_id = 474
AND disease_start_date < drug_start_date
UNION
SELECT col1, col2, col3...
FROM PERSON_T AS P, DRUG_T AS DR, DISEASE_T AS DI
WHERE disease_id = 52
AND P.person_id = DI.person_id
AND P.person_id = DR.person_id
AND drug_id NOT IN(791, 395)
AND drug_id = 371
AND disease_start_date < drug_start_date

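This doesn't have to be done with a UNION, but for readability I thought it was the simplest option given your conditions. Maybe this will point you in the right direction.

answered 2010-07-09T14:40:34.463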
0
SELECT per.person_id, per.name, per.gender
FROM person_t per
INNER JOIN disease_t dis
USING (person_id)
INNER JOIN drug_t drug
USING (person_id)
WHERE dis.disease_id = 52 AND drug.drug_start_date < dis.disease_start_date AND ((drug.drug_id IN (234, 474) AND drug.drug_id NOT IN (26)) OR (drug.drug_id IN (371) AND drug.drug_id NOT IN (395, 791)));

This will do what you are asking for. The IN statements at the end are pretty self-explanatory.

answered 2010-07-09T14:53:15.677
0

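None of the answers given seem to work. Here again is the pattern I want to implement: ( (Drug 234 = true AND Drug 474 = true AND Drug 26 = false) OR (Drug 395 = false AND Drug 791 = false AND Drug 371 = true) )

I believe the following query works for (Drug 234 = true AND Drug 474 = true AND Drug 26 = false). Given that, adding the second half of the query is pretty easy.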

SELECT  p.person_id, p.gender FROM person_t as p 
    join drug_t as dr on dr.person_id = p.person_id 
    join disease_t as ds on ds.person_id=p.person_id 
    WHERE dr.drug_start_date < ds.disease_start_date AND disease_id = 52 AND dr.drug_id=234
INTERSECT
SELECT  p.person_id, p.gender FROM person_t as p 
    join drug_t as dr on dr.person_id = p.person_id 
    join disease_t as ds on ds.person_id=p.person_id 
    WHERE dr.drug_start_date < ds.disease_start_date AND disease_id = 52 AND dr.drug_id=474
INTERSECT (
SELECT p.person_id, p.gender
    FROM person_t as p 
    JOIN disease_t as ds on ds.person_id = p.person_id 
    LEFT JOIN drug_t as dr ON dr.person_id = p.person_id  AND dr.drug_id = 26
    WHERE disease_id = 52 AND dr.person_id is null 
UNION 
SELECT p.person_id, p.gender
    FROM person_t as p 
    JOIN disease_t as ds on ds.person_id = p.person_id 
    JOIN drug_t as dr ON dr.person_id = p.person_id  AND dr.drug_id = 26
    WHERE disease_id = 52 AND dr.drug_start_date > ds.disease_start_date)

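This query works, but it is very ugly. I also suspect it will be very slow once I have a production database with 100 million people. Is there anything I can do to simplify or optimize this query?

answered 2010-07-23T17:56:03.007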
0

I don't have test data handy to try this, but I think you could do something like:

SELECT P.PERSON_ID
FROM DISEASE_T D
INNER JOIN DRUG_T DR ON D.PERSON_ID = DR.PERSON_ID AND D.DISEASE_ID=52
INNER JOIN PERSON_T P ON P.PERSON_ID = D.PERSON_ID
GROUP BY P.PERSON_ID
HAVING SUM(
    CASE WHEN DRUG_ID=234 AND DRUG_START_DATE<DISEASE_START_DATE THEN -1 
    WHEN DRUG_ID=474 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 
    WHEN DRUG_ID=26 AND DRUG_START_DATE<DISEASE_START_DATE THEN -1 
    ELSE 0 END) = 1
    OR
    SUM(
    CASE WHEN DRUG_ID=395 AND DRUG_START_DATE<DISEASE_START_DATE THEN -1 
    WHEN DRUG_ID=791 AND DRUG_START_DATE<DISEASE_START_DATE THEN -1 
    WHEN DRUG_ID=371 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 
    ELSE 0 END) = 1

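One case I know will fail is when the drug/disease tables hold multiple records for the same person and the same drug/disease. If that is the case, you could change the HAVING clause to something more like: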

(SUM(CASE WHEN DRUG_ID=234 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) = 0
AND SUM(CASE WHEN DRUG_ID=474 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) > 0
AND SUM(CASE WHEN DRUG_ID=26 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) = 0)
OR
(SUM(CASE WHEN DRUG_ID=395 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) = 0
AND SUM(CASE WHEN DRUG_ID=791 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) = 0
AND SUM(CASE WHEN DRUG_ID=371 AND DRUG_START_DATE<DISEASE_START_DATE THEN 1 ELSE 0 END) > 0)
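
To check how this second HAVING form behaves when a person has several records for the same drug, here is a small runnable sketch. It is an illustration under assumptions, not a tuned query: Python with an in-memory SQLite database is used only so it runs standalone, the sample rows are invented, PERSON_T is left out because only the ID is selected, and the helper name taken_before is hypothetical. Grouping also by disease_start_date is an added assumption to keep separate episodes of the disease apart.

```python
import sqlite3

def taken_before(drug_id):
    """SQL expression counting a person's records of drug_id that start before the disease."""
    return (f"SUM(CASE WHEN dr.drug_id = {int(drug_id)} "
            f"AND dr.drug_start_date < ds.disease_start_date THEN 1 ELSE 0 END)")

# (234 = false AND 474 = true AND 26 = false) OR (395 = false AND 791 = false AND 371 = true)
query = f"""
    SELECT DISTINCT ds.person_id
    FROM disease_t ds
    JOIN drug_t dr ON dr.person_id = ds.person_id
    WHERE ds.disease_id = 52
    GROUP BY ds.person_id, ds.disease_start_date
    HAVING ({taken_before(234)} = 0 AND {taken_before(474)} > 0 AND {taken_before(26)} = 0)
        OR ({taken_before(395)} = 0 AND {taken_before(791)} = 0 AND {taken_before(371)} > 0)
    ORDER BY ds.person_id
"""

# Invented sample data: everyone gets disease 52 on 2010-03-01.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE disease_t (disease_id INTEGER, person_id INTEGER,
                            disease_start_date TEXT, disease_end_date TEXT);
    CREATE TABLE drug_t    (drug_id INTEGER, person_id INTEGER,
                            drug_start_date TEXT, drug_end_date TEXT);
""")
conn.executemany("INSERT INTO disease_t VALUES (52, ?, '2010-03-01', NULL)",
                 [(i,) for i in range(1, 8)])
conn.executemany("INSERT INTO drug_t VALUES (?, ?, ?, NULL)", [
    (474, 1, "2010-01-01"),                          # only 474 before: first term holds
    (474, 2, "2009-06-01"), (474, 2, "2010-01-01"),  # two 474 records: still holds
    (474, 3, "2010-01-01"), (234, 3, "2010-02-01"),  # 234 before: no match
    (371, 4, "2010-01-01"), (395, 4, "2010-04-01"),  # 395 only afterwards: second term holds
    (371, 5, "2010-01-01"), (791, 5, "2010-02-01"),  # 791 before: no match
    (474, 6, "2010-05-01"),                          # 474 only afterwards: no match
])                                                   # person 7 has no drug records at all

matches = [row[0] for row in conn.execute(query)]
print(matches)  # [1, 2, 4]
```

Person 2 matches even with two records for drug 474 because the test is "count > 0" rather than "sum = 1". Person 7 drops out at the inner join, which is harmless here because every term requires at least one drug to have been taken.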
answered 2010-07-23T20:07:51.457
0

I would probably approach this problem from a direction similar to this. It is pretty flexible.

DRUG_DISEASE_CORRELATION_QUERY
===============================
DRUG_DISEASE_CORRELATION_QUERY_ID
DISEASE_ID
DESCRIPTION

(1, 52, 'What this query does.')
(2, 52, 'Add some more results.')

DRUG_DISEASE_CORRELATION_QUERY_INCLUDE_DRUG
===========================================
DRUG_DISEASE_CORRELATION_QUERY_ID
DRUG_ID

(1, 234)
(1, 474)
(2, 371)

DRUG_DISEASE_CORRELATION_QUERY_EXCLUDE_DRUG
===========================================
DRUG_DISEASE_CORRELATION_QUERY_ID
DRUG_ID

(1, 26)
(2, 395)
(2, 791)



CREATE VIEW DRUG_DISEASE_CORRELATION
AS
SELECT 
    p.*,
    q.DRUG_DISEASE_CORRELATION_QUERY_ID
FROM 
    DRUG_DISEASE_CORRELATION_QUERY q
    INNER JOIN DISEASE_T ds on ds.DISEASE_ID = q.DISEASE_ID
    INNER JOIN PERSON_T p ON p.PERSON_ID = ds.PERSON_ID
  WHERE 
    -- The person started at least one of the query's included drugs before the disease started
    EXISTS (SELECT * FROM DRUG_T dr WHERE dr.PERSON_ID = p.PERSON_ID AND dr.DRUG_ID IN
        (SELECT qid.DRUG_ID FROM DRUG_DISEASE_CORRELATION_QUERY_INCLUDE_DRUG qid WHERE 
        qid.DRUG_DISEASE_CORRELATION_QUERY_ID = q.DRUG_DISEASE_CORRELATION_QUERY_ID)
        AND dr.DRUG_START_DATE < ds.DISEASE_START_DATE)
    -- ...and started none of the query's excluded drugs before the disease started
   AND NOT EXISTS (SELECT * FROM DRUG_T dr WHERE dr.PERSON_ID = p.PERSON_ID AND dr.DRUG_ID IN
        (SELECT qed.DRUG_ID FROM DRUG_DISEASE_CORRELATION_QUERY_EXCLUDE_DRUG qed WHERE 
        qed.DRUG_DISEASE_CORRELATION_QUERY_ID = q.DRUG_DISEASE_CORRELATION_QUERY_ID)
        AND dr.DRUG_START_DATE < ds.DISEASE_START_DATE)
;


SELECT * FROM DRUG_DISEASE_CORRELATION WHERE DRUG_DISEASE_CORRELATION_QUERY_ID = 1
UNION
SELECT * FROM DRUG_DISEASE_CORRELATION WHERE DRUG_DISEASE_CORRELATION_QUERY_ID = 2
answered 2010-07-23T20:22:45.583
0

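If I have this right, you want to:

  • Select those people
  • who have been infected with one (1) specific disease
  • who have been treated with one or more specified drugs
  • and who have not been treated with one or more other specified drugs

This could be simplified by converting your "drug requirements" into a temporary table of some form. That would allow querying with any number of "good" and "bad" drugs. What I have below could be implemented as a stored procedure, but if that is not an option, a number of more convoluted options are available.

Breaking down the steps:

First, here is how to select the desired patients. We will use this as a subquery later on: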

SELECT [PersonData]
 from DISEASE_T di
  inner join PERSON_T pe
   on pe.Person_Id = di.Person_Id
 where di.Disease_Id = [TargetDisease]
  and [TimeConstraints]

Second, for each set of "target" drugs that you AND together, set up a temp table like so (this is SQL Server syntax; Postgres should have something similar):

CREATE TABLE #DrugSet
 (
   Drug_Id  [KeyDataType]
  ,Include  int   not null
 )

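Populate it with one row for each drug you are considering:

  • Drug_Id = the drug you are checking
  • Include = 1 if the person is to have taken the drug, 0 if the person is not to have taken it

And calculate two values:

@GoodDrugs, the number of drugs you want the patient to have taken
@BadDrugs, the number of drugs you want the patient to have not taken

Now, stitch all of the above together in the following query: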

SELECT pe.[PersonData]  --  All the desired columns from PERSON_T and elsewhere
 from DRUG_T dr
  --  Filter to only include "persons of interest"
  inner join (select [PersonData]
               from DISEASE_T di
                inner join PERSON_T pe
                 on pe.Person_Id = di.Person_Id
               where di.Disease_Id = [TargetDisease]
                and [TimeConstraints]) pe
   on pe.Person_Id = dr.Person_ID
 --  Join with any of the drugs we are interested in
 left outer join #DrugSet ta  
  on ta.Drug_Id = dr.Drug_Id
 group by pe.[PersonData]  --  Same as in the SELECT clause
 having sum(case ta.Include
              when 1 then 1  --  This patient has been given a drug that we're looking to match
              else 0         --  This patient has not been given this drug (catches NULLs, too)
            end) = @GoodDrugs
  and  sum(case ta.Include
              when 0 then 1  --  This patient has been given a drug that we're NOT looking to match
              else 0         --  This patient has not been given this drug (catches NULLs, too)
            end) = 0         --  Must be 0: no excluded drug was taken (@BadDrugs is not needed for this test)

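I have deliberately ignored the time criteria, as you didn't go into detail on them, but adding them should be fairly simple (though I hope that's not famous last words). Further optimization may be possible, but much depends on the data and other possible criteria.

You would need to run this once for each "drug set" (that is, each set of true-or-false drugs ANDed together), concatenating the lists with each pass. You could probably extend #DrugSet to account for every drug set you are checking, but I am reluctant to try coding that without some serious data to test it against.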
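
As a rough idea of what extending the drug-set table to cover every drug set might look like, here is a sketch under several assumptions. The table name drug_set and its set_id column are hypothetical additions, the time criterion is guessed as drug_start_date < disease_start_date, and Python with an in-memory SQLite database and invented rows is used only so the example runs standalone. A person matches when at least one set has no row whose Include flag disagrees with whether the person took that drug before the disease started.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE disease_t (disease_id INTEGER, person_id INTEGER,
                            disease_start_date TEXT, disease_end_date TEXT);
    CREATE TABLE drug_t    (drug_id INTEGER, person_id INTEGER,
                            drug_start_date TEXT, drug_end_date TEXT);
    -- Hypothetical table: one row per drug per ANDed set; the sets are ORed together.
    CREATE TABLE drug_set  (set_id INTEGER, drug_id INTEGER, include INTEGER NOT NULL);
""")

# (234 = true AND 474 = true AND 26 = false) OR (395 = false AND 791 = false AND 371 = true)
conn.executemany("INSERT INTO drug_set VALUES (?, ?, ?)", [
    (1, 234, 1), (1, 474, 1), (1, 26, 0),
    (2, 395, 0), (2, 791, 0), (2, 371, 1),
])

# Invented sample data; person 6 has a different disease.
conn.executemany("INSERT INTO disease_t VALUES (?, ?, '2010-03-01', NULL)",
                 [(52, 1), (52, 2), (52, 3), (52, 4), (52, 5), (99, 6)])
conn.executemany("INSERT INTO drug_t VALUES (?, ?, ?, NULL)", [
    (234, 1, "2010-01-01"), (474, 1, "2010-02-01"),                          # set 1 satisfied
    (234, 2, "2010-01-01"), (474, 2, "2010-01-15"), (26, 2, "2010-02-01"),   # 26 taken before
    (371, 3, "2010-01-10"),                                                  # set 2 satisfied
    (371, 4, "2010-01-10"), (791, 4, "2010-02-10"),                          # 791 taken before
    (234, 5, "2010-01-01"), (474, 5, "2010-01-05"), (26, 5, "2010-03-15"),   # 26 only afterwards
    (234, 6, "2010-01-01"), (474, 6, "2010-01-05"),                          # wrong disease
])

query = """
    SELECT DISTINCT ds.person_id
    FROM disease_t ds
    CROSS JOIN (SELECT DISTINCT set_id FROM drug_set) s
    WHERE ds.disease_id = 52
      -- Keep the (person, set) pair only if no requirement in the set is violated.
      AND NOT EXISTS (
            SELECT 1
            FROM drug_set req
            WHERE req.set_id = s.set_id
              AND req.include <> CASE WHEN EXISTS (
                    SELECT 1 FROM drug_t dr
                    WHERE dr.person_id = ds.person_id
                      AND dr.drug_id = req.drug_id
                      AND dr.drug_start_date < ds.disease_start_date
                  ) THEN 1 ELSE 0 END)
    ORDER BY ds.person_id
"""
matches = [row[0] for row in conn.execute(query)]
print(matches)  # [1, 3, 5]
```

With this layout a new or longer pattern only means new rows in drug_set; the query text stays the same, and no per-set counts are needed because the double NOT EXISTS checks every row of a set. How it performs on a large database would have to be measured.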

answered 2010-07-23T20:34:15.507
0

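I have tried to work through this problem as logically as I can.

First of all, the three tables (Person_T, Drug_T, Disease_T) can be thought of as shown in Fig 1.0:

A person can have multiple drugs and multiple diseases. Each drug and disease has a start date and an end date.

So I would first de-normalise the three tables into one table (Table_dn), thus: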

dnId | PersonId | DrugId | DiseaseId | DgSt | DgEn | DiSt | DiEn
----   --------   ------   ---------   ----   ----   ----   ----

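This de-normalised table can be a temp table if necessary. Regardless, Table_dn now contains the entire global data set, as shown in Fig 2.0 (denoted as G).

From my understanding of your description, I can see what is essentially a two-layer filter.

Filter 1

This filter is simply a boolean set of drug combinations, as you have already stated in your question description. For example: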

(drug a = 1 & drug b = 0 & etc) OR (.....

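Filter 2

This filter is a little more complicated than the first one: it is the date range criteria. Fig 3.0 shows this date range in RED. The YELLOW represents the recorded dates, which can span it in multiple ways:

  • Before the RED period
  • After the RED period
  • Within the RED period
  • Ending before the end of the RED period
  • Starting after the start of the RED period

Now the YELLOW date periods could be the drug periods, the disease periods, or a combination of both.

This filter should be applied to the result set obtained from the first filter.

Of course, depending on your exact problem, these two filters might need to be applied the other way round (e.g. f2 first, then f1).

SQL pseudocode: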

Select sub.*
From    
      (select    * 
       from      Table_dn 
       where     [Filter 1]
      ) as sub

where [Filter 2]

answered 2010-07-23T22:48:17.687