sql - Window functions to count distinct records

Question

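The query below is based on a complicated view, and the view works the way I want it to (I'm not going to include the view because I don't think it will help with the question at hand). What I can't get right is the drugCountsInFamilies column. I need it to show the number of distinct drugName values for each drug family. You can see from the first screenshot that there are three different H3A rows. The drugCountsInFamilies value for H3A should be 3 (there are three different H3A drugs).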

You can see from the second screenshot that what drugCountsInFamilies captures in the first screenshot is the number of rows on which the drug name is listed.

Below is my query, with a comment on the part that is incorrect:

select distinct
     rx.patid
    ,d2.fillDate
    ,d2.scriptEndDate
    ,rx.drugName
    ,rx.drugClass
    --the line directly below is the one that I can't figure out why it's wrong
    ,COUNT(rx.drugClass) over(partition by rx.patid,rx.drugclass,rx.drugname) as drugCountsInFamilies
from 
(
select 
    ROW_NUMBER() over(partition by d.patid order by d.patid,d.uniquedrugsintimeframe desc) as rn
    ,d.patid
    ,d.fillDate
    ,d.scriptEndDate
    ,d.uniqueDrugsInTimeFrame
    from DrugsPerTimeFrame as d
)d2
inner join rx on rx.patid = d2.patid
inner join DrugTable as dt on dt.drugClass=rx.drugClass
where d2.rn=1 and rx.fillDate between d2.fillDate and d2.scriptEndDate
and dt.drugClass in ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
order by rx.patid

If I try to add a distinct to the count(rx.drugClass) clause, SSMS gets mad. Can this be done using window functions?

score 46 · Accepted Answer

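I came across this question while looking for a solution to my own problem of counting distinct values. While searching for an answer, I came across this post. See the last comment. I have tested it and used the SQL. It works really well for me, so I figured I would provide another solution here.

In summary, use DENSE_RANK() with PARTITION BY on the grouping columns and ORDER BY on the columns to count, once ASC and once DESC: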

DENSE_RANK() OVER (PARTITION BY drugClass ORDER BY drugName ASC) +
DENSE_RANK() OVER (PARTITION BY drugClass ORDER BY drugName DESC) - 1 AS drugCountsInFamilies

I use this as a template for myself:

DENSE_RANK() OVER (PARTITION BY PartitionByFields ORDER BY OrderByFields ASC ) +
DENSE_RANK() OVER (PARTITION BY PartitionByFields ORDER BY OrderByFields DESC) - 1 AS DistinctCount
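
To see why the two ranks add up to a distinct count, here is a minimal sketch on made-up data. The rows in the `VALUES` list are hypothetical, and partitioning by both `patid` and `drugClass` is an assumption based on the question rather than part of the template. Within a partition, the ascending rank says how many distinct values sort at or before the current one and the descending rank says how many sort at or after it, so their sum minus 1 is the number of distinct values.

```sql
-- Hypothetical sample rows; column names follow the question, values are invented.
WITH rx AS (
    SELECT *
    FROM (VALUES
        (1, 'h3a', 'drugA'),
        (1, 'h3a', 'drugA'),  -- repeated row for the same drug
        (1, 'h3a', 'drugB'),
        (1, 'h3a', 'drugC'),
        (1, 'h6h', 'drugD')
    ) AS v (patid, drugClass, drugName)
)
SELECT patid,
       drugClass,
       drugName,
       -- ascending rank + descending rank - 1 = distinct drugName values in the partition
       DENSE_RANK() OVER (PARTITION BY patid, drugClass ORDER BY drugName ASC)
     + DENSE_RANK() OVER (PARTITION BY patid, drugClass ORDER BY drugName DESC)
     - 1 AS drugCountsInFamilies
FROM rx
ORDER BY patid, drugClass, drugName;
```

Every h3a row gets 3 and the h6h row gets 1. One caveat: DENSE_RANK() ranks NULL like any other value, so if the counted column can be NULL, the result is one higher than COUNT(DISTINCT ...) would give.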

I hope this helps!

score 27 · Accepted Answer

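Doing a count(distinct) as a window function requires a trick. Several levels of tricks, actually.

Because your request as written is trivial (the value is always 1, since rx.drugClass is in the partitioning clause), I will make an assumption. Let's say you want to count the number of unique drug classes per patient.

If so, use row_number() partitioned by patid and drugClass. When it is 1, a new drugClass is starting within that patid. Create a flag that is 1 in this case and 0 in all other cases.

Then you can simply take a sum over a partitioning clause to get the number of distinct values.

The query (after formatting it so I can read it) looks like: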

select t.patid, t.fillDate, t.scriptEndDate, t.drugName, t.drugClass,
       SUM(t.IsFirstRowInGroup) over (partition by t.patid) as NumDrugCount
from (select distinct rx.patid, d2.fillDate, d2.scriptEndDate, rx.drugName, rx.drugClass,
             (case when 1 = ROW_NUMBER() over (partition by rx.drugClass, rx.patid order by (select NULL))
                   then 1 else 0
              end) as IsFirstRowInGroup
      from (select ROW_NUMBER() over(partition by d.patid order by d.patid,d.uniquedrugsintimeframe desc) as rn, 
                   d.patid, d.fillDate, d.scriptEndDate, d.uniqueDrugsInTimeFrame
            from DrugsPerTimeFrame as d
           ) d2 inner join
           rx
           on rx.patid = d2.patid inner join
           DrugTable dt
           on dt.drugClass = rx.drugClass
      where d2.rn=1 and rx.fillDate between d2.fillDate and d2.scriptEndDate and
            dt.drugClass in ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
     ) t
order by patid
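
The flag-and-sum idea is easier to follow in isolation. The following is a minimal sketch on hypothetical rows. The CTE name `flagged` and the alias `NumDrugClasses` are invented for illustration, and the joins and filters of the full query are left out.

```sql
-- Hypothetical sample rows; column names follow the question, values are invented.
WITH rx AS (
    SELECT *
    FROM (VALUES
        (1, 'h3a', 'drugA'),
        (1, 'h3a', 'drugB'),
        (1, 'h6h', 'drugD'),
        (2, 'h3a', 'drugA')
    ) AS v (patid, drugClass, drugName)
),
flagged AS (
    SELECT patid,
           drugClass,
           drugName,
           -- 1 on exactly one row per (patid, drugClass) group, 0 on the rest
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY patid, drugClass
                                        ORDER BY (SELECT NULL)) = 1
                THEN 1 ELSE 0
           END AS IsFirstRowInGroup
    FROM rx
)
SELECT patid,
       drugClass,
       drugName,
       -- summing the flags per patient counts the distinct drug classes
       SUM(IsFirstRowInGroup) OVER (PARTITION BY patid) AS NumDrugClasses
FROM flagged
ORDER BY patid, drugClass, drugName;
```

Here every row for patient 1 shows 2 and the row for patient 2 shows 1. To count distinct drug names per class instead, the same pattern should work with the flag partitioned by patid, drugClass, drugName and the sum partitioned by patid, drugClass.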

score 1 · Accepted Answer

I think what you were trying to do is this, as a window function:

COUNT(DISTINCT rx.drugName) over(partition by rx.patid,rx.drugclass) as drugCountsInFamilies

SQL Server complains about that. But you can do this instead:

SELECT 
rx.patid
, rx.drugName
, rx.drugClass
, (SELECT COUNT(DISTINCT rx2.drugName) FROM rx rx2 WHERE rx2.drugClass = rx.DrugClass AND rx2.patid = rx.patid) As drugCountsInFamilies
FROM rx
...

If the table is big, you'd best put an index on one of the columns (e.g. patid) so that the nested query doesn't consume a lot of resources.
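
As a rough sketch of such an index: the index name below is hypothetical, and covering all three columns used by the correlated subquery (rather than patid alone) is an assumption about what would serve it best, not something measured against real data.

```sql
-- Hypothetical index name. The two equality columns of the subquery come first,
-- followed by the column being counted, so the subquery can be answered from the index.
CREATE NONCLUSTERED INDEX IX_rx_patid_drugClass_drugName
    ON rx (patid, drugClass, drugName);
```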

score -3 · Accepted Answer

Why wouldn't something like this work?

SELECT 
   IDCol_1
  ,IDCol_2
  ,Count(*) Over(Partition By IDCol_1, IDCol_2 order by IDCol_1) as numDistinct
FROM Table_1
