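I have a very simple table (LOG) with the attributes MAC_ADDR, IP_SRC, IP_DST, URL, PROTOCOL. For rows where PROTOCOL='DNS', I want the top n rows of IP_SRC, URL, #OfOccurrences for each IP_SRC in the table, sorted by decreasing #OfOccurrences.

To be clearer, I want to be able to list the n most visited pages for each IP_SRC in my table.

I can get the most visited URL for each IP_SRC like this: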

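-- Count DNS requests per (ip_src, url), then keep only the rows whose
-- count reaches the maximum count for that ip_src.
select ip_src, url, cnt
from (
    select ip_src, url, count(*) as cnt
    from log
    where protocol = 'DNS'   -- filter before counting, not after
    group by ip_src, url
) as c
where cnt >= (select max(cpt)
              from (select count(*) as cpt
                    from log as b
                    where b.ip_src = c.ip_src and b.protocol = 'DNS'
                    group by b.ip_src, b.url))
order by ip_src, cnt desc;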

However, this solution is clearly not optimized.

Here is a more practical query (again for the single most visited URL of each IP_SRC):

select ip_src,url,cnt
from (select ip_src,url,count(*) as cnt
      from log where protocol='DNS'
      group by ip_src,url
      order by ip_src,cnt asc)
group by ip_src;

The second option is faster! However, I want the n most visited pages for each IP_SRC, and I don't know how to do that.
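One caveat about the second query: it depends on which row the engine picks for the bare `url` column in each group, and that choice is not guaranteed. Assuming the database is SQLite (the `==` operator and the bare-column `GROUP BY` suggest it, but that is an assumption), a deterministic variant might look like the sketch below. It relies on SQLite's documented rule that bare columns come from the row holding the `MAX()` value. The sample rows and the Python `sqlite3` wrapper are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory copy of the LOG table with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (mac_addr TEXT, ip_src TEXT, ip_dst TEXT, url TEXT, protocol TEXT)"
)
rows = [
    ("aa", "10.0.0.1", "8.8.8.8", "a.example", "DNS"),
    ("aa", "10.0.0.1", "8.8.8.8", "a.example", "DNS"),
    ("aa", "10.0.0.1", "8.8.8.8", "b.example", "DNS"),
    ("aa", "10.0.0.1", "8.8.8.8", "b.example", "HTTP"),  # ignored: not DNS
    ("aa", "10.0.0.1", "8.8.8.8", "b.example", "HTTP"),  # ignored: not DNS
    ("bb", "10.0.0.2", "8.8.8.8", "c.example", "DNS"),
]
conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?, ?)", rows)

# With MAX(cnt) in the select list, SQLite takes the bare column `url`
# from the row that holds the maximum, so no inner ORDER BY is needed.
top1 = conn.execute(
    """
    SELECT ip_src, url, MAX(cnt) AS cnt
    FROM (SELECT ip_src, url, COUNT(*) AS cnt
          FROM log
          WHERE protocol = 'DNS'
          GROUP BY ip_src, url)
    GROUP BY ip_src
    ORDER BY ip_src
    """
).fetchall()

print(top1)  # [('10.0.0.1', 'a.example', 2), ('10.0.0.2', 'c.example', 1)]
```

This still returns only one URL per IP_SRC, so it does not solve the top-n case.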

Thanks for your help.

3 Answers

Use a common table expression:

WITH Temp1 AS (
  SELECT ip_src, url, count(*) AS cnt
  FROM Log
  WHERE protocol = 'DNS'
  GROUP BY ip_src, url
)
SELECT ip_src, url, cnt
FROM Temp1 AS T1
WHERE url IN (
  SELECT url
  FROM Temp1 AS T2
  WHERE T2.ip_src = T1.ip_src
    AND T2.cnt >= T1.cnt
  ORDER BY cnt DESC
  LIMIT 3  -- or whatever you want it to be
)
ORDER BY ip_src ASC, cnt DESC;
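
To try the query above without touching real data, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The helper name `top_n_urls`, the sample counts, and the choice of SQLite are assumptions for illustration; the `LIMIT` is passed as a bound parameter so n can vary.

```python
import sqlite3

def top_n_urls(conn, n):
    """Return up to n (ip_src, url, cnt) rows per ip_src for DNS traffic."""
    query = """
        WITH Temp1 AS (
          SELECT ip_src, url, COUNT(*) AS cnt
          FROM log
          WHERE protocol = 'DNS'
          GROUP BY ip_src, url
        )
        SELECT ip_src, url, cnt
        FROM Temp1 AS T1
        WHERE url IN (
          -- URLs of the same ip_src that are at least as frequent as T1.url
          SELECT T2.url
          FROM Temp1 AS T2
          WHERE T2.ip_src = T1.ip_src
            AND T2.cnt >= T1.cnt
          ORDER BY T2.cnt DESC
          LIMIT ?
        )
        ORDER BY ip_src ASC, cnt DESC
    """
    return conn.execute(query, (n,)).fetchall()

# Hypothetical sample data: number of DNS lookups per (ip_src, url).
dns_hits = {
    ("10.0.0.1", "a.example"): 3,
    ("10.0.0.1", "b.example"): 2,
    ("10.0.0.1", "c.example"): 1,
    ("10.0.0.2", "d.example"): 2,
    ("10.0.0.2", "e.example"): 1,
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (mac_addr TEXT, ip_src TEXT, ip_dst TEXT, url TEXT, protocol TEXT)"
)
for (ip, url), hits in dns_hits.items():
    conn.executemany(
        "INSERT INTO log (ip_src, url, protocol) VALUES (?, ?, 'DNS')",
        [(ip, url)] * hits,
    )

result = top_n_urls(conn, 2)
for row in result:
    print(row)
```

With n = 2, `c.example` is dropped for 10.0.0.1 because two URLs rank above it. When several URLs tie at the cutoff, which of them is kept is left to the database.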

answered 2016-09-13T17:05:49.200

select x.ip_src, x.url, x.cnt
from (select ip_src,url,count(*) as cnt
      from log where protocol='DNS'
      group by ip_src,url
      order by ip_src, count(*) desc) AS x
group by x.ip_src;

Could you try this?

answered 2016-09-13T11:37:17.813

In the end, I managed to get what I wanted by using a temporary table.

-- First, create a temp table of occurrence counts
CREATE TEMPORARY TABLE TEMP1 AS
SELECT ip_src,url,count(*) AS cnt
FROM LOG
WHERE protocol='DNS'
GROUP BY ip_src,url
ORDER BY ip_src,cnt,url DESC;
-- Then use a classic limit-per-group query
SELECT T1.ip_src,T1.url,T1.cnt
FROM TEMP1 AS T1
WHERE T1.url in (
      SELECT T2.url
      FROM TEMP1 AS T2
      WHERE T2.ip_src=T1.ip_src and T2.cnt>=T1.cnt
      ORDER BY T2.cnt DESC
      LIMIT 3 --Or whatever you want it to be
)
ORDER BY T1.ip_src ASC,T1.cnt DESC;

If anyone knows how to do the same thing without a temporary table (or can explain why a temporary table is a good solution), please share.
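
One possible way to avoid the temporary table is a window function. The sketch below assumes SQLite 3.25.0 or later, which is when window functions were added, so it would not have run on engines current when this thread was written. The helper name `top_n_urls_windowed`, the sample counts, and the alphabetical tie-break on `url` are all assumptions for illustration.

```python
import sqlite3

def top_n_urls_windowed(conn, n):
    """Top n DNS URLs per ip_src using ROW_NUMBER(); needs SQLite >= 3.25."""
    query = """
        WITH counts AS (
          SELECT ip_src, url, COUNT(*) AS cnt
          FROM log
          WHERE protocol = 'DNS'
          GROUP BY ip_src, url
        ),
        ranked AS (
          -- Number the URLs of each ip_src from most to least frequent;
          -- ties are broken alphabetically by url (an arbitrary choice).
          SELECT ip_src, url, cnt,
                 ROW_NUMBER() OVER (
                   PARTITION BY ip_src
                   ORDER BY cnt DESC, url ASC
                 ) AS rn
          FROM counts
        )
        SELECT ip_src, url, cnt
        FROM ranked
        WHERE rn <= ?
        ORDER BY ip_src ASC, cnt DESC, url ASC
    """
    return conn.execute(query, (n,)).fetchall()

# Hypothetical sample data: number of DNS lookups per (ip_src, url).
dns_hits = {
    ("10.0.0.1", "a.example"): 2,
    ("10.0.0.1", "b.example"): 2,
    ("10.0.0.1", "c.example"): 1,
    ("10.0.0.2", "d.example"): 1,
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log (mac_addr TEXT, ip_src TEXT, ip_dst TEXT, url TEXT, protocol TEXT)"
)
for (ip, url), hits in dns_hits.items():
    conn.executemany(
        "INSERT INTO log (ip_src, url, protocol) VALUES (?, ?, 'DNS')",
        [(ip, url)] * hits,
    )

top2 = top_n_urls_windowed(conn, 2)
for row in top2:
    print(row)
```

Each group is ranked once instead of re-running a correlated subquery per row, and the explicit tie-break makes the result reproducible.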

answered 2016-09-13T16:13:44.397