1

我有以下表结构:

Tags:
Tag_ID | Name
1      | Tag1
2      | Tag2
3      | Tag3
4      | Tag4
5      | Tag5
6      | Tag6

Posts:
Post_ID | Title | Body
1       | Post1 | Post1
2       | Post2 | Post2
3       | Post3 | Post3
4       | Post4 | Post4
5       | Post5 | Post5
6       | Post6 | Post6
7       | Post7 | Post7
8       | Post8 | Post8
9       | Post9 | Post9
10      | Post10| Post10

TagsPosts:
Tag_ID | Post_ID
1      | 1
1      | 2
1      | 3
1      | 4
1      | 5
1      | 10
1      | 1
2      | 1
2      | 2
2      | 6
2      | 7
3      | 4
3      | 8
3      | 9
4      | 7
5      | 1
5      | 2
5      | 3
5      | 4
5      | 5
5      | 6
5      | 7
6      | 2

我需要从查询中返回的是Posts最常见的前 3 名和其余的Tag前 1名,而不提供任何重复。PostTagsPosts

Desired Output:
Tag_ID | Post_ID
5      | 1
5      | 2
5      | 3
1      | 10
2      | 6
3      | 9
4      | 7

到目前为止,我能够确定Posts最常用的前 3 名Tag

SELECT Top(3) t.Tag_ID, p.Post_ID FROM Tags as t
INNER JOIN TagsPosts as tp ON t.Tag_ID = tp.Tag_ID
INNER JOIN Posts as p ON tp.Post_ID = p.Post_ID
WHERE t.Tag_ID IN (
    SELECT TOP(1) Tag_ID FROM TagsPosts GROUP BY Tag_ID ORDER BY COUNT(Tag_ID) DESC)

Result:
Tag_ID | Post_ID
5      | 1
5      | 2
5      | 3

我还确定了Post其余Tags使用的前 1 个:

SELECT t.Tag_ID, p.Post_ID FROM Tags as t
INNER JOIN (
    SELECT t.Tag_ID, Max(p.Post_ID) as Post_ID FROM Tags as t
INNER JOIN TagsPosts as tp ON t.Tag_ID = tp.Tag_ID
INNER JOIN Posts as p ON tp.Post_ID = p.Post_ID
WHERE t.Tag_ID NOT IN (
        SELECT TOP(1) Tag_ID FROM TagsPosts GROUP BY Tag_ID ORDER BY COUNT(Tag_ID) DESC)
    AND
p.Post_ID NOT IN (
        SELECT Top(3) p.Post_ID FROM Tags as t
    INNER JOIN TagsPosts as tp ON t.Tag_ID = tp.Tag_ID
    INNER JOIN Posts as p ON tp.Post_ID = p.Post_ID
    WHERE t.Tag_ID IN (
        SELECT TOP(1) Tag_ID FROM TagsPosts GROUP BY Tag_ID ORDER BY COUNT(Tag_ID) DESC))
    GROUP BY t.Tag_ID) as s ON t.Tag_ID = s.Tag_ID
INNER JOIN Posts as p ON s.Post_ID = p.Post_ID

Result:
Tag_ID | Post_ID
1      | 10
2      | 7
3      | 9
4      | 7

这几乎就在那里,但如您所见,它返回 duplicate Posts

顺便说一句,我使用 SQL Server 2008 Express 进行测试,因为我不熟悉 MySQL,但我被要求确定可以应用于 MySQL 数据库的 SQL 查询。我想如果我在 T-SQL 中获得基本查询,那么将其转换为 MySQL 使用的任何 SQL 都会相当简单。

4

1 回答 1

0

我会使用一个窗口函数,将其存储在 CTE 中,然后在谓词中引用它。像这样(使用可以从 SSMS 中按原样运行的数据的简化版本)。您列出了 SQL-Server,但没有列出版本。我相信表函数可以在 SQL Server 2005 及更高版本上运行,但不确定。

declare @Tag table ( tagid int identity, name varchar(8));

insert into @Tag values ('Tag1'),('Tag2'),('Tag3'),('Tag4'),('Tag5'),('Tag6');

declare @Posts table (postid int identity, tagid int, postbody varchar(32));

insert into @Posts values (1,'Blah'),(1, 'Blahblah'),(2, 'Blahblah'),(3, 'Blahbodyblah'),(4, 'Blahblahblah'),(4, 'Blahbodyblah'),(4, 'Blah'),(5, 'Blah'),(5, 'Blahblah'),(6, 'Blahblah');

-- use a CTE
with a as 
    (
    select 
        p.postbody
    ,   count(t.tagid) as TimesTagged
        /* You stated you wanted a return of posts based on their occurrence.  I am counting a position 
        of the COUNTS OF TAGID's descending (greatest first) starting from one.  If you have a tie and want to 
        do those I would consider using DENSE_RANK.  You would have to insert more values where you get a third 
        occurence to become a TIE to see how Rank, Dense_Rank, and Row_number differ.  They all have their 
        purposes but the user should know what they want before determining which they use.
        */
    ,   row_number() over(order by count(t.tagid) desc) as PositionOfCountsTaggedByGreatestOrderFirst
    ,   Rank() over(order by count(t.tagid) desc) as PositionOfCountsTaggedByGreatestOrderFirst_Ranking
    ,   Dense_Rank() over(order by count(t.tagid) desc) as PositionOfCountsTaggedByGreatestOrderFirst_DenseRanking
    from @Tag t 
        join @Posts p on t.tagid = p.tagid
    group by p.postbody
    )
select *
from a
-- I only use Row_Number, you can change to use one of the other predicates above if you wish.
where PositionOfCountsTaggedByGreatestOrderFirst <= 3


/*
You are stating you only want the top three counts
windowed functions are better than using top IMHO as you can specify lists 'in', medians, and all other types
explicitly defined rather than having to repeating nested selects.  The only downer is you can not use 
a predicate on a windowed function directly.  Yout must create it and then in a nested select, CTE (as shown)
, a table variable, temp table, etc...  define a predicate on it.
*/
于 2013-01-27T16:15:57.693 回答