
I'm trying to use the Stack Exchange Data Explorer (SEDE) to find cases where two different users on Stack Overflow have accepted each other's answers. For example:

Post A { Id: 1, OwnerUserId: "user1", AcceptedAnswerId: "user2" }

Post B { Id: 2, OwnerUserId: "user2", AcceptedAnswerId: "user1" }
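Put another way, the condition is a set operation. Collect every (question owner, accepted-answer owner) pair, then keep a pair only if its reverse is also present and the two users differ. Here is a minimal Python sketch of just that condition. It assumes the pairs have already been extracted, and the sample pairs are made up for illustration:

```python
def reciprocal_pairs(pairs):
    """Return the (a, b) pairs whose reverse (b, a) is also present, excluding a == b."""
    pair_set = set(pairs)
    return {(a, b) for (a, b) in pair_set if a != b and (b, a) in pair_set}


# Hypothetical (question owner, accepted-answer owner) pairs.
sample = [(1, 2), (2, 1), (2, 3), (3, 1), (4, 4)]
print(sorted(reciprocal_pairs(sample)))  # [(1, 2), (2, 1)]
```

(4, 4) is dropped as a self-accept. (2, 3) and (3, 1) are dropped because their reverses never occur.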

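I currently have a query that finds pairs of users who have worked together as asker and answerer, but it can't tell whether the relationship is reciprocal: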

SELECT user1.Id AS User_1, user2.Id AS User_2
FROM Posts p
INNER JOIN Users user1 ON p.OwnerUserId = user1.Id
INNER JOIN Posts p2 ON p.AcceptedAnswerId = p2.Id
INNER JOIN Users user2 ON p2.OwnerUserId = user2.Id
WHERE p.OwnerUserId <> p2.OwnerUserId
AND p.OwnerUserId IS NOT NULL
AND p2.OwnerUserId IS NOT NULL
AND user1.Id <> user2.Id
GROUP BY user1.Id, user2.Id HAVING COUNT(*) > 1;
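The next sketch shows why this query cannot answer the question on its own. It runs the same join shape against an in-memory SQLite table, not SEDE's SQL Server, using invented rows. Two parts of the original are dropped for brevity: the Users joins, and the HAVING COUNT(*) > 1 clause, which in the original additionally requires more than one accepted answer in the same direction. Each result row is a single direction of the relationship. For example, (3, 1) appears even though user 1 never accepted an answer from user 3.

```python
import sqlite3

# Toy subset of the SEDE Posts table (only the columns the query touches).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (Id INTEGER PRIMARY KEY, PostTypeId INTEGER,
                    AcceptedAnswerId INTEGER, ParentId INTEGER, OwnerUserId INTEGER);
-- Question 10 (user 1) accepted answer 11 (user 2);
-- question 20 (user 2) accepted answer 21 (user 1);
-- question 30 (user 3) accepted answer 31 (user 1), never reciprocated.
INSERT INTO Posts VALUES
  (10, 1, 11, NULL, 1), (11, 2, NULL, 10, 2),
  (20, 1, 21, NULL, 2), (21, 2, NULL, 20, 1),
  (30, 1, 31, NULL, 3), (31, 2, NULL, 30, 1);
""")

# Same join shape as the question's query, minus the Users joins and HAVING clause.
directed = conn.execute("""
SELECT p.OwnerUserId AS User_1, p2.OwnerUserId AS User_2
FROM Posts p
INNER JOIN Posts p2 ON p.AcceptedAnswerId = p2.Id
WHERE p.OwnerUserId <> p2.OwnerUserId
GROUP BY p.OwnerUserId, p2.OwnerUserId
ORDER BY User_1, User_2
""").fetchall()
print(directed)  # [(1, 2), (2, 1), (3, 1)]
```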

For anyone unfamiliar with the schema, the two tables look like this:

Posts
--------------------------------------
Id                      int
PostTypeId              tinyint
AcceptedAnswerId        int
ParentId                int
CreationDate            datetime
DeletionDate            datetime
Score                   int
ViewCount               int
Body                    nvarchar (max)
OwnerUserId             int
OwnerDisplayName        nvarchar (40)
LastEditorUserId        int
LastEditorDisplayName   nvarchar (40)
LastEditDate            datetime
LastActivityDate        datetime
Title                   nvarchar (250)
Tags                    nvarchar (250)
AnswerCount             int
CommentCount            int
FavoriteCount           int
ClosedDate              datetime
CommunityOwnedDate      datetime

Users
--------------------------------------
Id                      int
Reputation              int
CreationDate            datetime
DisplayName             nvarchar (40)
LastAccessDate          datetime
WebsiteUrl              nvarchar (200)
Location                nvarchar (100)
AboutMe                 nvarchar (max)
Views                   int
UpVotes                 int
DownVotes               int
ProfileImageUrl         nvarchar (200)
EmailHash               varchar (32)
AccountId               int
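Two links between questions and answers matter here. On a question, AcceptedAnswerId points at the accepted answer's Id. On an answer, ParentId points back at the question's Id. PostTypeId is 1 for questions and 2 for answers.

Below is a small SQLite sketch with invented rows. It shows that both links resolve to the same question/accepted-answer pairing. The table is a cut-down stand-in for the real Posts table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (Id INTEGER PRIMARY KEY, PostTypeId INTEGER,
                    AcceptedAnswerId INTEGER, ParentId INTEGER, OwnerUserId INTEGER);
INSERT INTO Posts VALUES
  (10, 1, 12, NULL, 1),   -- question by user 1, accepted answer is post 12
  (11, 2, NULL, 10, 3),   -- answer by user 3, not accepted
  (12, 2, NULL, 10, 2);   -- answer by user 2, accepted
""")

# Follow the question -> accepted answer link.
via_accepted = conn.execute("""
SELECT q.Id, a.Id, q.OwnerUserId, a.OwnerUserId
FROM Posts q JOIN Posts a ON q.AcceptedAnswerId = a.Id
WHERE q.PostTypeId = 1
""").fetchall()

# Follow the answer -> question link, keeping only the accepted answer.
via_parent = conn.execute("""
SELECT q.Id, a.Id, q.OwnerUserId, a.OwnerUserId
FROM Posts a JOIN Posts q ON a.ParentId = q.Id
WHERE a.PostTypeId = 2 AND q.AcceptedAnswerId = a.Id
""").fetchall()

print(via_accepted)                # [(10, 12, 1, 2)]
print(via_parent == via_accepted)  # True
```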

5 Answers


The simplest form of the query (simple enough that scanning 16M questions doesn't time out) would be:

WITH accepter_acceptee(a, b) AS (
    SELECT q.OwnerUserId, a.OwnerUserId
    FROM Posts AS q
    INNER JOIN Posts AS a ON q.AcceptedAnswerId = a.Id
    WHERE q.PostTypeId = 1 AND q.OwnerUserId <> a.OwnerUserId
), collaborations(a, b, type) AS (
    SELECT a, b, 'a accepter b' FROM accepter_acceptee
    UNION ALL
    SELECT b, a, 'a acceptee b' FROM accepter_acceptee
)
SELECT a, b, COUNT(*) AS [collaboration count]
FROM collaborations
GROUP BY a, b
HAVING COUNT(DISTINCT type) = 2
ORDER BY a, b
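The UNION ALL trick works like this. Each accepted answer is emitted twice: once as (accepter, acceptee) and once flipped, each copy tagged with the role it represents. A pair only survives HAVING COUNT(DISTINCT type) = 2 if both roles occur.

The sketch below runs the same query in SQLite rather than SEDE's SQL Server, on invented rows. The only change is that the bracketed column alias is renamed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (Id INTEGER PRIMARY KEY, PostTypeId INTEGER,
                    AcceptedAnswerId INTEGER, OwnerUserId INTEGER);
INSERT INTO Posts VALUES
  (10, 1, 11, 1), (11, 2, NULL, 2),   -- user 1 accepted user 2
  (20, 1, 21, 2), (21, 2, NULL, 1),   -- user 2 accepted user 1
  (30, 1, 31, 1), (31, 2, NULL, 2),   -- user 1 accepted user 2 again
  (40, 1, 41, 3), (41, 2, NULL, 1);   -- user 3 accepted user 1 (one-way)
""")

QUERY = """
WITH accepter_acceptee(a, b) AS (
    SELECT q.OwnerUserId, a.OwnerUserId
    FROM Posts AS q
    INNER JOIN Posts AS a ON q.AcceptedAnswerId = a.Id
    WHERE q.PostTypeId = 1 AND q.OwnerUserId <> a.OwnerUserId
), collaborations(a, b, type) AS (
    SELECT a, b, 'a accepter b' FROM accepter_acceptee
    UNION ALL
    SELECT b, a, 'a acceptee b' FROM accepter_acceptee
)
SELECT a, b, COUNT(*) AS collaboration_count
FROM collaborations
GROUP BY a, b
HAVING COUNT(DISTINCT type) = 2
ORDER BY a, b
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 2, 3), (2, 1, 3)]
```

The count covers accepted answers in both directions: two from user 1 to user 2 plus one from user 2 to user 1. Each reciprocal pair is listed once per orientation.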

Result:

answered 2018-09-28T21:37:33.127

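A CTE with simple inner joins does the job; there's no need for as much code as I've seen in the other answers. Note the comments in the query.

Link to Stack Exchange Data Explorer with saved sample results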

with questions as ( -- this is needed so that we have ids of users asking and answering
select
   p1.owneruserid as question_userid
 , p2.owneruserid as answer_userid
 --, p1.id -- to view sample ids
from posts p1
inner join posts p2 on -- to fetch answer post
  p1.acceptedanswerid = p2.id
)
select distinct -- unique pairs
    q1.question_userid as userid1
  , q1.answer_userid as userid2
  --, q1.id, q2.id -- to view sample ids
from questions q1
inner join questions q2 on
      q1.question_userid = q2.answer_userid -- accepted answer from someone
  and q1.answer_userid = q2.question_userid -- who also accepted our answer
  and q1.question_userid <> q1.answer_userid -- and we aren't self-accepting
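The self-join reads as follows: for each (asker, accepted answerer) row q1, look for a row q2 with the roles swapped. The sketch below runs the same query in SQLite against invented rows. Comments are stripped, and an order by is added only to make the output deterministic. Each reciprocal pair comes back twice, once per direction. The self-accepted post by user 5 is filtered out by the last join condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, acceptedanswerid INTEGER, owneruserid INTEGER);
INSERT INTO posts VALUES
  (10, 11, 1), (11, NULL, 2),   -- user 1 accepted user 2
  (20, 21, 2), (21, NULL, 1),   -- user 2 accepted user 1
  (30, 31, 3), (31, NULL, 1),   -- user 3 accepted user 1 (one-way)
  (40, 41, 5), (41, NULL, 5);   -- user 5 accepted their own answer
""")

rows = conn.execute("""
with questions as (
  select p1.owneruserid as question_userid, p2.owneruserid as answer_userid
  from posts p1
  inner join posts p2 on p1.acceptedanswerid = p2.id
)
select distinct q1.question_userid as userid1, q1.answer_userid as userid2
from questions q1
inner join questions q2 on
      q1.question_userid = q2.answer_userid
  and q1.answer_userid = q2.question_userid
  and q1.question_userid <> q1.answer_userid
order by userid1, userid2
""").fetchall()
print(rows)  # [(1, 2), (2, 1)]
```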

This turns up an example post:

That said, SEDE may time out because of the large dataset and the distinct step. If you just want to see some data, remove distinct and add top N at the beginning:

with questions as (
...
)
select top 3 ...
answered 2018-09-28T21:47:56.597

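Using the technique from Salman A's answer, I improved the sorting and added some more useful columns.

Combined with the query from my other answer, it reveals some interesting relationships.

View it in SEDE.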

WITH QandA_users AS (
    SELECT      q.OwnerUserId   AS userQ
                , a.OwnerUserId AS userA
    FROM        Posts q
    INNER JOIN  Posts a         ON q.AcceptedAnswerId = a.Id
    WHERE       q.PostTypeId    = 1
),
pairsUnion (user1, user2, whoAnswered) AS (
    SELECT  userQ, userA, 'usr 2 answered'
    FROM    QandA_users
    WHERE   userQ <> userA
    UNION ALL
    SELECT  userA, userQ, 'usr 1 answered'
    FROM    QandA_users
    WHERE   userQ <> userA
),
collaborators AS (
    SELECT      user1, user2, COUNT(*) AS [Reciprocations]
    FROM        pairsUnion
    GROUP BY    user1, user2
    HAVING COUNT (DISTINCT whoAnswered) > 1
)
SELECT
            'site://u/' + CAST(c.user1 AS NVARCHAR) + '|Usr ' + u1.DisplayName      AS [User 1]
            , 'site://u/' + CAST(c.user2 AS NVARCHAR) + '|Usr ' + u2.DisplayName    AS [User 2]
            , c.Reciprocations                                                      AS [Reciprocal Accptd posts]
            , (SELECT COUNT(*)  FROM QandA_users qau  WHERE qau.userQ = c.user1)    AS [Usr 1 Qstns wt Accptd]
            , (SELECT COUNT(*)  FROM QandA_users qau  WHERE qau.userQ = c.user1  AND qau.userA = c.user2) AS [Accptd Ansr by Usr 2]
            , (SELECT COUNT(*)  FROM QandA_users qau  WHERE qau.userA = c.user2)    AS [Usr 2 Ttl Accptd Answrs]
FROM        collaborators c
INNER JOIN  Users u1        ON u1.Id = c.user1
INNER JOIN  Users u2        ON u2.Id = c.user2
ORDER BY    c.Reciprocations DESC
            , u1.DisplayName
            , u2.DisplayName
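Here is a cut-down SQLite sketch of the CTE chain above, run on invented rows. It drops the Users joins and the site:// link formatting, since those need the Users table and SEDE's link rendering. It also keeps only two of the three correlated-subquery columns. Reciprocations counts accepted answers in both directions between the two users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (Id INTEGER PRIMARY KEY, PostTypeId INTEGER,
                    AcceptedAnswerId INTEGER, OwnerUserId INTEGER);
INSERT INTO Posts VALUES
  (10, 1, 11, 1), (11, 2, NULL, 2),   -- user 1 accepted user 2
  (20, 1, 21, 2), (21, 2, NULL, 1),   -- user 2 accepted user 1
  (30, 1, 31, 1), (31, 2, NULL, 3);   -- user 1 accepted user 3 (one-way)
""")

rows = conn.execute("""
WITH QandA_users AS (
    SELECT q.OwnerUserId AS userQ, a.OwnerUserId AS userA
    FROM Posts q
    INNER JOIN Posts a ON q.AcceptedAnswerId = a.Id
    WHERE q.PostTypeId = 1
),
pairsUnion (user1, user2, whoAnswered) AS (
    SELECT userQ, userA, 'usr 2 answered' FROM QandA_users WHERE userQ <> userA
    UNION ALL
    SELECT userA, userQ, 'usr 1 answered' FROM QandA_users WHERE userQ <> userA
),
collaborators AS (
    SELECT user1, user2, COUNT(*) AS Reciprocations
    FROM pairsUnion
    GROUP BY user1, user2
    HAVING COUNT(DISTINCT whoAnswered) > 1
)
SELECT c.user1, c.user2, c.Reciprocations,
       (SELECT COUNT(*) FROM QandA_users qau WHERE qau.userQ = c.user1) AS usr1_qstns_with_accepted,
       (SELECT COUNT(*) FROM QandA_users qau WHERE qau.userA = c.user2) AS usr2_total_accepted_answers
FROM collaborators c
ORDER BY c.Reciprocations DESC, c.user1, c.user2
""").fetchall()
print(rows)  # [(1, 2, 2, 2, 1), (2, 1, 2, 1, 1)]
```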

The results look like this:

answered 2018-09-29T00:12:47.097

Here's how I'd do it. First, some simplified data:

if object_id('tempdb.dbo.#Posts') is not null drop table #Posts
create table #Posts
(
    PostId char(1),
    OwnerUserId int,
    AcceptedAnswerUserId int
)

insert into #Posts
values
('A', 1, 2),
('B', 2, 1),
('C', 2, 3),
('D', 2, 4),
('E', 3, 1),
('F', 4, 1)

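For our purposes we don't really care about PostId. Our starting point is a set of ordered pairs: post owner (OwnerUserId) and accepted answerer (AcceptedAnswerUserId).

(Not necessary, but you can visualize the set like this:)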

select distinct OwnerUserId, AcceptedAnswerUserId
from #Posts

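Now we want to find every entry in that set that also appears with its two fields reversed, i.e. where one post's owner is the accepted answerer on another post, and that other post's owner is the accepted answerer on the first. So if one pair is (1, 2), we want to find (2, 1).

I do this with a left join so you can see the rows it leaves out, but changing it to an inner join restricts the output to exactly the set you described. You can collect the results however you like: just pick either pair of columns, or, if you want both directions on a single row, return both columns from each table.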

select 
    u1.OwnerUserId, 
    u1.AcceptedAnswerUserId, 
    u2.OwnerUserId, 
    u2.AcceptedAnswerUserId
from #Posts u1
left outer join #Posts u2
    on u1.AcceptedAnswerUserId = u2.OwnerUserId
        and u1.OwnerUserId = u2.AcceptedAnswerUserId
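The sketch below runs the same self-join in SQLite against the sample rows above, comparing the left-join and inner-join variants. SQLite has no # temp-table syntax, so #Posts becomes a plain table named Posts. An ORDER BY is added for deterministic output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (PostId TEXT, OwnerUserId INTEGER, AcceptedAnswerUserId INTEGER);
INSERT INTO Posts VALUES
  ('A', 1, 2), ('B', 2, 1), ('C', 2, 3), ('D', 2, 4), ('E', 3, 1), ('F', 4, 1);
""")


def reversed_pairs(join_kind):
    """Run the answer's self-join; join_kind is a fixed keyword: 'LEFT OUTER' or 'INNER'."""
    return conn.execute(f"""
    SELECT u1.OwnerUserId, u1.AcceptedAnswerUserId, u2.OwnerUserId, u2.AcceptedAnswerUserId
    FROM Posts u1
    {join_kind} JOIN Posts u2
        ON u1.AcceptedAnswerUserId = u2.OwnerUserId
       AND u1.OwnerUserId = u2.AcceptedAnswerUserId
    ORDER BY u1.PostId
    """).fetchall()


left_rows = reversed_pairs("LEFT OUTER")
inner_rows = reversed_pairs("INNER")
print(len(left_rows))  # 6: every post, with NULLs where no reverse pair exists
print(inner_rows)      # [(1, 2, 2, 1), (2, 1, 1, 2)]
```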

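Edit: To exclude self-answers, just add and u1.AcceptedAnswerUserId != u1.OwnerUserId to the on clause. This removes them from the inner-join version; with the left join, self-answer rows are still returned, just with NULLs on the right-hand side.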
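A self-accepted post, where the owner and the accepted answerer are the same user, matches itself in this join and so looks "reciprocal". The sketch below shows the effect of the extra on-clause condition, using SQLite and invented rows. As before, #Posts becomes a plain Posts table. Post G is the self-answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (PostId TEXT, OwnerUserId INTEGER, AcceptedAnswerUserId INTEGER);
INSERT INTO Posts VALUES ('A', 1, 2), ('B', 2, 1), ('G', 5, 5);  -- G is a self-answer
""")

BASE = """
SELECT u1.PostId, u2.PostId
FROM Posts u1
{join} JOIN Posts u2
    ON u1.AcceptedAnswerUserId = u2.OwnerUserId
   AND u1.OwnerUserId = u2.AcceptedAnswerUserId
   {extra}
ORDER BY u1.PostId
"""
SELF_FILTER = "AND u1.AcceptedAnswerUserId != u1.OwnerUserId"

inner_no_filter = conn.execute(BASE.format(join="INNER", extra="")).fetchall()
inner_filtered = conn.execute(BASE.format(join="INNER", extra=SELF_FILTER)).fetchall()
left_filtered = conn.execute(BASE.format(join="LEFT", extra=SELF_FILTER)).fetchall()

print(inner_no_filter)  # [('A', 'B'), ('B', 'A'), ('G', 'G')]
print(inner_filtered)   # [('A', 'B'), ('B', 'A')]
print(left_filtered)    # [('A', 'B'), ('B', 'A'), ('G', None)]
```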

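Personally, I've always found it interesting that SQL and relational algebra are so deeply rooted in set theory, yet set-based operations like this one often feel clumsy in SQL. Mostly that's because, to preserve the absence of order, you have to represent set members in a single column, but to compare set members in SQL you need them in separate columns.

Now think about it: how would you extend this to triads of users who commented on the same post?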
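One common workaround for that tension is to store an unordered pair in two columns in a canonical order, e.g. smaller user id first, so that (1, 2) and (2, 1) collapse into one row. This is not what the query above does. It is a hedged sketch of an alternative, run on invented rows, that uses SQLite's two-argument scalar min() and max(). On SQL Server, a CASE expression would do the same job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (PostId TEXT, OwnerUserId INTEGER, AcceptedAnswerUserId INTEGER);
INSERT INTO Posts VALUES
  ('A', 1, 2), ('B', 2, 1), ('C', 2, 3), ('H', 1, 2);
""")

# Normalize each reciprocal match so the smaller user id is always in the first column,
# then DISTINCT collapses (1, 2) and (2, 1) into a single unordered pair.
pairs = conn.execute("""
SELECT DISTINCT min(u1.OwnerUserId, u1.AcceptedAnswerUserId) AS low_user,
                max(u1.OwnerUserId, u1.AcceptedAnswerUserId) AS high_user
FROM Posts u1
INNER JOIN Posts u2
    ON u1.AcceptedAnswerUserId = u2.OwnerUserId
   AND u1.OwnerUserId = u2.AcceptedAnswerUserId
WHERE u1.OwnerUserId <> u1.AcceptedAnswerUserId
ORDER BY low_user, high_user
""").fetchall()
print(pairs)  # [(1, 2)]
```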

answered 2018-09-28T20:44:52.277

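ETA: Oops, I misread the question. The OP wants accepted answers, while the query below finds any reciprocal answers. (It's easy to modify, but I'm more interested in the latter anyway.)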


Because the dataset is very large (and SEDE must not time out), I chose to restrict the set as much as possible up front and build from there.

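So this query:

  1. Returns rows only if a reciprocal relationship exists.
  2. Returns all such Q&A pairs.
  3. Excludes self-answers.
  4. Uses SEDE's query parameters and magic columns for usability.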

See it in SEDE.

-- UserA: Enter ID of user A
-- UserB: Enter ID of user B
WITH possibleAnswers AS (
    SELECT
                a.Id                AS aId
                , a.ParentId        AS qId
                , a.OwnerUserId   
                , a.CreationDate
    FROM        Posts a
    WHERE       a.PostTypeId        = 2  --  answers
    AND         a.OwnerUserId       IN (##UserA:INT##, ##UserB:INT##)
),
possibleQuestions AS (
    SELECT
                q.Id                AS qId
                , q.OwnerUserId   
                , q.Tags
    FROM        Posts q
    INNER JOIN  possibleAnswers pa  ON q.Id = pa.qId
    WHERE       q.PostTypeId        = 1  --  questions
    AND         q.OwnerUserId       IN (##UserA:INT##, ##UserB:INT##)
    AND         q.OwnerUserId       != pa.OwnerUserId  --  No self answers
)
SELECT 
            pa.OwnerUserId          AS [User Link]
            , 'answers'             AS [Action]
            , pq.OwnerUserId        AS [User Link]
            , pa.CreationDate       AS [at]
            , pq.qId                AS [Post Link]
            , pq.Tags
FROM        possibleQuestions pq
INNER JOIN  possibleAnswers pa      ON pq.qId = pa.qId
WHERE       pq.OwnerUserId          =  ##UserB:INT##
AND         pa.OwnerUserId          != pq.OwnerUserId  --  Skip self answers on the same question
AND         EXISTS (SELECT * FROM possibleQuestions pq2  WHERE pq2.OwnerUserId =  ##UserA:INT##)

UNION ALL SELECT 
            pa.OwnerUserId          AS [User Link]
            , 'answers'             AS [Action]
            , pq.OwnerUserId        AS [User Link]
            , pa.CreationDate       AS [at]
            , pq.qId                AS [Post Link]
            , pq.Tags
FROM        possibleQuestions pq
INNER JOIN  possibleAnswers pa      ON pq.qId = pa.qId
WHERE       pq.OwnerUserId          =  ##UserA:INT##
AND         pa.OwnerUserId          != pq.OwnerUserId  --  Skip self answers on the same question
AND         EXISTS (SELECT * FROM possibleQuestions pq2  WHERE pq2.OwnerUserId =  ##UserB:INT##)

ORDER BY    pa.CreationDate
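Here is a rough SQLite sketch of this two-user query on invented rows. It makes these adaptations:

- Named parameters :user_a and :user_b stand in for SEDE's ##UserA:INT## and ##UserB:INT## placeholders.
- The magic [User Link] and [Post Link] columns become plain columns, and Tags is dropped.
- The final join repeats the self-answer check, so a user's own answer on a question the other user also answered is not listed.

It returns rows only when each user has answered at least one of the other's questions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (Id INTEGER PRIMARY KEY, PostTypeId INTEGER, ParentId INTEGER,
                    OwnerUserId INTEGER, CreationDate TEXT);
INSERT INTO Posts VALUES
  (10, 1, NULL, 1, '2018-01-01'), (11, 2, 10, 2, '2018-01-02'),  -- user 2 answers user 1
  (20, 1, NULL, 2, '2018-02-01'), (21, 2, 20, 1, '2018-02-03'),  -- user 1 answers user 2
  (30, 1, NULL, 3, '2018-03-01'), (31, 2, 30, 1, '2018-03-02');  -- user 1 answers user 3
""")

# :user_a / :user_b stand in for SEDE's ##UserA:INT## / ##UserB:INT## parameters.
QUERY = """
WITH possibleAnswers AS (
    SELECT a.ParentId AS qId, a.OwnerUserId, a.CreationDate
    FROM Posts a
    WHERE a.PostTypeId = 2 AND a.OwnerUserId IN (:user_a, :user_b)
),
possibleQuestions AS (
    SELECT q.Id AS qId, q.OwnerUserId
    FROM Posts q
    INNER JOIN possibleAnswers pa ON q.Id = pa.qId
    WHERE q.PostTypeId = 1
      AND q.OwnerUserId IN (:user_a, :user_b)
      AND q.OwnerUserId != pa.OwnerUserId
)
SELECT pa.OwnerUserId AS answerer, pq.OwnerUserId AS asker, pq.qId AS qId,
       pa.CreationDate AS answered_at
FROM possibleQuestions pq
INNER JOIN possibleAnswers pa ON pq.qId = pa.qId AND pa.OwnerUserId != pq.OwnerUserId
WHERE pq.OwnerUserId = :user_b
  AND EXISTS (SELECT 1 FROM possibleQuestions WHERE OwnerUserId = :user_a)
UNION ALL
SELECT pa.OwnerUserId AS answerer, pq.OwnerUserId AS asker, pq.qId AS qId,
       pa.CreationDate AS answered_at
FROM possibleQuestions pq
INNER JOIN possibleAnswers pa ON pq.qId = pa.qId AND pa.OwnerUserId != pq.OwnerUserId
WHERE pq.OwnerUserId = :user_a
  AND EXISTS (SELECT 1 FROM possibleQuestions WHERE OwnerUserId = :user_b)
ORDER BY answered_at
"""


def reciprocal_history(user_a, user_b):
    """Return (answerer, asker, question id, answer date) rows, or [] if not reciprocal."""
    return conn.execute(QUERY, {"user_a": user_a, "user_b": user_b}).fetchall()


print(reciprocal_history(1, 2))  # [(2, 1, 10, '2018-01-02'), (1, 2, 20, '2018-02-03')]
print(reciprocal_history(1, 3))  # []
```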

It produces results like these:


For a list of all such user pairs, see this SEDE query.

answered 2018-09-28T21:53:09.860