
Using the Data Explorer (SEDE), I would like to find which users have more than 200000 reputation on Stack Overflow, and then find details for any accounts they have on other Stack Exchange sites.

Here is the query which provides the list with this threshold:

Select id, reputation, accountid
From users
Where reputation > 200000

AccountId is the key for all Stack Exchange sites.

I have found this query for aggregating across SEDE databases, but how is it possible to do that based on the dynamic results of the previous/baseline query?

Here is the kind of output I'm aiming for:

id_so, reputation_so, accountid, other_stackexchange_site_name, reputation_othersite, number_of_answers_other_site, number_of_questions_other_site
1, 250000, 23, serverfault, 500, 5, 1
1, 250000, 23, superuser, 120, 1, 0
2, 300000, 21, serverfault, 300, 3, 2
2, 300000, 21, webmasters, 230, 1, 1
3, 350000, 20, NA, NA, NA, NA
# the user with id 3 has an SO profile with reputation but no profile on any other Stack Exchange site

1 Answer


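To run a non-trivial query across databases, based on an initial query:

  1. Figure out the common key in all of the databases. In this case it is AccountId (the user's Stack-Exchange-wide Id).
  2. Create your initial query to feed that key into a temp table. In this case: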

    CREATE TABLE #UsersOfInterest (AccountId INT)
    INSERT INTO  #UsersOfInterest  
        SELECT  u.AccountId
        FROM    Users u
        Where   u.Reputation > 200000
    
  3. Create another temp table to hold the final results (see below).
  4. Determine the query to run on each site that gets the info you want. For example:

    SELECT  u.AccountId, u.DisplayName, u.Reputation, u.Id
            , numQst = (SELECT COUNT(q.Id)  FROM Posts q  WHERE q.OwnerUserId = u.Id  AND q.PostTypeId = 1)
            , numAns = (SELECT COUNT(q.Id)  FROM Posts q  WHERE q.OwnerUserId = u.Id  AND q.PostTypeId = 2)
    FROM    Users u
    WHERE   u.AccountId = ##seAccntId##
    
  5. Use a system query to get the appropriate databases. For the Data Explorer (SEDE), a query of this type:

    SELECT      name
    FROM        sys.databases
    WHERE       CASE    WHEN state_desc = 'ONLINE'
                        THEN OBJECT_ID (QUOTENAME (name) + '.[dbo].[PostNotices]', 'U')
                END
                IS NOT NULL
    
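  6. Create a cursor on the above query and use it to step through the databases.
    For each database:

    1. Build a query string that takes the step 4 query and puts its results into the step 3 temp table.
    2. Run the query string using sp_executesql.
  7. When the cursor is done, run the final query against the temp table from step 3.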
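
Steps 6 and 7 have no snippet of their own in the list, so here is a minimal sketch of just the cursor loop. It only counts matching users per database, to keep the moving parts visible. The names `#PerSiteCounts`, `db_crsr`, `@sql` and `@db` are made up for illustration, and the sketch uses a three-part name with `QUOTENAME` instead of `USE`, which is an alternative and not necessarily what the full query does.

```sql
-- Step 2: users of interest from the site the query is run against
CREATE TABLE #UsersOfInterest (AccountId INT)
INSERT INTO  #UsersOfInterest
    SELECT  u.AccountId
    FROM    Users u
    WHERE   u.Reputation > 200000

-- Step 3: results table (hypothetical, deliberately tiny)
CREATE TABLE #PerSiteCounts (dbName SYSNAME, matchedUsers INT)

DECLARE @dbName SYSNAME
DECLARE @sql    NVARCHAR(MAX)

-- Step 5 + 6: cursor over every online database that looks like an SE site
DECLARE db_crsr CURSOR LOCAL FAST_FORWARD FOR
    SELECT  name
    FROM    sys.databases
    WHERE   CASE WHEN state_desc = 'ONLINE'
                 THEN OBJECT_ID (QUOTENAME (name) + '.[dbo].[PostNotices]', 'U')
            END IS NOT NULL

OPEN db_crsr
FETCH NEXT FROM db_crsr INTO @dbName
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Step 6.1: build the per-site query; QUOTENAME brackets the database name
    SET @sql = N'
        INSERT INTO #PerSiteCounts
            SELECT      @db, COUNT(*)
            FROM        ' + QUOTENAME (@dbName) + N'.dbo.Users u
            INNER JOIN  #UsersOfInterest uoi  ON uoi.AccountId = u.AccountId
    '
    -- Step 6.2: run it, passing the database name in as a parameter
    EXEC sp_executesql @sql, N'@db SYSNAME', @db = @dbName

    FETCH NEXT FROM db_crsr INTO @dbName
END
CLOSE       db_crsr
DEALLOCATE  db_crsr

-- Step 7: final query on the results table
SELECT * FROM #PerSiteCounts ORDER BY matchedUsers DESC
```

Temp tables created in the outer batch are visible inside the `sp_executesql` call, which is what lets each per-site query write into the shared results table.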


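Refer to this other answer for a working template for querying all of the Stack Exchange sites.

Putting it all together yields the following query, which you can run live on SEDE: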

-- MinMasterSiteRep: Users must have this much rep on whichever site this query is run against
-- MinRep: Users must have this much rep on all other sites

CREATE TABLE #UsersOfInterest (
    AccountId       INT NOT NULL
    , Reputation    INT
    , UserId        INT
    , PRIMARY KEY (AccountId)
)
INSERT INTO  #UsersOfInterest
    SELECT  u.AccountId, u.Reputation, u.Id
    FROM    Users u
    Where   u.Reputation > ##MinMasterSiteRep:INT?200000##

CREATE TABLE #AllSiteResults (
      [Master Rep]          INT
      , [Mstr UsrId]        NVARCHAR(777)
      , AccountId           NVARCHAR(777)
      , [Site name]         NVARCHAR(777)
      , [Username on site]  NVARCHAR(777)
      , [Rep]               INT
      , [# Ans]             INT
      , [# Qst]             INT
)

DECLARE @seDbName       AS NVARCHAR(777)
DECLARE @seSiteURL      AS NVARCHAR(777)
DECLARE @sitePrettyName AS NVARCHAR(777)
DECLARE @seSiteQuery    AS NVARCHAR(max)

DECLARE seSites_crsr CURSOR FOR
WITH dbsAndDomainNames AS (
    SELECT      dbL.dbName
                , STRING_AGG (dbL.domainPieces, '.')    AS siteDomain
    FROM (
        SELECT      TOP 50000   -- There will never be that many sites; TOP is needed for the ORDER BY below
                    name        AS dbName
                    , value     AS domainPieces
                    , row_number ()  OVER (ORDER BY (SELECT 0)) AS [rowN]
        FROM        sys.databases
        CROSS APPLY STRING_SPLIT (name, '.')
        WHERE       CASE    WHEN state_desc = 'ONLINE'
                            THEN OBJECT_ID (QUOTENAME (name) + '.[dbo].[PostNotices]', 'U') -- Pick a table unique to SE data
                    END
                    IS NOT NULL
        ORDER BY    dbName, [rowN] DESC
    ) AS dbL
    GROUP BY    dbL.dbName
)
SELECT      REPLACE (REPLACE (dadn.dbName, 'StackExchange.', ''), '.', ' ' )  AS [Site Name]
            , dadn.dbName
            , CASE  -- See https://meta.stackexchange.com/q/215071
                    WHEN dadn.dbName = 'StackExchange.Mathoverflow.Meta'
                    THEN 'https://meta.mathoverflow.net/'
                    -- Some AVP/Audio/Video/Sound kerfuffle?
                    WHEN dadn.dbName = 'StackExchange.Audio'
                    THEN 'https://video.stackexchange.com/'
                    -- Ditto
                    WHEN dadn.dbName = 'StackExchange.Audio.Meta'
                    THEN 'https://video.meta.stackexchange.com/'
                    -- Normal site
                    ELSE 'https://' + LOWER (siteDomain) + '.com/'
            END AS siteURL
FROM        dbsAndDomainNames dadn
WHERE       (dadn.dbName = 'StackExchange.Meta'  OR  dadn.dbName NOT LIKE '%Meta%')

-- Step through cursor
OPEN    seSites_crsr
FETCH   NEXT FROM seSites_crsr INTO @sitePrettyName, @seDbName, @seSiteURL
WHILE   @@FETCH_STATUS = 0
BEGIN
    SET @seSiteQuery = '
        USE [' + @seDbName + ']

        INSERT INTO #AllSiteResults
            SELECT
                        uoi.Reputation                                                                                  AS [Master Rep]
                        , ''site://u/'' + CAST(uoi.UserId AS NVARCHAR(88)) + ''|'' + CAST(uoi.UserId AS NVARCHAR(88))   AS [Mstr UsrId]
                        , [AccountId] = ''https://stackexchange.com/users/'' + CAST(u.AccountId AS NVARCHAR(88)) + ''?tab=accounts|'' + CAST(u.AccountId AS NVARCHAR(88))
                        , ''' + @sitePrettyName + '''                                                                   AS [Site name]
                        , ''' + @seSiteURL + ''' + ''u/'' + CAST(u.Id AS NVARCHAR(88)) + ''|'' + u.DisplayName          AS [Username on site]
                        , u.Reputation                                                                                  AS [Rep]
                        , (SELECT COUNT(q.Id)  FROM Posts q  WHERE q.OwnerUserId = u.Id  AND q.PostTypeId = 2)          AS [# Ans]
                        , (SELECT COUNT(q.Id)  FROM Posts q  WHERE q.OwnerUserId = u.Id  AND q.PostTypeId = 1)          AS [# Qst]
            FROM        #UsersOfInterest uoi
            INNER JOIN  Users u                ON uoi.AccountId = u.AccountId
            WHERE       u.Reputation > ##MinRep:INT?200##
    '
    EXEC sp_executesql @seSiteQuery

    FETCH NEXT FROM seSites_crsr INTO @sitePrettyName, @seDbName, @seSiteURL
END
CLOSE       seSites_crsr
DEALLOCATE  seSites_crsr

SELECT      *
FROM        #AllSiteResults
ORDER BY    [Master Rep] DESC, AccountId, [Rep] DESC

It gives results like:

[Image: query output]

-- where the blue values are hyperlinked.


Note that a user must have 200 rep on a site for it to be "significant". That is also the rep needed for the site to be included in the Stack Exchange flair.
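
The sample output in the question also has an "NA" row for a user with no qualifying accounts elsewhere. One possible way to get that shape is a final `LEFT JOIN` from `#UsersOfInterest`, sketched below. It assumes the statement is appended to the same batch as the full query, so both temp tables still exist, and that the `AccountId` column keeps the `url|id` link format built there. The output aliases (`id_master`, `other_site`, and so on) are made up for illustration.

```sql
-- Hedged sketch: one row per user per *other* site, with 'NA' when there is none.
SELECT      uoi.UserId                          AS id_master
            , uoi.Reputation                    AS reputation_master
            , uoi.AccountId
            , COALESCE (r.[Site name], 'NA')    AS other_site
            , r.[Rep]                           AS reputation_other_site   -- NULL when no other site
            , r.[# Ans]                         AS answers_other_site
            , r.[# Qst]                         AS questions_other_site
FROM        #UsersOfInterest uoi
LEFT JOIN   #AllSiteResults r
            -- The stored value ends with '|<AccountId>', so match on that suffix
            ON  r.AccountId LIKE '%|' + CAST(uoi.AccountId AS NVARCHAR(88))
            -- Skip the site the query was run against (same pretty-name rule as the cursor)
            AND r.[Site name] <> REPLACE (REPLACE (DB_NAME (), 'StackExchange.', ''), '.', ' ')
ORDER BY    uoi.Reputation DESC, uoi.AccountId, r.[Rep] DESC
```

Storing a plain integer `AccountId` column in `#AllSiteResults` next to the link column would make this join simpler and faster; the suffix match is only used here to avoid changing the query above.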

answered 2018-09-17T06:38:57.093