
Problem description

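I am trying to get, for each recommendation, a comma-separated list of average grades that corresponds to another comma-separated list of recommended content IDs. A recommendation is an object consisting of the content that will receive the recommendation (ContentID) and a list of other content items that will be recommended (RecommendedContentIDs).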

Table structure, sample data and other constraints

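I have a database structure with two tables. The first table stores the recommended content IDs as a ranked, comma-separated list. The second table contains the grades for each recommended content ID. A ranked list holds at most 10 comma-separated values, and grades range from 0 to 5.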

To illustrate the problem better, here are the table structures and some sample data:

Table Recommendations

|ID    |ContentID    |RecommendedContentIDs |Type |
+------+-------------+----------------------+-----+
|1     |2051         |9706,14801,13354,...  |a    |
+------+-------------+----------------------+-----+
|67    |2051         |8103,16366,8795,...   |b    |
+------+-------------+----------------------+-----+
|133   |2051         |8795,8070,15341,...   |c    |
+------+-------------+----------------------+-----+
|22    |1234         |4782,283,33,...       |a    |
+------+-------------+----------------------+-----+
...

Table Grades

|ID    |RecommendationID |RecommendedDocumentID |Grade |EvaluatorHash|
+------+-----------------+----------------------+------+-------------+
|1     |1                |9706                  |4     |123456789    |
+------+-----------------+----------------------+------+-------------+
|2     |1                |14801                 |5     |123456789    |
+------+-----------------+----------------------+------+-------------+
|3     |1                |13354                 |3     |987654321    |
+------+-----------------+----------------------+------+-------------+
|3     |1                |9706                  |3     |987654321    |
+------+-----------------+----------------------+------+-------------+
|4     |67               |8103                  |5     |123456789    |
+------+-----------------+----------------------+------+-------------+
|1     |67               |16366                 |4     |987654321    |
+------+-----------------+----------------------+------+-------------+
|1     |133              |8795                  |2     |123456789    |
+------+-----------------+----------------------+------+-------------+
...

I have converted the RecommendedContentIDs column of table Recommendations into a separate table, as follows:

Table RecommendedContent

|ID    |RecommendationID |RecommendedContentID |Rank |
+------+-----------------+---------------------+-----+
|1     |1                |9706                 |1    |
+------+-----------------+---------------------+-----+
|2     |1                |14801                |2    |
+------+-----------------+---------------------+-----+
|3     |1                |13354                |3    |
+------+-----------------+---------------------+-----+
|4     |1                |12787                |4    |
+------+-----------------+---------------------+-----+
...

+------+-----------------+---------------------+-----+
|11    |2                |19042                |1    |
+------+-----------------+---------------------+-----+
|12    |2                |13376                |2    |
+------+-----------------+---------------------+-----+
|13    |2                |9853                 |3    |
+------+-----------------+---------------------+-----+
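The conversion from the comma-separated column to the RecommendedContent rows can be sketched in Python. This is a minimal illustration, not part of the original question: it assumes each list contains only integer IDs separated by commas (stray spaces and a trailing comma are tolerated), and the helper name `split_ranked` is hypothetical. Rank is simply the 1-based position of the ID in the list.

```python
def split_ranked(recommendation_id, csv_ids):
    """Turn one RecommendedContentIDs string into
    (RecommendationID, RecommendedContentID, Rank) rows, Rank starting at 1."""
    # Drop empty pieces (e.g. from a trailing comma) and surrounding spaces.
    ids = [part.strip() for part in csv_ids.split(",") if part.strip()]
    # The position in the ranked list becomes the Rank column.
    return [(recommendation_id, int(cid), rank)
            for rank, cid in enumerate(ids, start=1)]


if __name__ == "__main__":
    for row in split_ranked(1, "9706,14801,13354"):
        print(row)
```

Each resulting tuple can be inserted directly as one RecommendedContent row.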

Expected result

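I now want a query that returns a result set with two corresponding comma-separated lists, so that I can display the average grade of each recommended content ID. It should look something like this: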

|ContentID    |RecommendedContentIDs    |RecommendedContentAverageGrades   |Type  |
+-------------+-------------------------+----------------------------------+------+
|2051         |9706,14801,13354,...     |3.5,5.0,3.0,...                   |a     |
+-------------+-------------------------+----------------------------------+------+
|2051         |8103,16366,8795,...      |5.0,4.0,0.0,...                   |b     |
+-------------+-------------------------+----------------------------------+------+
|2051         |8795,8070,15341,...      |2.0,0.0,0.0,...                   |c     |
+-------------+-------------------------+----------------------------------+------+
...

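As you can see, the RecommendedContentAverageGrades column contains the average grade of each corresponding content ID in the RecommendedContentIDs column (content 9706 was graded twice, once with 4 and once with 3, so its average is 3.5). If a content item has not been graded, its average grade should be 0. What really matters here is that the two comma-separated lists correspond to each other, because RecommendedContentIDs is a ranked list.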
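To pin down the rules above (average per recommendation and content ID, 0 for ungraded content, both lists in rank order), here is a small reference computation in Python over the visible sample rows. It is a sketch only: it uses just the IDs shown in the tables (the elided `...` entries are left out), ignores the Grades.ID and EvaluatorHash columns, and `average_grade_lists` is a hypothetical name. Formatting each average with one decimal reproduces the style of the expected output.

```python
from collections import defaultdict

# Visible sample rows from the question.
RECOMMENDATIONS = [  # (ID, ContentID, RecommendedContentIDs, Type)
    (1, 2051, "9706,14801,13354", "a"),
    (67, 2051, "8103,16366,8795", "b"),
    (133, 2051, "8795,8070,15341", "c"),
]
GRADES = [  # (RecommendationID, RecommendedDocumentID, Grade)
    (1, 9706, 4), (1, 14801, 5), (1, 13354, 3), (1, 9706, 3),
    (67, 8103, 5), (67, 16366, 4), (133, 8795, 2),
]


def average_grade_lists(recommendations, grades):
    # Collect every grade per (recommendation, content) pair.
    by_pair = defaultdict(list)
    for rec_id, doc_id, grade in grades:
        by_pair[(rec_id, doc_id)].append(grade)

    rows = []
    for rec_id, content_id, csv_ids, rtype in recommendations:
        averages = []
        # Walk the ranked list in order so both lists stay aligned.
        for doc_id in (int(x) for x in csv_ids.split(",")):
            g = by_pair.get((rec_id, doc_id), [])
            averages.append(sum(g) / len(g) if g else 0.0)  # ungraded -> 0
        rows.append((content_id, csv_ids,
                     ",".join(f"{a:.1f}" for a in averages), rtype))
    return rows


if __name__ == "__main__":
    for row in average_grade_lists(RECOMMENDATIONS, GRADES):
        print(row)
```

On the sample rows this yields `3.5,5.0,3.0`, `5.0,4.0,0.0` and `2.0,0.0,0.0`, matching the three rows of the expected result. Note that 8795 gets 0.0 in recommendation 67 even though it was graded in recommendation 133, because grades are matched per recommendation.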

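I would normally implement something like this in C#, but I am wondering whether it can be done in SQL. I was thinking of using GROUP_CONCAT, but I could not get the right result set. I would be very grateful if someone could provide a working SQL query for MySQL and/or T-SQL, but suggestions are welcome too.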

Edit

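#1 - Laurence suggested using a separate table instead of comma-separated lists. I am using them because of a legacy design that I cannot change. However, I am open to answers that assume the data in the comma-separated lists is stored in a separate table.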

#2 - Changed the structure as Laurence suggested (using a separate table; see the updated structure).


4 Answers


This is just a follow-up to the answer given by @Laurence:

http://sqlfiddle.com/#!2/7d236/6

answered 2012-11-16T13:15:23.000

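Updated with Akrigg's fix and the SQL Fiddle, including how to order by a value from the recommendation tables; also updated to use ORDER BY inside the GROUP_CONCAT clause, following brozo's fix: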

Table RecommendedContent

+-----------------+----------------------+------+
|RecommendationID | RecommendedContentID | Rank |
+-----------------+----------------------+------+
| 1               | 9706                 | 1    |
| 1               | 14801                | 2    |
| 1               | 13354                | 3    |
| 67              | 8103                 | 1    |
| ...             | ...                  | ...  |
+-----------------+----------------------+------+

-- Rank is backquoted because RANK is a reserved word from MySQL 8.0 on;
-- the key of Recommendations is the ID column in the question's schema.
Select
  a.RecommendationID,
  a.ContentID,
  Group_Concat(a.RecommendedContentID Order By a.`Rank`) As RecommendedContentIDs,
  Group_Concat(Trim(Trailing '.' From Trim(Trailing '0' From a.AverageGrade)) Order By a.`Rank`) As RecommendedContentAverageGrades,
  a.Type
From (
  -- One row per ranked item, with its average grade (0 if ungraded)
  Select
    r.ID As RecommendationID,
    r.ContentID,
    r.Type,
    rc.RecommendedContentID,
    rc.`Rank`,
    Coalesce(Avg(g.Grade), 0) As AverageGrade
  From
    Recommendations r
      Left Outer Join
    RecommendedContent rc
      On r.ID = rc.RecommendationID
      Left Outer Join
    Grades g
      On rc.RecommendedContentID = g.RecommendedDocumentID And
         rc.RecommendationID = g.RecommendationID
  Group By
    r.ID,
    r.ContentID,
    r.Type,
    rc.RecommendedContentID,
    rc.`Rank`
  ) as a
Group By
  a.RecommendationID,
  a.ContentID,
  a.Type
Order By
  a.ContentID, -- Or the other way round if that's what you prefer
  a.RecommendationID

http://sqlfiddle.com/#!2/ca8b8/8
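The two-level structure of this answer (average per ranked item first, then concatenate per recommendation) can be tried locally with Python's built-in `sqlite3` module. This is a sketch under assumptions: SQLite stands in for MySQL, the tables hold only the visible sample rows, and because `group_concat` in older SQLite versions does not accept `ORDER BY`, the outer concatenation is done in Python over rows already sorted by Rank. The helper names `load_sample` and `corresponding_lists` are hypothetical.

```python
import sqlite3
from itertools import groupby

SCHEMA = """
CREATE TABLE Recommendations (ID INTEGER PRIMARY KEY, ContentID INTEGER, Type TEXT);
CREATE TABLE RecommendedContent (RecommendationID INTEGER,
                                 RecommendedContentID INTEGER, "Rank" INTEGER);
CREATE TABLE Grades (RecommendationID INTEGER, RecommendedDocumentID INTEGER,
                     Grade INTEGER);
"""

# Same idea as the answer's inner query: one row per ranked item.
PER_ITEM_SQL = """
SELECT r.ID, r.ContentID, r.Type, rc.RecommendedContentID,
       COALESCE(AVG(g.Grade), 0) AS AverageGrade
FROM Recommendations r
LEFT JOIN RecommendedContent rc ON rc.RecommendationID = r.ID
LEFT JOIN Grades g ON g.RecommendedDocumentID = rc.RecommendedContentID
                  AND g.RecommendationID = rc.RecommendationID
GROUP BY r.ID, r.ContentID, r.Type, rc.RecommendedContentID, rc."Rank"
ORDER BY r.ID, rc."Rank"
"""


def load_sample(conn):
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Recommendations VALUES (?, ?, ?)",
                     [(1, 2051, "a"), (67, 2051, "b"), (133, 2051, "c")])
    lists = {1: [9706, 14801, 13354], 67: [8103, 16366, 8795],
             133: [8795, 8070, 15341]}
    conn.executemany("INSERT INTO RecommendedContent VALUES (?, ?, ?)",
                     [(rid, cid, rank) for rid, ids in lists.items()
                      for rank, cid in enumerate(ids, start=1)])
    conn.executemany("INSERT INTO Grades VALUES (?, ?, ?)",
                     [(1, 9706, 4), (1, 14801, 5), (1, 13354, 3), (1, 9706, 3),
                      (67, 8103, 5), (67, 16366, 4), (133, 8795, 2)])


def corresponding_lists(conn):
    rows = conn.execute(PER_ITEM_SQL).fetchall()
    result = []
    # Outer GROUP_CONCAT done in Python: rows arrive sorted by ID, then Rank.
    for (_, content_id, rtype), items in groupby(rows, key=lambda r: r[:3]):
        items = list(items)
        ids = ",".join(str(item[3]) for item in items)
        avgs = ",".join(f"{float(item[4]):.1f}" for item in items)
        result.append((content_id, ids, avgs, rtype))
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    for row in corresponding_lists(conn):
        print(row)
```

Joining Grades on both RecommendedDocumentID and RecommendationID is what gives 8795 an average of 0 in recommendation 67 but 2 in recommendation 133.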

answered 2012-11-16T12:08:52.183

This is done in Oracle:

WITH count_number AS
  (SELECT
    ID,
    ContentID,
    ','
    ||RecommendedContentIDs
    ||',' new_ContentIDs,
    RecommendedContentIDs,
    type ,
    LENGTH(RECOMMENDEDCONTENTIDS )-LENGTH(REPLACE(RECOMMENDEDCONTENTIDS ,','))+1 COUNT_ID
  FROM Recommendations
  ) ,
  RecommendedContentIDs_postion AS
  (SELECT A1.*,
    B1.CONTENTIDS_OCCURANCE_POSITION ,
    SUBSTR(new_ContentIDs,instr(new_ContentIDs,',',1,ContentIDs_OCCURANCE_POSITION)+1 , INSTR(new_ContentIDs,',',1,ContentIDs_OCCURANCE_POSITION+1)-instr(new_ContentIDs,',',1,ContentIDs_OCCURANCE_POSITION)-1) ContentIDs
  FROM count_number a1,
    (SELECT I ContentIDs_OCCURANCE_POSITION
    FROM DUAL model dimension BY (1 i) measures (0 X) (X[FOR I
    FROM 2 TO 1000 increment 1] = 0)
    ) b1
  WHERE b1.ContentIDs_OCCURANCE_POSITION<=a1.count_id
  )
SELECT
  RCI.CONTENTID,
  -- LISTAGG (11g Release 2 and later) replaces the undocumented WM_CONCAT;
  -- ordering both lists by position keeps them in rank order
  LISTAGG(TRIM(RCI.CONTENTIDS), ',')
    WITHIN GROUP (ORDER BY RCI.CONTENTIDS_OCCURANCE_POSITION) RECOMMENDEDCONTENTIDS,
  LISTAGG(NVL(GRD.GRADE, 0), ',')
    WITHIN GROUP (ORDER BY RCI.CONTENTIDS_OCCURANCE_POSITION) avg_grade_contentid,
  RCI.type
FROM RECOMMENDEDCONTENTIDS_POSTION RCI
  -- outer join so that ungraded content still appears (with grade 0)
  LEFT OUTER JOIN
  (SELECT RECOMMENDATIONID,
    RECOMMENDEDDOCUMENTID,
    AVG(GRADE) GRADE
  FROM Grades
  GROUP BY RECOMMENDATIONID, RECOMMENDEDDOCUMENTID
  ) GRD
  ON RCI.ID = GRD.RECOMMENDATIONID
  AND TRIM(RCI.CONTENTIDS) = TRIM(GRD.RECOMMENDEDDOCUMENTID)
GROUP BY
  RCI.ID,
  RCI.ContentID,
  RCI.type;
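The splitting step in this answer relies on simple arithmetic: the number of items is the string length minus the length with commas removed, plus one, and after wrapping the string in commas, item n lies between the n-th and (n+1)-th comma. Below is a hypothetical Python mirror of that `LENGTH`/`INSTR`/`SUBSTR` logic, written only to make the indexing visible; the names `nth_item` and `split_positions` are illustrative, and Python indexes from 0 where Oracle indexes from 1.

```python
def nth_item(csv, n):
    """Return the n-th (1-based) item, like the answer's SUBSTR/INSTR expression."""
    wrapped = "," + csv + ","              # new_ContentIDs
    start = -1
    for _ in range(n):                     # position of the n-th comma, like INSTR(s, ',', 1, n)
        start = wrapped.index(",", start + 1)
    end = wrapped.index(",", start + 1)    # position of the (n+1)-th comma
    return wrapped[start + 1:end]


def split_positions(csv):
    count = len(csv) - len(csv.replace(",", "")) + 1   # COUNT_ID
    # Only positions 1..COUNT_ID are kept, like the WHERE clause on the generated rows.
    return [(pos, nth_item(csv, pos)) for pos in range(1, count + 1)]


if __name__ == "__main__":
    print(split_positions("9706,14801,13354"))
```

An item such as `" 16366"` keeps its leading space, which is why the answer compares `TRIM(...)` values when joining against Grades.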
answered 2012-11-16T14:20:25.330

You can create a custom aggregate in SQL Server that performs comma-separated string concatenation, and then use it like this:

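-- User-defined aggregates are invoked with a schema-qualified name (dbo here)
SELECT ContentID, RecommendedContentIDs, dbo.CustomToCsv(AvgGrade), Type FROM
(
    -- CAST avoids integer averaging: AVG over an int column returns an int in SQL Server
    SELECT ContentID, RecommendedContentIDs, AVG(CAST(Grade AS decimal(4, 2))) AvgGrade, Type
    FROM Recommendations r INNER JOIN Grades g ON r.ID = g.RecommendationID
    GROUP BY ContentID, RecommendedContentIDs, RecommendedDocumentID, Type
) as t
-- Caveat: the INNER JOIN skips ungraded content, and the order in which
-- rows reach the aggregate is not guaranteed
GROUP BY ContentID, RecommendedContentIDs, Type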
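SQL Server's custom aggregates are written in .NET, but the same idea can be sketched with SQLite's user-defined aggregates through Python's `sqlite3.Connection.create_aggregate`. The sketch below is an analogy, not the answer's implementation: the aggregate name `ranked_csv`, the class `RankedCsv` and the table `t` are hypothetical. It takes the rank as an extra argument and sorts in `finalize`, so the concatenated averages come out in rank order regardless of the order in which rows reach the aggregate.

```python
import sqlite3


class RankedCsv:
    """Hypothetical CustomToCsv-style aggregate: collects (rank, value) pairs
    and joins the values in rank order."""

    def __init__(self):
        self.items = []

    def step(self, rank, value):
        # Called once per row in the group.
        self.items.append((rank, value))

    def finalize(self):
        # Sort by rank so the output order does not depend on row arrival order.
        return ",".join(str(v) for _, v in sorted(self.items, key=lambda item: item[0]))


def demo():
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("ranked_csv", 2, RankedCsv)
    conn.execute("CREATE TABLE t (rec INTEGER, rnk INTEGER, avg_grade REAL)")
    # Rows deliberately inserted out of rank order.
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                     [(1, 3, 3.0), (1, 1, 3.5), (1, 2, 5.0)])
    return conn.execute(
        "SELECT rec, ranked_csv(rnk, avg_grade) FROM t GROUP BY rec").fetchall()


if __name__ == "__main__":
    print(demo())
```

Passing the rank into the aggregate is the design choice that makes the output order deterministic; an aggregate that only sees the values cannot restore the ranking on its own.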
answered 2012-11-16T12:09:36.347