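I have some MySQL tables that I want to extract some information from. The tables are:
- Videos - represents videos, each with a score (points).
- Tags - contains the global list of tags.
- VideoTags - creates the association between videos and tags.
In addition to the video resources, I also have picture resources:
- Pictures - represents pictures, each with a score (points).
- PictureTopic - creates the association between pictures and topics.
And a users table for ownership of videos and pictures:
- Users - can own videos and pictures.
What I want to do is find the highest-scoring video or picture for each tag/topic. There are many videos and pictures with the same tag/topic, but my result set should have the same number of rows as there are tags/topics. The end goal is to list the best video or picture (by points) for each unique tag (a tag is a topic with a hash prefix).
Using the solution from a previous question (http://stackoverflow.com/questions/12778329/mysql-data-extraction-from-3-tables-joins-and-max), I can get the highest-scoring video for each tag: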
SELECT SUBSTR(Tags.content,2) AS topic_id, Videos.id AS resource_id, 'video' AS resource_type,
       Videos.owner_id AS resource_owner_id, Videos.points
FROM Videos
JOIN (
    SELECT VideoTags.tag_id, MAX(points) points
    FROM Videos JOIN VideoTags ON Videos.id = VideoTags.video_id
    GROUP BY VideoTags.tag_id
) t USING (points)
JOIN Tags ON t.tag_id = Tags.id AND Tags.content LIKE "#%"
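One caveat about the query above: the join to the subquery is `USING (points)`, so rows are matched on points alone. A video that happens to have the same points as another tag's maximum would be paired with that tag even if it does not carry it. Below is a minimal sketch that also matches on the tag by going through VideoTags. The aliases `v`, `vt`, `tg`, `best` and the column name `max_points` are introduced here for illustration only.

```sql
-- Sketch: every video whose points equal the maximum for one of its own tags.
SELECT SUBSTR(tg.content, 2) AS topic_id,
       v.id                  AS resource_id,
       'video'               AS resource_type,
       v.owner_id            AS resource_owner_id,
       v.points
FROM Videos v
JOIN VideoTags vt ON vt.video_id = v.id
JOIN (
    -- Highest points per tag.
    SELECT vt2.tag_id, MAX(v2.points) AS max_points
    FROM Videos v2
    JOIN VideoTags vt2 ON vt2.video_id = v2.id
    GROUP BY vt2.tag_id
) best ON best.tag_id = vt.tag_id        -- same tag ...
      AND best.max_points = v.points     -- ... and the top score for it
JOIN Tags tg ON tg.id = vt.tag_id AND tg.content LIKE '#%';
```

Videos that tie for the top score of a tag all come back from this version, so tie handling would still need to be added on top.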
I can also (sort of) get the highest-scoring picture for each topic with this statement:
SELECT PictureTopic.topic_id, Pictures.id AS resource_id, 'picture' AS resource_type,
       Pictures.owner_id AS resource_owner_id, MAX(points) points
FROM Pictures JOIN PictureTopic ON Pictures.id = PictureTopic.picture_id
GROUP BY PictureTopic.topic_id
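The "sort of" comes down to the non-aggregated columns. With `GROUP BY PictureTopic.topic_id`, and assuming `ONLY_FULL_GROUP_BY` is not enabled, MySQL is free to return `Pictures.id` and `Pictures.owner_id` from any row in the group, not necessarily the row that holds `MAX(points)`. A sketch in the same style as the video query, joining each picture back to the per-topic maximum, might look like this. The aliases `p`, `pt`, `best` and the column name `max_points` are introduced here for illustration only.

```sql
-- Sketch: every picture whose points equal the maximum for its topic.
SELECT pt.topic_id,
       p.id       AS resource_id,
       'picture'  AS resource_type,
       p.owner_id AS resource_owner_id,
       p.points
FROM Pictures p
JOIN PictureTopic pt ON pt.picture_id = p.id
JOIN (
    -- Highest points per topic.
    SELECT pt2.topic_id, MAX(p2.points) AS max_points
    FROM Pictures p2
    JOIN PictureTopic pt2 ON pt2.picture_id = p2.id
    GROUP BY pt2.topic_id
) best ON best.topic_id = pt.topic_id
      AND best.max_points = p.points;
```

Pictures that tie for the top score of a topic are all returned by this version. It does not yet apply the owner tie-break or the points threshold.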
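What I want is to get the picture or video with the most points for each tag/topic, and to handle the following edge cases:
- If a given topic has more than one top picture or video (i.e. they share the same high score), then fall back to the resource owner's points. If those are also equal (unlikely), both resources can be in the result set (unless the resources are owned by the same user, in which case there should be only one result in the result set).
- If a video or picture has fewer than 20 points, exclude that resource from the result set.
As a software developer who mostly works with Grails, I like to rely on object-relational mapping, so my SQL skills are poor. The best I have managed so far is to combine the results of the two selects: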
SELECT SUBSTR(Tags.content,2) AS topic_id, Videos.id AS resource_id, 'video' AS resource_type,
       Videos.owner_id AS resource_owner_id, Videos.points
FROM Videos
JOIN (
    SELECT VideoTags.tag_id, MAX(points) points
    FROM Videos JOIN VideoTags ON Videos.id = VideoTags.video_id
    GROUP BY VideoTags.tag_id
) t USING (points)
JOIN Tags ON t.tag_id = Tags.id AND Tags.content LIKE "#%"
UNION
SELECT PictureTopic.topic_id, Pictures.id AS resource_id, 'picture' AS resource_type,
       Pictures.owner_id AS resource_owner_id, MAX(points) points
FROM Pictures JOIN PictureTopic ON Pictures.id = PictureTopic.picture_id
GROUP BY PictureTopic.topic_id
Unfortunately, this does not even return the highest-scoring pictures as expected. It can be seen on sqlfiddle (http://sqlfiddle.com/#!2/6650d/1).
The output of this query is:
TOPIC_ID  RESOURCE_ID      RESOURCE_TYPE  RESOURCE_OWNER_ID  POINTS
topic-1   owner-x-video-a  video          owner-x            20
topic-2   owner-y-video-m  video          owner-y            44
topic-1   owner-j-pic-1    picture        owner-j            50
topic-3   owner-k-pic-2    picture        owner-k            22
But I would also expect this row:
TOPIC_ID  RESOURCE_ID      RESOURCE_TYPE  RESOURCE_OWNER_ID  POINTS
topic-3   owner-l-pic-3    picture        owner-l            22
After applying the edge cases for tied high scores and the points threshold, I would like to see:
TOPIC_ID  RESOURCE_ID      RESOURCE_TYPE  RESOURCE_OWNER_ID  POINTS
topic-1   owner-j-pic-1    picture        owner-j            50
topic-2   owner-y-video-m  video          owner-y            44
topic-3   owner-l-pic-3    picture        owner-l            22
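As a rough sketch of the shape such a query might take (not a verified solution), one option on a MySQL version without window functions is to put both resource kinds into a single view, then keep only the rows that nothing else in the same topic beats. The view name `TopicResources`, the column `owner_points` and the alias `better` are hypothetical names introduced here. The sketch also assumes that every `owner_id` has a row in Users, that resource ids are unique across Videos and Pictures, and that a tie between two resources of the same owner may be broken arbitrarily by the lower id.

```sql
-- Hypothetical view: all videos and pictures per topic, with the owner's points.
-- Resources under 20 points are filtered out here (threshold edge case).
CREATE VIEW TopicResources AS
SELECT SUBSTR(tg.content, 2) AS topic_id,
       v.id                  AS resource_id,
       'video'               AS resource_type,
       v.owner_id            AS resource_owner_id,
       v.points              AS points,
       u.points              AS owner_points
FROM Videos v
JOIN VideoTags vt ON vt.video_id = v.id
JOIN Tags tg      ON tg.id = vt.tag_id AND tg.content LIKE '#%'
JOIN Users u      ON u.id = v.owner_id
WHERE v.points >= 20
UNION ALL
SELECT pt.topic_id, p.id, 'picture', p.owner_id, p.points, u.points
FROM Pictures p
JOIN PictureTopic pt ON pt.picture_id = p.id
JOIN Users u         ON u.id = p.owner_id
WHERE p.points >= 20;

-- Anti-join: keep a row only if no other row in the same topic beats it.
SELECT r.topic_id, r.resource_id, r.resource_type, r.resource_owner_id, r.points
FROM TopicResources r
LEFT JOIN TopicResources better
       ON better.topic_id = r.topic_id
      AND (   better.points > r.points                       -- higher score wins
           OR (better.points = r.points
               AND better.owner_points > r.owner_points)     -- tie: owner's points
           OR (better.points = r.points
               AND better.resource_owner_id = r.resource_owner_id
               AND better.resource_id < r.resource_id))      -- same owner: keep one
WHERE better.resource_id IS NULL
ORDER BY r.topic_id;
```

On the sample data in this question I would expect this to return owner-j-pic-1 for topic-1, owner-y-video-m for topic-2 and owner-l-pic-3 for topic-3. Two resources with equal points whose different owners also have equal points would both be kept, which matches the tie rule described earlier.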
Here are the schema and sample data for reference:
CREATE TABLE `Users` (
`id` VARCHAR(24) NOT NULL DEFAULT '',
`points` DOUBLE NOT NULL DEFAULT 0,
PRIMARY KEY (id)
) Engine=InnoDB;
DROP TABLE IF EXISTS `Videos`;
CREATE TABLE `Videos` (
`id` varchar(24) NOT NULL default '',
`owner_id` varchar(24) NOT NULL default '',
`points` DOUBLE NOT NULL default 0
);
DROP TABLE IF EXISTS `Tags`;
CREATE TABLE `Tags` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content` varchar(32) NOT NULL default '',
PRIMARY KEY (id)
);
DROP TABLE IF EXISTS `VideoTags`;
CREATE TABLE `VideoTags` (
`video_id` varchar(24) NOT NULL default '',
`tag_id` int(11) NOT NULL
);
DROP TABLE IF EXISTS `Pictures`;
CREATE TABLE `Pictures` (
`id` varchar(24) NOT NULL default '',
`owner_id` varchar(24) NOT NULL default '',
`points` DOUBLE NOT NULL default 0
);
DROP TABLE IF EXISTS `PictureTopic`;
CREATE TABLE `PictureTopic` (
`picture_id` varchar(24) NOT NULL,
`topic_id` varchar(31) NOT NULL
);
INSERT INTO Users (id, points) VALUES ('owner-x', 0);
INSERT INTO Users (id, points) VALUES ('owner-y', 0);
INSERT INTO Users (id, points) VALUES ('owner-j', 0);
INSERT INTO Users (id, points) VALUES ('owner-k', 5);
INSERT INTO Users (id, points) VALUES ('owner-l', 14);
INSERT INTO Videos (id,owner_id,points) VALUES
('owner-x-video-a','owner-x', 20),
('owner-x-video-b','owner-x', 15),
('owner-y-video-k','owner-y', 12),
('owner-y-video-l','owner-y', 17),
('owner-y-video-m','owner-y', 44);
INSERT INTO Tags (id, content) VALUES
(111, '#topic-1'),
(222, '#topic-2');
INSERT INTO VideoTags (video_id,tag_id) VALUES
('owner-x-video-a',111),
('owner-x-video-b',111),
('owner-y-video-k',111),
('owner-y-video-l',222),
('owner-y-video-m',222);
INSERT INTO Pictures (id, owner_id, points) VALUES ('owner-j-pic-1','owner-j', 50);
INSERT INTO Pictures (id, owner_id, points) VALUES ('owner-k-pic-2','owner-k', 22);
INSERT INTO Pictures (id, owner_id, points) VALUES ('owner-l-pic-3','owner-l', 22);
INSERT INTO PictureTopic (picture_id, topic_id) VALUES ('owner-j-pic-1','topic-1');
INSERT INTO PictureTopic (picture_id, topic_id) VALUES ('owner-k-pic-2','topic-3');
INSERT INTO PictureTopic (picture_id, topic_id) VALUES ('owner-l-pic-3','topic-3');
Any pointers on how best to extract this information? Cheers :)