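I am currently writing a web application that matches users based on the questions they have answered. I implemented my matching algorithm in a single query and tuned it so that computing the match percentage between two users takes 8.2 ms. However, my web app has to fetch a list of users and iterate over that list, running this query once per user. For 5,000 users, this took 50 seconds on my local machine. Is it possible to put everything into one query that returns one column with the user_id and one column with the calculated match? Or would a stored procedure be an option?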
I am currently using MySQL, but I am willing to switch databases if needed.
For anyone interested in the schema and data, I created an SQLFiddle: http://sqlfiddle.com/#!2/84233/1
And my matching query (here for users 1 and 101):
SELECT COALESCE(
         SQRT( (100.0 * as1.actual_score / ps1.possible_score)
             * (100.0 * as2.actual_score / ps2.possible_score) )
         - (100 / ps1.commonquestions),
       0) AS perc
FROM
  -- Points user 101 earns on user 1's questions: user 101's answer is one of
  -- user 1's accepted answers, weighted by user 1's importance.
  (SELECT SUM(imp.value) AS actual_score
   FROM user_questions AS uq1
   INNER JOIN importances imp ON imp.id = uq1.importance
   INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = 101
     AND (uq1.accans1 = uq2.answer_id
       OR uq1.accans2 = uq2.answer_id
       OR uq1.accans3 = uq2.answer_id
       OR uq1.accans4 = uq2.answer_id)
   WHERE uq1.user_id = 1) AS as1,
  -- Maximum points available on user 1's questions that both users answered,
  -- plus the number of questions they have in common.
  (SELECT SUM(value) AS possible_score, COUNT(*) AS commonquestions
   FROM user_questions AS uq1
   INNER JOIN importances ON importances.id = uq1.importance
   INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = 101
   WHERE uq1.user_id = 1) AS ps1,
  -- Same as as1 with the roles swapped: points user 1 earns on user 101's questions.
  (SELECT SUM(imp.value) AS actual_score
   FROM user_questions AS uq1
   INNER JOIN importances imp ON imp.id = uq1.importance
   INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = 1
     AND (uq1.accans1 = uq2.answer_id
       OR uq1.accans2 = uq2.answer_id
       OR uq1.accans3 = uq2.answer_id
       OR uq1.accans4 = uq2.answer_id)
   WHERE uq1.user_id = 101) AS as2,
  -- Same as ps1 with the roles swapped: maximum points on user 101's questions.
  (SELECT SUM(value) AS possible_score
   FROM user_questions AS uq1
   INNER JOIN importances ON importances.id = uq1.importance
   INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = 1
   WHERE uq1.user_id = 101) AS ps2
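
To make the arithmetic in the outer SELECT easier to follow, here is a minimal Python sketch of the same formula. The function name `match_percentage` and its parameter names are made up for illustration and do not exist in my schema. It assumes the four sums and the common-question count have already been fetched, and it mirrors how MySQL's COALESCE turns a NULL result (a missing sum, or a division by zero in the default SQL mode) into 0.

```python
import math
from typing import Optional


def match_percentage(
    actual1: Optional[float],
    possible1: Optional[float],
    actual2: Optional[float],
    possible2: Optional[float],
    common_questions: Optional[int],
) -> float:
    """Hypothetical helper mirroring the outer SELECT of the query above."""
    values = (actual1, possible1, actual2, possible2, common_questions)
    # SUM() over zero rows is NULL in MySQL; COALESCE(..., 0) maps that to 0.
    if any(v is None for v in values):
        return 0.0
    # Division by zero also yields NULL in MySQL (default mode), hence 0 again.
    if possible1 == 0 or possible2 == 0 or common_questions == 0:
        return 0.0
    pct1 = 100.0 * actual1 / possible1  # how well user 101 satisfies user 1
    pct2 = 100.0 * actual2 / possible2  # how well user 1 satisfies user 101
    # Geometric mean of both percentages minus a margin of error that
    # shrinks as the number of common questions grows.
    return math.sqrt(pct1 * pct2) - 100.0 / common_questions
```

For example, two users who satisfy each other completely on 4 common questions get `match_percentage(10, 10, 10, 10, 4)`, which is 100 minus the margin 100/4, so 75.0.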