I've got a MySQL problem, and I need a beer lunch. I want to run a query like this:
SELECT MATCH(some_string) AGAINST ('beer lunch') FROM (SELECT GROUP_CONCAT(some_column) AS some_string FROM myrealtable) AS mytablealias;
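Unfortunately, I've found that I can't do a FULLTEXT MATCH on a GROUP_CONCAT column, because the FULLTEXT index exists only on the original column (some_column), not on the concatenated column (some_string) of the aliased derived table.
I really need to run a FULLTEXT search and produce relevance scores for concatenated strings that are broken up across multiple rows of my table.
Here is a small thought experiment I ran to investigate the relevance problem. Let's start with a table that holds the concatenated strings: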
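For context, a minimal sketch of the setup, assuming InnoDB on MySQL 5.6 or later and the placeholder names from the query above; the `id` column and the index name `ft_some_column` are hypothetical. MATCH() has to name columns that form a FULLTEXT index on a base table, which is why the base column can be searched and the derived `some_string` cannot:

```sql
-- Hypothetical schema: the FULLTEXT index lives on the base column only.
CREATE TABLE myrealtable (
  id          INT AUTO_INCREMENT PRIMARY KEY,
  some_column TEXT,
  FULLTEXT KEY ft_some_column (some_column)
) ENGINE=InnoDB;

-- Works: MATCH() names exactly the indexed column of a base table.
SELECT id, MATCH(some_column) AGAINST ('beer lunch') AS score
FROM myrealtable;

-- Does not work: some_string is computed inside a derived table,
-- so there is no FULLTEXT index for MATCH() to use.
SELECT MATCH(some_string) AGAINST ('beer lunch')
FROM (SELECT GROUP_CONCAT(some_column) AS some_string FROM myrealtable) AS mytablealias;
```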
+----------+-------------------------------------------------------------------------------------+
| table_id | concat_string                                                                       |
+----------+-------------------------------------------------------------------------------------+
|        1 | I like beer Beer is a healthy choice My brother drinks beer for lunch every day     |
|        2 | I like juice Juice is a healthier choice My brother drinks beer for lunch every day |
+----------+-------------------------------------------------------------------------------------+
Now I run the following MATCH query against this table:
SELECT table_id, MATCH(concat_string) AGAINST('beer lunch') AS score FROM myconcattable;
and I get these relevance scores:
+----------+----------------------------+
| table_id | score                      |
+----------+----------------------------+
|        1 | 0.000000007543713209656744 |
|        2 | 0.000000003771856604828372 |
+----------+----------------------------+
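Clearly, when searching for 'beer lunch', the first row is more relevant than the second... but the problem is that my strings are broken up across multiple rows that need to be grouped by a foreign key (foreign_id). This is what my table really looks like: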
+----------+--------------------------------------------+------------+
| table_id | some_string                                | foreign_id |
+----------+--------------------------------------------+------------+
|        1 | I like beer                                |          1 |
|        2 | Beer is a healthy choice                   |          1 |
|        3 | My brother drinks beer for lunch every day |          1 |
|        4 | I like juice                               |          2 |
|        5 | Juice is a healthier choice                |          2 |
|        6 | My brother drinks beer for lunch every day |          2 |
+----------+--------------------------------------------+------------+
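The thought-experiment table above has the shape you get by grouping this table on foreign_id. A sketch of materializing it, assuming MySQL 5.6 or later with InnoDB; the table and column names are borrowed from the thought experiment, but keying it by `foreign_id` (instead of `table_id`) and the index name `ft_concat_string` are assumptions. Whether the resulting scores match the thought-experiment numbers depends on the data, since natural-language relevance is computed from statistics over the whole index:

```sql
-- GROUP_CONCAT output is truncated at group_concat_max_len
-- (1024 bytes by default), so raise it for this session first.
SET SESSION group_concat_max_len = 1000000;

-- Hypothetical materialized table: one row per foreign_id.
CREATE TABLE myconcattable (
  foreign_id    INT PRIMARY KEY,
  concat_string MEDIUMTEXT,
  FULLTEXT KEY ft_concat_string (concat_string)
) ENGINE=InnoDB;

-- Join each group's fragments with spaces, in table_id order.
INSERT INTO myconcattable (foreign_id, concat_string)
SELECT foreign_id,
       GROUP_CONCAT(some_string ORDER BY table_id SEPARATOR ' ')
FROM mybrokentable
GROUP BY foreign_id;

-- The concatenated column is now indexed and can be scored directly.
SELECT foreign_id, MATCH(concat_string) AGAINST ('beer lunch') AS score
FROM myconcattable;
```

The trade-off is that myconcattable is a copy: it has to be rebuilt (or kept in sync, e.g. with triggers) whenever mybrokentable changes.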
So now let's run this query against this table:
SELECT table_id, MATCH(some_string) AGAINST('beer lunch') AS score, foreign_id FROM mybrokentable;
which gives:
+----------+----------------------+------------+
| table_id | score                | foreign_id |
+----------+----------------------+------------+
|        1 | 0.031008131802082062 |          1 |
|        2 | 0.031008131802082062 |          1 |
|        3 |  0.25865283608436584 |          1 |
|        4 |                    0 |          2 |
|        5 |                    0 |          2 |
|        6 |  0.25865283608436584 |          2 |
+----------+----------------------+------------+
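OK, so if I add up the scores, foreign_id 1 does look more relevant than foreign_id 2, but the result is far less accurate than what I get when the strings are concatenated into a single table.
Ideally, I'd like to come up with a query that produces a relevance score per foreign_id, like this: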
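The "add up the scores" step can be done in one query. A sketch, assuming the FULLTEXT index on mybrokentable(some_string) that the previous query already relies on:

```sql
-- Sum the per-row relevance scores within each foreign_id group.
SELECT foreign_id,
       SUM(MATCH(some_string) AGAINST ('beer lunch')) AS score
FROM mybrokentable
GROUP BY foreign_id
ORDER BY score DESC;
```

With the per-row scores shown above, this works out to about 0.321 for foreign_id 1 and about 0.259 for foreign_id 2, a ratio of roughly 1.24, whereas the concatenated table scored the two strings exactly 2:1.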
+----------------------------+------------+
| score                      | foreign_id |
+----------------------------+------------+
| 0.000000007543713209656744 |          1 |
| 0.000000003771856604828372 |          2 |
+----------------------------+------------+
Any ideas about what I should do?