sql - Word co-occurrence in SQL - is this even possible?

Question

I have a dataset that looks like this:

id | sentence                       | tags
1  | "people walk dogs in the park" | "pet park health"
2  | "I am allergic to dogs"        | "allergies health"

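Is it possible to use a SQL query to count the co-occurrences between every tag word and every sentence word? This seems difficult, because each tags and sentence entry has to be parsed into individual words first.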

It might look something like

select sentence_word,tag_word,count(id)
from
(select id,sentence_word
from table)A

join

(select id, tag_word
from table)B

on A.id=B.id
group by sentence_word,tag_word

except that I know the two subqueries are not correct.

Here are some sample results:

 tag_word   | sentence_word  | count(id)
"pet"       |"walk"          |1
"health"    |"dogs"          |2
"allergies" |"dogs"          |1

score 1 · Accepted Answer

I can suggest the following plan of action:

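Move each of the two columns into its own temporary table
Call a stored procedure (like this one for MySQL) to split each string field into individual words, one row per word
CROSS JOIN the two temporary tables
Run COUNT DISTINCT over the resulting data set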

The steps above can be combined into a single stored procedure of their own.

There is also an article on splitting strings in SQL Server.

In some SQL implementations, splitting can be implemented as a user-defined function.
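
As a rough sketch of the user-defined function approach, the example below registers a hypothetical scalar function nth_word in SQLite via Python's sqlite3 module and combines it with a generated numbers table to turn each tags value into one row per word. The function name, the table name data, and the upper bound of 50 words per field are all assumptions made for illustration; other databases define user-defined functions with their own syntax.

```python
import sqlite3


def nth_word(text, n):
    """Return the n-th (1-based) whitespace-separated word, or None if absent."""
    words = (text or "").split()
    return words[n - 1] if 1 <= n <= len(words) else None


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as a two-argument scalar function.
conn.create_function("nth_word", 2, nth_word)

conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, sentence TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        (1, "people walk dogs in the park", "pet park health"),
        (2, "I am allergic to dogs", "allergies health"),
    ],
)

# numbers(i) yields 1..50; each row of data is paired with every position,
# and positions past the last word are dropped because nth_word returns NULL.
tag_rows = conn.execute("""
WITH RECURSIVE numbers(i) AS (
    SELECT 1
    UNION ALL
    SELECT i + 1 FROM numbers WHERE i < 50  -- assumed maximum words per field
)
SELECT d.id, nth_word(d.tags, numbers.i) AS tag_word
FROM data AS d
CROSS JOIN numbers
WHERE nth_word(d.tags, numbers.i) IS NOT NULL
ORDER BY d.id, numbers.i
""").fetchall()
```

The same query with d.sentence in place of d.tags would produce the sentence words, and the two results can then be joined on id and grouped as in the question's query.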
