Suppose D is a text document, and K = < k1, ..., kN > denotes a set of terms contained in that document. For example:
D = "What a wonderful day, isn't it?"
K = <"wonderful","day">
My goal is to check whether the document D discusses all the words in K together, as a whole. For example:
D = "The Ebola in Africa is spreading at high speed"
K = <"Ebola","Africa">
is a case where D is closely related to K, whereas:
D = "NEWS 1: Ebola is a dangerous disease that is causing thousands of deaths. Many governments are taking precautions to prevent its spread. NEWS 2: population in Africa is increasing."
K = <"Ebola","Africa">
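is a case where D is not related to K, because "Ebola" and "Africa" are mentioned at different points in the document, in separate sentences, and are not connected to each other.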
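As a rough illustration of the criterion just described (all terms of K discussed in the same place rather than in separate sentences), here is a minimal Python sketch of two naive baselines. It assumes single-word terms, case-insensitive matching, and a simple regex sentence splitter that will mis-split abbreviations such as "Dr."; the helper names split_sentences, co_occur_in_sentence and min_window are hypothetical, not from any library. The first check is binary (do all terms share at least one sentence?), the second is graded (length in tokens of the shortest span containing every term).

```python
import re
from typing import Optional

# Lowercased word tokens; punctuation is dropped (assumption: single-word terms).
WORD = re.compile(r"[a-z0-9']+")


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): a sentence ends at ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(text: str) -> list[str]:
    return WORD.findall(text.lower())


def co_occur_in_sentence(document: str, terms: list[str]) -> bool:
    """True if at least one sentence contains every term."""
    wanted = {t.lower() for t in terms}
    return any(wanted <= set(words(s)) for s in split_sentences(document))


def min_window(document: str, terms: list[str]) -> Optional[int]:
    """Token length of the shortest span containing every term, or None if absent."""
    wanted = {t.lower() for t in terms}
    if not wanted:
        return None
    toks = words(document)
    counts: dict[str, int] = {}
    have = 0          # number of distinct wanted terms inside the current window
    left = 0
    best: Optional[int] = None
    for right, w in enumerate(toks):
        if w in wanted:
            counts[w] = counts.get(w, 0) + 1
            if counts[w] == 1:
                have += 1
        # Shrink from the left while the window still covers every term.
        while have == len(wanted):
            span = right - left + 1
            if best is None or span < best:
                best = span
            lw = toks[left]
            if lw in wanted:
                counts[lw] -= 1
                if counts[lw] == 0:
                    have -= 1
            left += 1
    return best


if __name__ == "__main__":
    k = ["Ebola", "Africa"]
    d1 = "The Ebola in Africa is spreading at high speed"
    d2 = ("NEWS 1: Ebola is a dangerous disease that is causing thousands of deaths. "
          "Many governments are taking precautions to prevent its spread. "
          "NEWS 2: population in Africa is increasing.")
    print(co_occur_in_sentence(d1, k), min_window(d1, k))  # True 3
    print(co_occur_in_sentence(d2, k), min_window(d2, k))  # False 25
```

Both checks are purely lexical: they cannot tell that a pronoun such as "it" refers back to "Ebola", so a related document that names the terms in different sentences would still be scored as unrelated.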
How can I formalize this notion of the "relatedness" of D to K? Are there existing techniques in the literature that I could use?
Thanks.