I have texts of 140 characters and a set of keywords. I want to write an algorithm that computes a matching percentage between a text and the keywords, so that I can decide whether the text announces an IT event.

For example: Text: "Tomorrow our weekly computer event will take place. We will discuss how to implement algorithms. This will be great." Keywords: "event, computer, database, Software, algorithms"

Here 3 of the 5 keywords appear in the text (event, computer, algorithms), so the match is 60%.
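The calculation described above can be sketched roughly like this. It assumes case-insensitive, whole-word matching on a simple split at non-alphanumeric characters; the function name `keyword_match_percentage` is hypothetical.

```python
import re

def keyword_match_percentage(text: str, keywords: list[str]) -> float:
    """Percentage of distinct keywords that appear as whole words in the text."""
    # Case-insensitive: lowercase everything, then split on non-alphanumeric characters.
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    wanted = {k.strip().lower() for k in keywords if k.strip()}
    if not wanted:
        return 0.0
    # Share of keywords that were found at least once.
    return 100.0 * len(wanted & words) / len(wanted)

text = ("Tomorrow our weekly computer event will take place. "
        "We will discuss how to implement algorithms. This will be great.")
keywords = "event, computer, database, Software, algorithms".split(",")
print(keyword_match_percentage(text, keywords))  # 60.0
```

Exact word matching is brittle: "computers" would not count as a hit for the keyword "computer", so some stemming or plural handling is usually needed in practice.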

Does it make sense to count the matching words and compare that count to the number of keywords? Is this approach accurate? Has anyone dealt with something like this before?

Thanks for your support.

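Yes, this definitely makes sense. However, you will have to evaluate in practice whether it is precise enough for your purposes. That depends heavily on the texts you are dealing with.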
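One way to do that evaluation is to hand-label some sample texts and measure precision (how many flagged texts really are IT events) and recall (how many real IT events get flagged) for a chosen threshold. The sketch below assumes a toy substring-based score, a 40% threshold, and a handful of made-up labelled texts; all of these are hypothetical and only illustrate the bookkeeping.

```python
from typing import Callable

def precision_recall(samples: list[tuple[str, bool]],
                     score: Callable[[str], float],
                     threshold: float) -> tuple[float, float]:
    """Precision and recall of the rule: score(text) >= threshold -> 'IT event'."""
    tp = fp = fn = 0
    for text, is_it_event in samples:
        predicted = score(text) >= threshold
        if predicted and is_it_event:
            tp += 1  # correctly flagged
        elif predicted:
            fp += 1  # flagged, but not an IT event
        elif is_it_event:
            fn += 1  # IT event that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical keyword list and toy score: percentage of keywords found as substrings.
KEYWORDS = ["event", "computer", "database", "software", "algorithms"]

def keyword_score(text: str) -> float:
    lowered = text.lower()
    return 100.0 * sum(k in lowered for k in KEYWORDS) / len(KEYWORDS)

# Hypothetical hand-labelled texts: (text, is_it_event).
samples = [
    ("Meetup tonight: a software event on database tuning", True),
    ("Join our hackathon about algorithms and software", True),
    ("Talk on Python tomorrow at the IT meetup", True),
    ("Computer and software sale this weekend", False),
    ("Our charity event raises money for the local school", False),
]
print(precision_recall(samples, keyword_score, threshold=40.0))
```

Here both precision and recall come out to 2/3: the sale ad is a false positive, and the Python talk is a real IT event that the keyword list misses entirely.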

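If you want to try something a bit more advanced that is still not very complex, cosine similarity is another commonly used measure for comparing texts.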
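As a minimal sketch of the idea, assuming plain term counts (no TF-IDF weighting, which is a common refinement) and a simple lowercase word tokenizer, the text and the keyword list can each be turned into a word-count vector and compared with cos(a, b) = (a · b) / (|a| |b|). The helper names are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Term-frequency vector: lowercase word -> count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """(a . b) / (|a| * |b|); returns 0.0 if either vector is empty."""
    # Counter returns 0 for missing words, so only shared words add to the dot product.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

keywords = bag_of_words("event computer database software algorithms")
text = bag_of_words("Weekly event about computer algorithms")
print(cosine_similarity(text, keywords))  # approximately 0.6
```

Unlike the plain percentage, this score also depends on how many non-keyword words the text contains, so a short text dense with keywords scores higher than a longer one with the same hits.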

There are plenty of algorithms and libraries for text classification. LingPipe is a good Java library that might help you.
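LingPipe's own API is not shown here. To give a rough idea of what a trained text classifier does instead of a fixed keyword count, the following is a from-scratch multinomial naive Bayes classifier in plain Python, a common baseline for text classification and a swapped-in illustration rather than LingPipe. The training sentences and the labels `it_event` / `other` are hypothetical; in practice you would train on a much larger set of your own labelled texts.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training data: (text, label).
train = [
    ("Workshop on database design for software developers", "it_event"),
    ("Join our meetup about algorithms and machine learning", "it_event"),
    ("Conference talk: building software with Java", "it_event"),
    ("Bake sale this Saturday at the community hall", "other"),
    ("Our football team won the match yesterday", "other"),
    ("Great weather today, going to the beach", "other"),
]
texts, labels = zip(*train)
clf = NaiveBayes().fit(texts, labels)
print(clf.predict("Next week: a talk about algorithms for software engineers"))
```

The classifier learns word weights from examples, so words such as "talk" or "meetup" can count as evidence even though they were never on a keyword list.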

If you are interested in using a library, the top answer to this Quora question gives a good overview.

answered 2015-12-23T09:54:30.207