
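I created a table containing a list of keywords and a code that identifies synonyms, i.e. all keywords that share the same code are treated as the same keyword.

 varchar | varchar                    | tsvector
---------+----------------------------+------------------------------------
 C1000   | AI                         | 'ai':1
 C1000   | Artificial intelligence    | 'artifici':1 'intellig':2
 C1001   | Algorithms                 | 'algorithm':1
 C1002   | Software Design            | 'design':2 'softwar':1
 C1003   | ui design                  | 'design':2 'ui':1
 C1003   | User interface design      | 'design':3 'interfac':2 'user':1
 C1003   | user interface engineering | 'engin':3 'interfac':2 'user':1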

I want to build a query that returns the list of keyword codes found in a given text.

For example, the following text (just a sample) should return the array [C1001,C1003]:

A good ui design starts from a good algorithm design, for this you need a good user interface engineering.

Is there a way to do this with a Postgres query or a custom function?


2 Answers


You can use the Naive Bayes classifier algorithm. It is a powerful text classification algorithm. You can learn more about it here.

Answered 2018-05-06T08:17:33.100

You can convert the text to a tsvector and each keyword to a tsquery, then check whether the vector matches the query:

=> \d codes 
Column  |       Type        | Modifiers 
---------+-------------------+-----------
code    | character varying | 
keyword | character varying | 

=> select * from codes ;
 code  |          keyword           
-------+----------------------------
C1000 | AI
C1000 | Artificial intelligence
C1001 | Algorithms
C1002 | Software Design
C1003 | ui design
C1003 | User interface design
C1003 | user interface engineering
(7 rows)

=> select distinct code from codes where to_tsvector('A good ui design starts from a good algorithm design, for this you need a good user interface engineering.') @@ plainto_tsquery(keyword);
code  
-------
C1001
C1003
(2 rows)

=> select array_agg(distinct code) from codes where to_tsvector('A good ui design starts from a good algorithm design, for this you need a good user interface engineering.') @@ plainto_tsquery(keyword);
array_agg   
---------------
{C1001,C1003}
(1 row)
Answered 2018-05-06T13:58:41.160
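
Since the question also asks about a custom function, the array_agg query from the answer above could be wrapped roughly as follows. This is only a sketch: the function name find_codes and its parameter input_text are hypothetical, and the explicit 'english' text search configuration is an assumption (the answer relies on the session's default_text_search_config). It reads the codes table with the code and keyword columns shown above.

```sql
-- Hypothetical helper: find_codes and input_text are illustrative names,
-- not part of the original answer.
-- Assumes the 'english' configuration; adjust it to match your data.
CREATE OR REPLACE FUNCTION find_codes(input_text text)
RETURNS varchar[]
LANGUAGE sql
STABLE  -- reads the codes table, does not modify anything
AS $$
    -- Parse the input text once per row comparison and keep every code
    -- whose keyword matches. plainto_tsquery ANDs the keyword's words
    -- together, so all words must occur in the text, in any position.
    SELECT array_agg(DISTINCT code ORDER BY code)
    FROM codes
    WHERE to_tsvector('english', input_text)
          @@ plainto_tsquery('english', keyword);
$$;

-- Usage with the sample text from the question.
-- The result is NULL (not an empty array) when no keyword matches.
SELECT find_codes('A good ui design starts from a good algorithm design, for this you need a good user interface engineering.');
```

Note that plainto_tsquery does not require the words of a keyword to be adjacent, so 'User interface design' matches any text that contains all three stems somewhere. If keywords should match only as phrases, phraseto_tsquery (available from PostgreSQL 9.6) may be a closer fit.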