php - Finding the root word of a word

Asked 2011-03-27T17:01:49.793

Viewed 4307 times

5

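I need to build a PHP dictionary that finds the root word of a given word. For example, searching for "cars" should report "cars is the plural of car", and searching for "took" should report "took is the past tense of take".

I am considering using WordNet, but it seems complicated.

Any suggestions? I'm desperate.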

Regards,

3 Answers

5

Well, since the suggested stemmers don't suit you, you can pick one that fits better from here:

http://snowball.tartarus.org/

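There are also some interesting libraries here: http://sourceforge.net/projects/nlp/

Also, links to similar questions on StackOverflow:

NLP programming tools using PHP?

Text mining with PHP

Update: How do I do word stemming or lemmatization?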

http://www.reddit.com/r/programming/comments/8e5d3/how_do_i_programatically_do_stemming_eg_eating_to/

http://www.nltk.org/

WordNet lemmatizer: http://wordnet.princeton.edu/wordnet/download/

Answered 2011-03-28T06:19:40.620

1

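Well, here is an extension that performs stemming (which I believe is what you want): http://pecl.php.net/package/stem

However, it does not do any grammatical analysis of the word.

Here is a PHP-only version: http://www.chuggnutt.com/stemmer.php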
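
To illustrate the limitation mentioned above, here is a minimal sketch of why suffix stripping alone cannot answer a query like "took". It is a toy, not the PECL stem extension and not the Porter algorithm; the function name describeRoot() and the small $irregular table are hypothetical and exist only for this example. Regular plurals are handled by a deliberately naive rule, while irregular forms need an explicit lookup.

```php
<?php
// Toy illustration only: NOT the PECL stem extension, NOT the Porter algorithm.
// describeRoot() and $irregular are hypothetical names invented for this sketch.

function describeRoot(string $word): string
{
    // Irregular forms cannot be derived by stripping a suffix,
    // so they are resolved through a hand-written lookup table.
    $irregular = [
        'took' => ['take', 'past tense'],
        'went' => ['go', 'past tense'],
        'mice' => ['mouse', 'plural'],
    ];

    $w = strtolower($word);
    if (isset($irregular[$w])) {
        [$root, $relation] = $irregular[$w];
        return "$w is the $relation of $root";
    }

    // Naive rule: a trailing "s" (but not "ss") is treated as a regular plural.
    if (strlen($w) > 3 && substr($w, -1) === 's' && substr($w, -2) !== 'ss') {
        return "$w is the plural of " . substr($w, 0, -1);
    }

    return "$w has no known root form";
}

echo describeRoot('cars'), PHP_EOL;
echo describeRoot('took'), PHP_EOL;
```

The plural rule will misfire on words such as "this", which is the kind of gap a real stemmer or a WordNet-backed lemmatizer is meant to close.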

Answered 2011-03-27T19:58:43.350

0

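You can try out a free Lemmatizer API here: http://twinword.com/lemmatizer.php

Scroll down to find the Lemmatizer endpoint.

This will let you turn "dogs" into "dog" and "abilities" into "ability".

If you pass in a POST or GET parameter called "text" with a string like "walked plants":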

// These code snippets use an open-source library. http://unirest.io/php
$response = Unirest\Request::post("[ENDPOINT URL]",
  array(
    "X-Mashape-Key" => "[API KEY]",
    "Content-Type" => "application/x-www-form-urlencoded",
    "Accept" => "application/json"
  ),
  array(
    "text" => "walked plants"
  )
);

You will get a response like this:

{
  "lemma": {
    "plant": 1,
    "walk": 1
  },
  "result_code": "200",
  "result_msg": "Success"
}
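
As a rough sketch of what to do with that response, the lemmas can be pulled out with PHP's built-in json_decode(). The helper name extractLemmas() is hypothetical, and treating any result_code other than the string "200" as a failure is an assumption based only on the sample shown above, not on the API's documentation.

```php
<?php
// Sample body copied from the response above; in real use this would be
// the raw HTTP response body returned by the API call.
$json = '{"lemma": {"plant": 1, "walk": 1}, "result_code": "200", "result_msg": "Success"}';

// extractLemmas() is a hypothetical helper written for this sketch.
function extractLemmas(string $json): array
{
    $data = json_decode($json, true); // true => decode objects as associative arrays

    // Assumption: "200" (a string, as in the sample) is the only success code.
    if (!is_array($data) || ($data['result_code'] ?? null) !== '200') {
        return [];
    }

    $lemma = $data['lemma'] ?? [];
    // The lemmas are the keys of the "lemma" object; the values are counts.
    return is_array($lemma) ? array_keys($lemma) : [];
}

print_r(extractLemmas($json));
```

Returning an empty array on malformed input keeps the caller simple, though you may prefer to throw an exception so failures are not silently ignored.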

Answered 2015-04-17T13:23:45.967