regex - Regular expression to match the first n word occurrences

Question

I have a string in the format:

word<class> word<class>...
For example:
I<Noun> like<verb> to<Function> eat<verb>...

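Is it possible to use a regular expression to find the first n words of each class, for example the first 4 Noun words? It should output a list of words.

Thanks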

score 3 · Accepted Answer

Regular expressions cannot count.

So no, you cannot find the first n words with a regular expression alone.
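
To illustrate the point, here is a minimal Python sketch in which the pattern only recognizes tagged words and the limit comes from the surrounding code. It assumes the input is already in the `word<class>` format from the question; the function name `first_n_of_class` is made up for this example.

```python
import re
from itertools import islice


def first_n_of_class(text, word_class, n):
    """Return up to n words tagged with word_class, in order of appearance."""
    # The lookarounds require whitespace (or the string edge) on both sides
    # of the token without consuming it.
    pattern = re.compile(r"(?<!\S)([A-Za-z]+)<" + re.escape(word_class) + r">(?!\S)")
    # The pattern finds every match; islice is what stops after n of them.
    return [m.group(1) for m in islice(pattern.finditer(text), n)]


first_nouns = first_n_of_class(
    "I<Noun> like<verb> to<Function> eat<verb> an apple<Noun>", "Noun", 4
)
```

The regex itself has no notion of "the first 4"; `islice` simply stops consuming matches once it has n of them.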

score 1 · Accepted Answer

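To do what you are describing, you need a part-of-speech tagger to classify the words used in the sentence. Any natural language processing library can do this. For example, in Python you have NLTK. http://answers.oreilly.com/topic/1091-how-to-use-an-nltk-part-of-speech-tagger/

After that, you need to group the words by part of speech and count them. So this is entirely outside the scope of regular expressions.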
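
The grouping and counting step could look roughly like the Python sketch below. It assumes the words have already been tagged and are in the `word<class>` format from the question, so no tagger is called here; `first_n_per_class` is a hypothetical helper name.

```python
from collections import defaultdict


def first_n_per_class(tagged, n):
    """Group words by class, keeping at most n words per class in order."""
    groups = defaultdict(list)
    for token in tagged.split():
        # A well-formed token looks like word<class>; skip anything else.
        if not token.endswith(">") or "<" not in token:
            continue
        word, _, tag = token[:-1].partition("<")
        if word and tag and len(groups[tag]) < n:
            groups[tag].append(word)
    return dict(groups)


result = first_n_per_class(
    "I<Noun> like<verb> to<Function> eat<verb> an apple<Noun>", 4
)
```

Untagged tokens such as `an` are ignored, and each class list stops growing once it holds n words.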

score 0 · Accepted Answer

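The regex pattern is (?<!\S)([a-zA-Z]+)<Noun>(?!\S), and in each match capture group 1 holds the word. The lookarounds check for surrounding whitespace without consuming it; a pattern that consumes the whitespace, such as (\s|^)([a-zA-Z]+?)<Noun>(\s|$), skips the second of two adjacent nouns.

In C#, you can do this with the following code:

     using System;
     using System.Collections.Generic;
     using System.Text.RegularExpressions;

     string type = "Noun";
     int top = 5;
     string input = "I<Noun> like<verb> to<Function> eat<verb> an apple<Noun>";

     // Build the pattern for the requested class; Regex.Escape guards against
     // class names that contain regex metacharacters.
     string pattern = String.Format(@"(?<!\S)([a-zA-Z]+)<{0}>(?!\S)", Regex.Escape(type));
     MatchCollection mc = Regex.Matches(input, pattern);

     List<string> res = new List<string>();

     // Keep at most `top` matches, in order of appearance.
     for (int i = 0; i < mc.Count && i < top; i++)
     {
        res.Add(mc[i].Groups[1].Value);
     }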
