Basically, I want to iterate over every part of a sentence, for example:

string sentence = "How was your day - Andrew, Jane?";
string[] separated = SeparateSentence(sentence);

The contents of separated should be:

[1] = "How"
[2] = " "
[3] = "was"
[4] = " "
[5] = "your"
[6] = " "
[7] = "day"
[8] = " "
[9] = "-"
[10] = " "
[11] = "Andrew"
[12] = ","
[13] = " "
[14] = "Jane"
[15] = "?"

So far I can only grab the words, using the regex "\w(?<!\d)[\w'-]*". How can I split the sentence into smaller pieces, as in the example output?

Edit: the string will not contain any of the following:

  • i.e.

  • solid-form

  • 8th, 1st, 2nd

3 Answers

Take a look at this:

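        // Requires: using System.Collections.Generic; using System.Text.RegularExpressions;

        // One repeated group: each iteration captures a run of whitespace, a run of
        // digits, a run of word characters, or a single punctuation character.
        string pattern = @"^(\s+|\d+|\w+|[^\d\s\w])+$";
        string input = "How was your 7 day - Andrew, Jane?";

        List<string> words = new List<string>();

        Regex regex = new Regex(pattern);

        // Match once and check Success instead of calling IsMatch and then Match.
        Match match = regex.Match(input);

        if (match.Success)
        {
            // Captures holds every value that group 1 matched, in order.
            foreach (Capture capture in match.Groups[1].Captures)
                words.Add(capture.Value);
        }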
answered 2013-05-14T14:22:05.630

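I would suggest implementing a simple lexer (if there is such a thing) that reads one character at a time and produces the output you are looking for. It is not the simplest solution, but it has the advantage of being extensible in case your use case becomes more complicated, as @AndreCalil suggested.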
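
A minimal sketch of such a lexer, under the assumption that a word is any run of letters or digits and that every other character (whitespace or punctuation) becomes its own one-character token. The class name SentenceLexer is hypothetical; SeparateSentence simply reuses the method name from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SentenceLexer
{
    // Hypothetical helper: splits a sentence into words, spaces and punctuation.
    public static string[] SeparateSentence(string sentence)
    {
        var tokens = new List<string>();
        var word = new StringBuilder();

        foreach (char c in sentence)
        {
            if (char.IsLetterOrDigit(c))
            {
                // Still inside a word: keep accumulating.
                word.Append(c);
                continue;
            }

            // Any other character ends the current word, if there is one.
            if (word.Length > 0)
            {
                tokens.Add(word.ToString());
                word.Clear();
            }

            // Assumption: whitespace and punctuation are single-character tokens.
            tokens.Add(c.ToString());
        }

        // Flush a word that runs to the end of the input.
        if (word.Length > 0)
            tokens.Add(word.ToString());

        return tokens.ToArray();
    }
}
```

Calling SentenceLexer.SeparateSentence("How was your day - Andrew, Jane?") should give the 15 tokens listed in the question. Note that a run of several spaces yields one token per space here; if you would rather have them grouped, that rule can be changed in the loop, which is the kind of extension a hand-written lexer makes easy.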

answered 2013-05-14T14:16:13.157

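Why not something like this? It is tailored to your test case, but if you add punctuation, it may be what you are looking for.

(\w+|[,?-])

Edit: Ah, borrowing from Andre's answer, this is what I had in mind:

// The hyphen comes last in the class so that it is a literal; written as [,-?] it
// would denote the range from ',' to '?', which also matches . / : ; < = > and digits.
string pattern = @"(\w+|[,?-])";
string input = "How was your 7 day - Andrew, Jane?";

List<string> words = new List<string>();

Regex regex = new Regex(pattern);

// Matches returns an empty collection when nothing matches, so no IsMatch check is needed.
MatchCollection matches = regex.Matches(input);

foreach (Match m in matches)
    words.Add(m.Groups[1].Value);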
answered 2013-05-14T14:22:38.367