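I have a dictionary folder that stores a list of dictionaries, e.g. "Anger", "Care", and so on. For example, I have a Facebook post that says "I am sullen, irked, petulant." In my Anger dictionary I have three words: sullen, irked, petulant. When I run my word count program, it does not seem to detect all of the words accurately. More specifically, my word count dictionary reports that sullen and irked each occurred once, but not petulant.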

Is this problem caused by my regex?

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Windows.Forms;

namespace empTRUST
{
    class FBWordCount
    {
        public Dictionary<string, int> countWordsInStatus(string status, string[] dictArray)
        {
            var words = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase); // local word dictionary is created here
            foreach (var dictEntry in dictArray)
            {
                var wordPattern = new Regex(@"\w+");
                string smallDictEntry = dictEntry.ToLower();
                foreach (Match match in wordPattern.Matches(status))
                {
                    if (match.ToString() == smallDictEntry)
                    {
                        int currentCount = 0;
                        words.TryGetValue(match.Value, out currentCount);

                        currentCount++;
                        words[match.Value] = currentCount;  // local word dictionary adds new word count
                    }
                }
            }
            return words;   // returns local word dictionary to receiving end
        }
    }
}

1 Answer

The entire method can be replaced with a single LINQ query. Try this:

public Dictionary<string, int> countWordsInStatus(string status, string[] dictArray)
{
    // Use one comparer throughout so filtering, grouping, and lookup agree on case.
    var comparer = StringComparer.CurrentCultureIgnoreCase;
    var wordPattern = new Regex(@"\w+");
    return wordPattern.Matches(status)
        .Cast<Match>()
        .Select(m => m.Value)
        .Where(word => dictArray.Contains(word, comparer))
        .GroupBy(word => word, comparer)
        .ToDictionary(g => g.Key, g => g.Count(), comparer);
}

You can call it like this:

var results = countWordsInStatus(
    "I am sullen, irked, petulant.", 
    new[] { "sullen", "irked", "petulant" });
// { { "sullen", 1 }, 
//   { "irked", 1 }, 
//   { "petulant", 1 } }
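
If a word is still missed, the regex is probably not the culprit, since `\w+` picks up all three words from that status. One possible cause, and this is only a guess because the question does not show how the dictionary files are loaded, is that some entries carry stray whitespace (for example a trailing `\r` left over from splitting file contents on `\n`). Below is a rough sketch that trims the entries and compares case-insensitively. `WordCountSketch` and `CountWords` are hypothetical names used for illustration, not part of the original `FBWordCount` class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original FBWordCount class.
public static class WordCountSketch
{
    // Built once instead of once per dictionary entry.
    private static readonly Regex WordPattern = new Regex(@"\w+");

    public static Dictionary<string, int> CountWords(string status, IEnumerable<string> dictEntries)
    {
        var comparer = StringComparer.CurrentCultureIgnoreCase;

        // Assumption: entries may contain leading/trailing whitespace such as '\r'.
        // Trim removes it, and empty entries are skipped.
        var lookup = new HashSet<string>(
            dictEntries.Select(e => e.Trim()).Where(e => e.Length > 0),
            comparer);

        return WordPattern.Matches(status)
            .Cast<Match>()
            .Select(m => m.Value)
            .Where(word => lookup.Contains(word))
            .GroupBy(word => word, comparer)
            .ToDictionary(g => g.Key, g => g.Count(), comparer);
    }
}
```

Calling `WordCountSketch.CountWords("I am Sullen, irked, petulant. Sullen!", new[] { "sullen", "irked\r", " petulant " })` should count sullen twice and the other two words once each, even though the entries differ in case and padding. If trimming changes nothing for you, the next thing to check is the actual contents of `dictArray` in the debugger.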
answered 2013-07-20T14:06:55.470