
Suppose I have the following string:

"present present present presenting presentation do  do doing " 

I count the words in the string and sort them by frequency in descending order:

I'm using GroupBy with Count:

present      3
do           2
doing        1
presenting   1
presentation 1

Then I stem the words (using an array [ , ] or any other structure):

present  3
do       2
do       1
present  1
present  1
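The question does not say which stemmer is used. As a stand-in, here is a sketch with a hypothetical `NaiveStem` function that strips only the suffixes `ation` and `ing`. It is not the Porter algorithm and only reproduces the mapping in the list above for these sample words; a real application would use a proper stemmer.

```csharp
using System;

// Hypothetical stand-in for a real stemmer (e.g. Porter): strips the first
// matching suffix from a short list, keeping at least two letters of the stem.
string NaiveStem(string word)
{
    string[] suffixes = { "ation", "ing" };
    foreach (var suffix in suffixes)
    {
        if (word.EndsWith(suffix, StringComparison.Ordinal) &&
            word.Length - suffix.Length >= 2)
        {
            return word.Substring(0, word.Length - suffix.Length);
        }
    }
    return word; // no known suffix: the word is its own stem
}

string[] sample = { "present", "presenting", "presentation", "do", "doing" };
foreach (var w in sample)
    Console.WriteLine($"{w} -> {NaiveStem(w)}");
```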

Finally, I want to count the stemmed words again into a dictionary, adding up their counts. So the output should be:

present 5
do      3
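The re-count step is a group-and-sum: group the stemmed pairs by stem and add their counts. A minimal sketch on the question's own stemmed list, assuming the pairs are held in a list of tuples (any structure of (stem, count) pairs works the same way):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The stemmed (stem, frequency) pairs listed in the question; a stem may repeat.
var stemmed = new List<(string Stem, int Freq)>
{
    ("present", 3),
    ("do", 2),
    ("do", 1),
    ("present", 1),
    ("present", 1),
};

// Group by stem and add up the frequencies. Sum is needed here: Count() would
// return the number of pairs per stem (3 and 2) instead of 5 and 3.
Dictionary<string, int> totals = stemmed
    .GroupBy(p => p.Stem)
    .ToDictionary(g => g.Key, g => g.Sum(p => p.Freq));

foreach (var (stem, count) in totals)
    Console.WriteLine($"{stem} {count}");
```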

Can anyone help? Thanks in advance.


2 Answers


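using System;
using System.Collections.Generic;
using System.Linq;

// Use a List instead of a Dictionary so that keys may repeat:
        List<KeyValuePair<string, int>> words = new List<KeyValuePair<string, int>>();

        string text = "present present present presenting presentation do  do doing";
        // RemoveEmptyEntries skips the empty string produced by the double space.
        var ws = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Passing the words into the list with their frequencies:
        words = (from w in ws
                 group w by w into wsGroups
                 select new KeyValuePair<string, int>(
                     wsGroups.Key, wsGroups.Count()   // size of this group, not of ws
                     )
                 ).ToList();

        // Ordering (OrderBy returns a new sequence, so assign the result back):
        words = words.OrderByDescending(w => w.Value).ToList();

        // Stemming the words (stemword is your stemming function, not shown):
        words = (from w in words
                 select new KeyValuePair<string, int>
                     (
                         stemword(w.Key),
                         w.Value
                     )).ToList();

        // Grouping by stem, summing the counts, and putting them into a Dictionary:
        var wordsRef = (from w in words
                        group w by w.Key into groups
                        select new
                        {
                            count = groups.Sum(g => g.Value),   // add counts, don't count pairs
                            word = groups.Key
                        }).ToDictionary(w => w.word, w => w.count);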
answered 2012-07-21T00:01:11.070

LINQ's GroupBy or Aggregate is a good way to compute counts like these.
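For illustration, here is a sketch of both approaches applied to the sample string from the question (the variable names are illustrative). GroupBy builds one group per distinct word and takes its size; Aggregate folds the words into a dictionary, incrementing one counter per word.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string text = "present present present presenting presentation do  do doing ";
// RemoveEmptyEntries drops the empty strings produced by the double and trailing spaces.
string[] words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);

// GroupBy: one group per distinct word; the group size is the word's frequency.
Dictionary<string, int> byGroup = words
    .GroupBy(w => w)
    .ToDictionary(g => g.Key, g => g.Count());

// Aggregate: start from an empty dictionary and bump the counter for each word.
Dictionary<string, int> byAggregate = words.Aggregate(
    new Dictionary<string, int>(),
    (counts, w) =>
    {
        counts[w] = counts.TryGetValue(w, out int n) ? n + 1 : 1;
        return counts;
    });

// Print by descending frequency, as in the question.
foreach (var (word, count) in byAggregate.OrderByDescending(p => p.Value))
    Console.WriteLine($"{word} {count}");
```

GroupBy reads more declaratively; Aggregate makes a single pass over the words with one mutable dictionary.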

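If you want to do it by hand, it looks like you want two sets of results: one counting the non-stemmed words and one counting the stemmed words: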

void incrementCount(Dictionary<string, int> counts, string word)
{
  if (counts.ContainsKey(word))
  {
    counts[word]++;
  }
  else
  {
    counts.Add(word, 1);   // the first occurrence counts as 1, not 0
  }
}

var stemmedCount = new Dictionary<string, int>();
var nonStemmedCount = new Dictionary<string, int>();

// words: the individual words of the text; Stem: your stemming function
foreach (var word in words)
{
  incrementCount(stemmedCount, Stem(word));
  incrementCount(nonStemmedCount, word);
}
answered 2012-07-20T23:38:28.143