How to get the number of occurrences of a word in a database text field with LINQ?

Example keyword token: ASP.NET

Edit 4:

Database records:

Record 1: [TextField] = "Blah blah blah ASP.NET bli bli bli ASP.NET blu ASP.NET yop yop ASP.NET "

Record 2: [TextField] = "Blah blah blah bli bli bli blu ASP.NET yop yop ASP.NET "

Record 3: [TextField] = "Blah ASP.NET blah ASP.NET blah ASP.NET bli ASP.NET bli bli ASP.NET blu ASP.NET yop yop ASP.NET "

So:

Record 1 contains 4 occurrences of the "ASP.NET" keyword

Record 2 contains 2 occurrences of the "ASP.NET" keyword

Record 3 contains 7 occurrences of the "ASP.NET" keyword

Extract the collection as an IList<RecordModel> (sorted by occurrence count, descending):

  • Record 3
  • Record 1
  • Record 2

LinqToSQL would be best, but LinqToObject is fine too :)

Note: the "." in the ASP.NET keyword is not a concern here (that is not the goal of this question).


5 Answers


Edit 2: I see you updated the question and changed it slightly: you want a count for each word? Try this:

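using System.Linq;

string input = "some random text: how many times does each word appear in some random text, or not so random in this case";
// '.' is deliberately not a separator, so tokens such as "ASP.NET" stay whole
char[] separators = new char[]{ ' ', ',', ':', ';', '?', '!', '\n', '\r', '\t' };

// Group identical tokens (case-sensitive) and sort by frequency, highest first
var query = from s in input.Split( separators )
            where s.Length > 0   // drop empty tokens produced by adjacent separators
            group s by s into g
            let count = g.Count()
            orderby count descending
            select new {
                Word = g.Key,
                Count = count
            };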

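Since you want words that may contain a "." (e.g. "ASP.NET"), I've excluded it from the separator list. Unfortunately, that pollutes some words: a sentence like "blah blah blah. blah blah." will show "blah" with a count of 3 and "blah." with a count of 2. You need to decide what cleaning strategy you want here, e.g. if a "." has a letter on both sides it counts as part of the word, otherwise it is treated as whitespace. That kind of logic is best done with a regex.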
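The cleaning rule described above (a "." counts as part of a word only when it has a word character on both sides) can be sketched with a single regex. This is an illustrative sketch, not the answerer's code: the class name WordFrequency is hypothetical, "word character" means the regex class \w (letters, digits, underscore), and counting stays case-sensitive like the query above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the "dot inside a word" cleaning rule.
public static class WordFrequency
{
    // One or more word characters, optionally followed by groups of "." + word characters.
    // "ASP.NET" matches as a whole; in "blah." the trailing dot is not followed by a
    // word character, so only "blah" is matched.
    private static readonly Regex Token = new Regex(@"\w+(?:\.\w+)*");

    public static List<KeyValuePair<string, int>> Count(string input)
    {
        return Token.Matches(input)
            .Cast<Match>()
            .GroupBy(m => m.Value)                 // case-sensitive, like the query above
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(pair => pair.Value) // most frequent first
            .ToList();
    }
}
```

For example, WordFrequency.Count("blah blah blah. blah blah.") returns a single entry, "blah" with a count of 5, because the trailing dots are no longer glued to the words.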

answered 2009-10-17T16:50:53.163

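A regex handles this nicely. You can use the \b metacharacter to anchor the match at word boundaries, and escape the keyword with Regex.Escape so that characters such as "." are not treated as regex metacharacters. It also handles occurrences followed by a trailing period, comma, and so on.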

using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] records =
{
    "foo ASP.NET bar", "foo bar",
    "foo ASP.NET? bar ASP.NET",
    "ASP.NET foo ASP.NET! bar ASP.NET",
    "ASP.NET, ASP.NET ASP.NET, ASP.NET"
};
string keyword = "ASP.NET";
string pattern = @"\b" + Regex.Escape(keyword) + @"\b";
var query = records.Select((t, i) => new
            {
                Index = i,
                Text = t,
                Count = Regex.Matches(t, pattern).Count
            })
            .OrderByDescending(item => item.Count);

foreach (var item in query)
{
    Console.WriteLine("Record {0}: {1} occurrences - {2}",
        item.Index, item.Count, item.Text);
}

Voilà! :)
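To produce the IList<RecordModel> the question asks for, the same counting can feed OrderByDescending. This sketch makes two assumptions not in the original: RecordModel is a hypothetical class with Id and TextField properties, and the \b anchors are swapped for the lookarounds (?<!\w) and (?!\w). Those behave the same for "ASP.NET", but they also work for keywords that start or end with a symbol: a \b placed after "C#" only matches when a word character follows, because # is itself a non-word character. Regex has no SQL translation, so with LINQ to SQL the rows would have to be loaded into memory first (for example via AsEnumerable()) before this ranking runs.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical model: only TextField is used for ranking.
public class RecordModel
{
    public int Id { get; set; }
    public string TextField { get; set; }
}

public static class KeywordRanking
{
    // Whole-word count: the keyword must not be preceded or followed by a word character.
    // Unlike \b, this also works when the keyword starts or ends with a symbol (e.g. "C#").
    public static int CountOccurrences(string text, string keyword)
    {
        string pattern = @"(?<!\w)" + Regex.Escape(keyword) + @"(?!\w)";
        return Regex.Matches(text ?? string.Empty, pattern).Count;
    }

    // Records ordered by keyword count, highest first (ties keep their input order).
    public static IList<RecordModel> RankByKeyword(IEnumerable<RecordModel> records, string keyword)
    {
        return records
            .OrderByDescending(r => CountOccurrences(r.TextField, keyword))
            .ToList();
    }
}
```

With the three records from the question, RankByKeyword(records, "ASP.NET") returns them in the order Record 3, Record 1, Record 2.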

answered 2009-10-17T17:51:26.100

Use String.Split() to turn the string into an array of words, then use LINQ to filter the array down to the word you want, and take the count of the result, like this:

myDbText.Split(' ').Where(token => token.Equals(word)).Count();
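The split approach counts only tokens that equal the keyword exactly, so an occurrence followed by punctuation ("ASP.NET," or "ASP.NET?") is missed. A minimal sketch wrapping the one-liner in a helper (the name SplitCounter.CountWord is hypothetical) uses the Count(predicate) overload, which is equivalent to Where(...).Count(), and StringSplitOptions.RemoveEmptyEntries to ignore runs of spaces:

```csharp
using System;
using System.Linq;

// Hypothetical helper wrapping the one-liner above.
public static class SplitCounter
{
    // Exact, case-sensitive token comparison: "ASP.NET," or "ASP.NET?" is NOT counted.
    public static int CountWord(string text, string word)
    {
        return text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries) // ignore runs of spaces
            .Count(token => token == word);                              // same as Where(...).Count()
    }
}
```

This is enough for the question's sample records, where every occurrence is surrounded by spaces; for real text with punctuation, the regex-based answers are more robust.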
answered 2009-10-17T16:44:47.560

You can use Regex.Matches(input, pattern).Count, or you can do the following:

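// Ordinal comparison avoids culture-sensitive matching
int count = 0; int startIndex = input.IndexOf(word, StringComparison.Ordinal);
while (startIndex != -1) { ++count; startIndex = input.IndexOf(word, startIndex + 1, StringComparison.Ordinal); }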

Using LINQ here would be ugly.
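The IndexOf loop can be wrapped in a reusable method. In this sketch (the name SubstringCounter.Count and its allowOverlap parameter are hypothetical), resuming at startIndex + 1, as the loop above does, counts overlapping matches, while jumping past the whole match does not. Either way it matches substrings, not whole words, so "ASP.NETX" also counts. The empty-word guard is there because IndexOf reports an empty string as found at whatever start position it is given.

```csharp
using System;

// Hypothetical helper around the IndexOf loop.
public static class SubstringCounter
{
    // Counts occurrences of `word` in `input` using ordinal IndexOf.
    // allowOverlap = true resumes one character after each hit (like the loop above);
    // allowOverlap = false resumes after the whole match.
    public static int Count(string input, string word, bool allowOverlap)
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrEmpty(word)) return 0;

        int count = 0;
        int step = allowOverlap ? 1 : word.Length;
        int index = input.IndexOf(word, StringComparison.Ordinal);
        while (index != -1)
        {
            count++;
            // index + step never exceeds input.Length, which IndexOf accepts as a start index
            index = input.IndexOf(word, index + step, StringComparison.Ordinal);
        }
        return count;
    }
}
```

The difference only shows up for self-overlapping keywords: in "aaaa", "aa" is found 3 times with overlap and 2 times without.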

answered 2009-10-17T16:29:53.120

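I know this goes beyond what was originally asked, but it still fits the topic, so I'm including it for anyone who searches for this question later. It does not require whole words to match in the searched string, but it can easily be modified to do so with the code from Ahmad's post.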

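using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

//use this method to order objects and keep the existing type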
class Program
{
  static void Main(string[] args)
  {
    List<TwoFields> tfList = new List<TwoFields>();
    tfList.Add(new TwoFields { one = "foo ASP.NET barfoo bar", two = "bar" });
    tfList.Add(new TwoFields { one = "foo bar foo", two = "bar" });
    tfList.Add(new TwoFields { one = "", two = "barbarbarbarbar" });

    string keyword = "bar";
    string pattern = Regex.Escape(keyword);
    // search both fields, with a space between them so a match cannot span the two values
    tfList = tfList.OrderByDescending(t => Regex.Matches(t.one + " " + t.two, pattern).Count).ToList();

    foreach (TwoFields tf in tfList)
    {
      Console.WriteLine(string.Format("{0} : {1}", tf.one, tf.two));
    }

    Console.Read();
  }
}


//a class with two string fields to be searched on
public class TwoFields
{
  public string one { get; set; }
  public string two { get; set; }
}
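The author notes the code above can be adapted to whole-word matching. One hedged way to do that, while keeping the "keep the existing type" idea, is a generic helper; the name KeywordOrdering.OrderByKeywordCount and its parameters are hypothetical. It searches each field separately instead of concatenating them, so a match can never span two fields, and wholeWord = true wraps the escaped keyword in \b anchors as in the regex answer above (so \b's usual caveat about keywords that start or end with a symbol applies).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical generic helper: works for TwoFields or any other type.
public static class KeywordOrdering
{
    // Orders items by how often `keyword` occurs across the selected string fields.
    // wholeWord = true anchors the escaped keyword with \b; false keeps substring matching.
    public static List<T> OrderByKeywordCount<T>(
        IEnumerable<T> items,
        Func<T, IEnumerable<string>> fields,
        string keyword,
        bool wholeWord)
    {
        string escaped = Regex.Escape(keyword);
        var regex = new Regex(wholeWord ? @"\b" + escaped + @"\b" : escaped);

        return items
            .OrderByDescending(item => fields(item)
                .Where(f => f != null)                // skip null fields
                .Sum(f => regex.Matches(f).Count))    // each field is searched separately
            .ToList();
    }
}
```

Usage with the program above would be something like KeywordOrdering.OrderByKeywordCount(tfList, t => new[] { t.one, t.two }, "bar", true). With its three rows, substring matching ranks the "barbarbarbarbar" row first, while whole-word matching ranks it last, because none of its "bar"s stand alone.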


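using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

//same as above, but uses multiple keywords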
class Program
{
  static void Main(string[] args)
  {
    List<TwoFields> tfList = new List<TwoFields>();
    tfList.Add(new TwoFields { one = "one one, two; three four five", two = "bar" });
    tfList.Add(new TwoFields { one = "one one two three", two = "bar" });
    tfList.Add(new TwoFields { one = "one two three four five five", two = "bar" });

    string keywords = " five one    ";
    string keywordsClean = Regex.Replace(keywords, @"\s+", " ").Trim(); //replace multiple spaces with one space

    string pattern = Regex.Escape(keywordsClean).Replace("\\ ","|"); //escape special chars and replace spaces with "or"
    // search both fields, with a space between them so a match cannot span the two values
    tfList = tfList.OrderByDescending(t => Regex.Matches(t.one + " " + t.two, pattern).Count).ToList();

    foreach (TwoFields tf in tfList)
    {
      Console.WriteLine(string.Format("{0} : {1}", tf.one, tf.two));
    }

    Console.Read();
  }
}

public class TwoFields
{
  public string one { get; set; }
  public string two { get; set; }
}
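The alternation pattern above ("five|one") matches substrings, so "one" would also be counted inside "someone". A sketch of a whole-word variant (MultiKeyword.BuildPattern is a hypothetical name) escapes each keyword on its own, joins them with |, and wraps the group in \b anchors. Splitting with a null separator array lets String.Split break on any whitespace, which replaces both the Regex.Replace/Trim cleanup and the trick of turning Regex.Escape's "\ " into "|".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: whole-word matching for several keywords at once.
public static class MultiKeyword
{
    public static Regex BuildPattern(string keywords)
    {
        // A null separator array makes Split break on any whitespace;
        // RemoveEmptyEntries drops the gaps left by repeated spaces.
        string[] words = keywords.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length == 0)
            throw new ArgumentException("At least one keyword is required.", nameof(keywords));

        // Escape each keyword individually, then join them as alternatives.
        string alternatives = string.Join("|", words.Select(Regex.Escape));
        return new Regex(@"\b(?:" + alternatives + @")\b");
    }
}
```

For the keywords " five one    " this builds \b(?:five|one)\b, which finds 3 matches in "one one, two; three four five" and none in "someone fives".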
answered 2010-08-04T21:26:41.837