
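I managed to get my code above working, but I receive the following error. From searching Google I gather that it is a data type problem. However, when I change the data types of the two functions above, I get the same error. What should I do?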

*I am trying to calculate the lexical density index in this case.

//For counting unique words
 private void UniqueWordCount(string fbStatus)
        {
            int count = 0;
            var countedWordList = new List<string>(100);
            var reg = new Regex(@"\w+");
            foreach (Match match in reg.Matches(fbStatus))
            {
                string word = match.Value.ToLower();
                if (!countedWordList.Contains(word))
                {
                    ++count;
                    countedWordList.Add(word);
                }
            }
            label_totaluniquewords.Text = count.ToString();
        }

//For counting total words

  private void SplitWords(string fbStatus)
        {
            int splitWords = fbStatus.Split(new char[] { ' ' },StringSplitOptions.RemoveEmptyEntries).Count();
            label_totalwordcount.Text = splitWords.ToString();
        }

//For counting lexical density (trying to make this work...)
   private void CalculateLexicalDensity(string fbStatus)
        {
            int ld = 0;
            ld = (UniqueWordCount(fbStatus) / SplitWords(fbStatus)) * 100;
            label_lexicaldensity.Text = ld.ToString();
        }

3 Answers


SplitWords does not return the value it calculates. If you meant to return the count, add

return splitWords;

at the end of the function, and declare its return type as int:

private int SplitWords(string fbStatus)
    {
        int splitWords = fbStatus.Split(new char[] { ' ' },StringSplitOptions.RemoveEmptyEntries).Count();
        label_totalwordcount.Text = splitWords.ToString();
        return splitWords;
    }

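Note, however, that your percentage calculation may be off because of integer division. You should either return a decimal or cast to decimal before applying the division.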

You can also change the order of the operations

ld = 100 * UniqueWordCount(fbStatus) / SplitWords(fbStatus);

to get an integer result truncated to a whole-number percentage.
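To make the integer-division point concrete, here is a minimal sketch. The class and method names (DensityMath, PercentNaive, PercentTruncated, PercentExact) are made up for illustration and do not appear in the question.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the question's code.
public static class DensityMath
{
    // The integer division runs first: 20 / 28 is 0, so the result is 0.
    public static int PercentNaive(int unique, int total)
    {
        return (unique / total) * 100;
    }

    // Multiplying first keeps the precision until the final truncation.
    public static int PercentTruncated(int unique, int total)
    {
        return 100 * unique / total;
    }

    // Casting one operand to decimal makes the whole division decimal.
    public static decimal PercentExact(int unique, int total)
    {
        return (decimal)unique / total * 100;
    }
}
```

With 20 unique words out of 28, PercentNaive gives 0, PercentTruncated gives 71, and PercentExact gives roughly 71.43. All three throw a DivideByZeroException when total is 0, so an empty status would probably need a guard.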

answered 2013-07-23T03:27:48.653

Change your code to:

//For counting unique words
 private int UniqueWordCount(string fbStatus)
        {
            int count = 0;
            var countedWordList = new List<string>(100);
            var reg = new Regex(@"\w+");
            foreach (Match match in reg.Matches(fbStatus))
            {
                string word = match.Value.ToLower();
                if (!countedWordList.Contains(word))
                {
                    ++count;
                    countedWordList.Add(word);
                }
            }
            label_totaluniquewords.Text = count.ToString();
            return count;
        }



private int SplitWords(string fbStatus)
        {
            int splitWords = fbStatus.Split(new char[] { ' ' },StringSplitOptions.RemoveEmptyEntries).Count();
            label_totalwordcount.Text = splitWords.ToString();
            return splitWords;
        }

//For counting lexical density (trying to make this work...)
   private void CalculateLexicalDensity(string fbStatus)
        {
            decimal ld = 0;
            ld = ((decimal)UniqueWordCount(fbStatus) / (decimal)SplitWords(fbStatus)) * 100;
            label_lexicaldensity.Text = ld.ToString();
        }
answered 2013-07-23T03:27:36.013

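1. Add a method to split the text

Since both UniqueCount and SplitWords work on the list of words extracted from the original text, it makes sense to create a function for that.

This method takes a string containing the text you want to work with and returns a string array with the words it contains.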

private string[] GetWords(string text)
{
    return text.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
}

2. Write the functions to work on the array

Counting unique words:

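private int UniqueCount(string[] words)
{
    var foundWords = new List<string>();
    foreach (var word in words)
    {
        // A new name is needed here: the loop variable cannot be redeclared.
        string lowered = word.ToLower();
        if (!foundWords.Contains(lowered))
        {
            foundWords.Add(lowered);
        }
    }
    // List<T> exposes Count, not Length.
    return foundWords.Count;
}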
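List<string>.Contains scans the whole list on every call, so the loop above slows down on long texts. One possible alternative, sketched here under the assumption that an ordinal case-insensitive comparison is acceptable in place of ToLower, is a HashSet<string> with a case-insensitive comparer. The class name WordStats is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container class; only UniqueCount mirrors the method above.
public static class WordStats
{
    public static int UniqueCount(string[] words)
    {
        // The set silently ignores duplicates; OrdinalIgnoreCase makes
        // "Este" and "este" count as the same entry.
        var found = new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
        return found.Count;
    }
}
```

Note that ToLower is culture-sensitive while OrdinalIgnoreCase is not, so the two approaches can disagree on a few characters in some cultures.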

Counting total words:

private int Count(string[] words)
{
    return words.Length;
}

For the lexical density:

private double CalculateLexicalDensity(string[] words)
{
    return ((double)UniqueCount(words) / (double)Count(words));
}

Note: none of these update the labels; I wanted to separate that concern into another method.


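3. Create a method that updates the labels

This method calls the other methods and updates the labels.

Note: I firmly believe fbStatus should be a parameter.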

private void UpdateLabels(string fbStatus)
{
    var words = GetWords(fbStatus);
    label_totalwordcount.Text = Count(words).ToString();
    label_totaluniquewords.Text = UniqueCount(words).ToString();
    label_lexicaldensity.Text = (CalculateLexicalDensity(words) * 100).ToString() + "%";
}

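4. Get rid of redundant computation

For this we have a few options:

4.A. Mix the concerns again:

In this case I would fuse the CalculateLexicalDensity method into UpdateLabels, so that UniqueCount and Count are not each executed twice.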

private void UpdateLabels(string fbStatus)
{
    var words = GetWords(fbStatus);
    int wordCount = Count(words);
    int uniqueWordCount = UniqueCount(words);
    double lexicalDensity = ((double)uniqueWordCount / (double)wordCount);
    label_totalwordcount.Text = wordCount.ToString();
    label_totaluniquewords.Text = uniqueWordCount.ToString();
    label_lexicaldensity.Text = (lexicalDensity * 100).ToString() + "%";
}

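4.B. Use a tuple as the return type:

In this case I would fuse Count, UniqueCount and CalculateLexicalDensity into a single method, which again avoids executing UniqueCount and Count twice. Since this method needs to return three values, it returns a tuple [it could also be a custom type].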

private void UpdateLabels(string fbStatus)
{
    var words = GetWords(fbStatus);
    var info = Process(words);
    label_totalwordcount.Text = info.Item1.ToString();
    label_totaluniquewords.Text = info.Item2.ToString();
    label_lexicaldensity.Text = (info.Item3 * 100).ToString() + "%";
}

// Item1 = word count, Item2 = unique word count, Item3 = lexical density.
private Tuple<int, int, double> Process(string[] words)
{
    int wordCount = Count(words);
    int uniqueWordCount = UniqueCount(words);
    double lexicalDensity = ((double)uniqueWordCount / (double)wordCount);
    return new Tuple<int, int, double>(wordCount, uniqueWordCount, lexicalDensity);
}

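Since this option keeps the concerns separated, it is the one I prefer. However, if you can't (or don't want to) use a tuple, you can use a custom type... for this case I prefer a struct...

4.C. Use a struct as the return type: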

struct LexicalInfo
{
    public int WordCount;
    public int UniqueWordCount;
    // double, not int: the density is a fraction between 0 and 1.
    public double LexicalDensity;
}

With this struct, the code would be:

private void UpdateLabels(string fbStatus)
{
    var words = GetWords(fbStatus);
    var info = Process(words);
    label_totalwordcount.Text = info.WordCount.ToString();
    label_totaluniquewords.Text = info.UniqueWordCount.ToString();
    label_lexicaldensity.Text = (info.LexicalDensity * 100).ToString() + "%";
}

private LexicalInfo Process(string[] words)
{
    int wordCount = Count(words);
    int uniqueWordCount = UniqueCount(words);
    double lexicalDensity = ((double)uniqueWordCount / (double)wordCount);
    return new LexicalInfo()
            {
                WordCount = wordCount,
                UniqueWordCount = uniqueWordCount,
                LexicalDensity = lexicalDensity
            };
}

Furthermore, if we are going to use a struct...

4.D. Use the struct to do the computation:

Note: in this case it could just as well be a class.

struct LexicalInfo
{
    private int wordCount;
    private int uniqueWordCount;

    public LexicalInfo(string text)
    {
        var words = GetWords(text);
        wordCount = Count(words);
        uniqueWordCount = UniqueCount(words);
    }

    // The helpers are static: a struct constructor cannot call instance
    // methods before every field has been assigned.
    private static string[] GetWords(string text)
    {
        return text.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
    }

    private static int UniqueCount(string[] words)
    {
        var foundWords = new List<string>();
        foreach (var word in words)
        {
            string lowered = word.ToLower();
            if (!foundWords.Contains(lowered))
            {
                foundWords.Add(lowered);
            }
        }
        return foundWords.Count;
    }

    private static int Count(string[] words)
    {
        return words.Length;
    }

    public int WordCount
    {
        get
        {
            return wordCount;
        }
    }

    public int UniqueWordCount
    {
        get
        {
            return uniqueWordCount;
        }
    }

    public double LexicalDensity
    {
        get
        {
            return ((double)uniqueWordCount / (double)wordCount);
        }
    }
}

//----

private void UpdateLabels(string fbStatus)
{
    // The struct takes the raw text and splits it itself.
    var info = new LexicalInfo(fbStatus);
    label_totalwordcount.Text = info.WordCount.ToString();
    label_totaluniquewords.Text = info.UniqueWordCount.ToString();
    label_lexicaldensity.Text = (info.LexicalDensity * 100).ToString() + "%";
}

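5. Optimization

I'll take the final code (the version that uses the struct for the computation) and work on it.

We have two methods with only one line each (GetWords and Count); I'll get rid of them and replace the calls with the method bodies: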

struct LexicalInfo
{
    private int wordCount;
    private int uniqueWordCount;

    public LexicalInfo(string text)
    {
        var words = text.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
        wordCount = words.Length;
        uniqueWordCount = UniqueCount(words);
    }

    // Static so the constructor can call it before all fields are assigned.
    private static int UniqueCount(string[] words)
    {
        var foundWords = new List<string>();
        foreach (var word in words)
        {
            string lowered = word.ToLower();
            if (!foundWords.Contains(lowered))
            {
                foundWords.Add(lowered);
            }
        }
        return foundWords.Count;
    }

    public int WordCount
    {
        get
        {
            return wordCount;
        }
    }

    public int UniqueWordCount
    {
        get
        {
            return uniqueWordCount;
        }
    }

    public double LexicalDensity
    {
        get
        {
            return ((double)uniqueWordCount / (double)wordCount);
        }
    }
}

//----

private void UpdateLabels(string fbStatus)
{
    var info = new LexicalInfo(fbStatus);
    label_totalwordcount.Text = info.WordCount.ToString();
    label_totaluniquewords.Text = info.UniqueWordCount.ToString();
    label_lexicaldensity.Text = (info.LexicalDensity * 100).ToString() + "%";
}

6. Linq?

If we can use Linq (System.Linq), we can replace UniqueCount with a single line:

struct LexicalInfo
{
    private int wordCount;
    private int uniqueWordCount;

    public LexicalInfo(string text)
    {
        var words = text.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
        wordCount = words.Length;
        uniqueWordCount = words.Distinct().Count();
    }

    public int WordCount
    {
        get
        {
            return wordCount;
        }
    }

    public int UniqueWordCount
    {
        get
        {
            return uniqueWordCount;
        }
    }

    public double LexicalDensity
    {
        get
        {
            return ((double)uniqueWordCount / (double)wordCount);
        }
    }
}

//----

private void UpdateLabels(string fbStatus)
{
    var info = new LexicalInfo(fbStatus);
    label_totalwordcount.Text = info.WordCount.ToString();
    label_totaluniquewords.Text = info.UniqueWordCount.ToString();
    label_lexicaldensity.Text = (info.LexicalDensity * 100).ToString() + "%";
}

7. Testing and fixing

I have tested with the following text:

ESTE ES UN TEXTO QUE HE ESCRITO EN ESPAÑOL. ESTE TEXTO FUE ESCRITO PARA DEMOSTRACIÓN. ESTE TEXTO REPITE ALGUNAS DE SUS PALABRAS Y ALGUNAS OTRAS NO.

The output was:

WordCount = 28
UniqueWordCount = 21
LexicalDensity = 75%

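However, inspecting the code shows that we are counting punctuation as part of the words (i.e., because of the punctuation, the code treats ESPAÑOL and ESPAÑOL. as two different words).

A quick fix is to use a regular expression, replacing the constructor of LexicalInfo with: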

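    public LexicalInfo(string text)
    {
        // \w+ matches runs of word characters, so punctuation is left out.
        var words = from match in (new Regex(@"\w+")).Matches(text).Cast<Match>() select match.Value;
        wordCount = words.Count();
        uniqueWordCount = words.Distinct().Count();
        // Debug output: join the words, since printing the array itself only shows its type name.
        Console.WriteLine(string.Join(", ", words.Distinct()));
    }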

After the change, the output is:

WordCount = 28
UniqueWordCount = 20
LexicalDensity = 71.4285714285714%

You may want to format LexicalDensity, for example by changing the following line:

     label_lexicaldensity.Text = (info.LexicalDensity * 100).ToString() + "%";

to this:

    label_lexicaldensity.Text = string.Format("{0:P2}", info.LexicalDensity);

which yields this:

WordCount = 28
UniqueWordCount = 20
LexicalDensity = 71.43 %

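Note: string.Format is affected by the culture it runs under. If you don't want the output to vary with the culture, you can specify one, for example InvariantCulture. The format provider goes first:

    label_lexicaldensity.Text = string.Format(CultureInfo.InvariantCulture, "{0:P2}", info.LexicalDensity);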
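As a small illustration of culture-independent percent formatting, the sketch below shows two equivalent ways to do it. The class DensityFormat and its method names are invented for this example; only string.Format, double.ToString and CultureInfo.InvariantCulture come from the framework.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class for illustration.
public static class DensityFormat
{
    // In this string.Format overload the format provider is the first argument.
    public static string ToPercent(double density)
    {
        return string.Format(CultureInfo.InvariantCulture, "{0:P2}", density);
    }

    // Equivalent call through double.ToString with an explicit provider.
    public static string ToPercentDirect(double density)
    {
        return density.ToString("P2", CultureInfo.InvariantCulture);
    }
}
```

The "P2" specifier multiplies the value by 100 itself, so the density should be passed as a fraction (20.0 / 28.0), not as an already-scaled percentage.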

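Using another test text, I found that I had lost the ability to handle capitalization. The text was

Este es otro texto escrito en español, el objetivo de este texto es probar las mayúsculas al repetir texto.

In this case, the code treats Este and este as two different words. This is another easy fix with Linq; change this line: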

        uniqueWordCount = words.Distinct().Count();

to this:

        uniqueWordCount = (from word in words select word.ToLower()).Distinct().Count();
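Putting the fixes from this step together, the struct might end up looking roughly like the sketch below. The ToArray call (so the matches are only enumerated once), the readonly fields and the zero-word guard are my additions and assumptions, not part of the code above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public struct LexicalInfo
{
    private readonly int wordCount;
    private readonly int uniqueWordCount;

    public LexicalInfo(string text)
    {
        // \w+ matches runs of word characters, so punctuation is dropped.
        // Lower-casing makes "Este" and "este" compare equal.
        // ToArray materializes the query once (an addition, see lead-in).
        var words = Regex.Matches(text, @"\w+")
                         .Cast<Match>()
                         .Select(m => m.Value.ToLower())
                         .ToArray();
        wordCount = words.Length;
        uniqueWordCount = words.Distinct().Count();
    }

    public int WordCount { get { return wordCount; } }

    public int UniqueWordCount { get { return uniqueWordCount; } }

    public double LexicalDensity
    {
        // Guard (an assumption): report 0 for a text without any words
        // instead of dividing by zero.
        get { return wordCount == 0 ? 0.0 : (double)uniqueWordCount / wordCount; }
    }
}
```

For example, new LexicalInfo("Este texto, este TEXTO.") sees four words, two of them unique, so its LexicalDensity is 0.5.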
answered 2013-07-23T17:03:13.350