python - Flesch-Kincaid readability test in Python

Question

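I need help with a problem I've run into. I need to write a function that returns the FRES (Flesch reading-ease score) of a text. The formula given is:

FRES = 206.835 - 1.015 * (total words / total sentences) - 84.6 * (total syllables / total words)

In other words, my task is to turn this formula into a Python function.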

import nltk
import collections
nltk.download('punkt')
nltk.download('gutenberg')
nltk.download('brown')
nltk.download('averaged_perceptron_tagger')
nltk.download('universal_tagset')

import re
VC = re.compile('[aeiou]+[^aeiou]+', re.I)
def count_syllables(word):
    return len(VC.findall(word))

from itertools import chain
from nltk.corpus import gutenberg
def compute_fres(text):
    """Return the FRES of a text.
    >>> emma = nltk.corpus.gutenberg.raw('austen-emma.txt')
    >>> compute_fres(emma) # doctest: +ELLIPSIS
    99.40...
    """

    for filename in gutenberg.fileids():
        sents = gutenberg.sents(filename)
        words = gutenberg.words(filename)
        num_sents = len(sents)
        num_words = len(words)
        num_syllables = sum(count_syllables(w) for w in words)
        score = 206.835 - 1.015 * (num_words / num_sents) - 84.6 * (num_syllables / num_words)
    return(score)

This is the result I get:

Failure
Expected :99.40...

Actual   :92.84866041488623

**********************************************************************
File "C:/Users/PycharmProjects/a1/a1.py", line 60, in a1.compute_fres
Failed example:
    compute_fres(emma) # doctest: +ELLIPSIS
Expected:
    99.40...
Got:
    92.84866041488623

My task is to pass the doctest and get 99.40... I am also not allowed to change the following code, because it was provided with the assignment itself:

import re
VC = re.compile('[aeiou]+[^aeiou]+', re.I)
def count_syllables(word):
    return len(VC.findall(word))

I feel like I'm close, but I don't know why I'm getting a different result. Any help would be much appreciated.

score 0 · Accepted Answer

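All three num_* variables are of type int (integer). In Python 2, as in many programming languages, dividing two integers with / gives an integer result, rounded down; for example, 14 / 5 yields 2, not 2.8. In Python 3, / always performs true division and 14 / 5 is 2.8, so this point applies only if the code runs under Python 2.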

Change the calculation to

score = 206.835 - 1.015 * (float(num_words) / num_sents) - 84.6 * (num_syllables / float(num_words))

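When one operand of a division is a float, the other is silently converted to a float as well, and floating-point division is performed. Try float(14) / 5, which gives 2.8.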
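To make the division point concrete, here is a minimal sketch, not the assignment's required solution. The helper name `fres_from_counts` is hypothetical; it only wraps the formula from the question and takes the three counts as arguments. In Python 3 the `/` operator already performs true division, so the `float()` calls matter only under Python 2; `//` is used here to show the rounded-down integer result.

```python
def fres_from_counts(num_words, num_sents, num_syllables):
    """Flesch reading-ease score from raw counts (hypothetical helper)."""
    # float() forces floating-point division even under Python 2.
    words_per_sentence = float(num_words) / num_sents
    syllables_per_word = num_syllables / float(num_words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


print(14 // 5)         # floor division: 2
print(float(14) / 5)   # floating-point division: 2.8
print(fres_from_counts(100, 10, 150))
```

Keeping the formula separate from the counting makes it easy to check the arithmetic by hand before worrying about how sentences, words and syllables are counted.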

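Also, your regular expression VC does not include "y" among the vowels, and it does not count a group of vowels at the end of a word as a syllable. Both errors undercount syllables; for example, count_syllables("myrtle") returns 0.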
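The undercounting can be seen by running the given pattern next to a looser one. This is only an illustration: `VOWEL_RUN` and `count_vowel_runs` are hypothetical names, not part of the assignment, and the question states that `count_syllables` itself must stay unchanged. The looser pattern is still a heuristic and overcounts words with a silent final "e".

```python
import re

# Pattern exactly as given in the question: a vowel run followed by a consonant run.
VC = re.compile('[aeiou]+[^aeiou]+', re.I)


def count_syllables(word):
    return len(VC.findall(word))


# Hypothetical alternative, for comparison only: count every run of vowels,
# treating "y" as a vowel and allowing a run at the end of the word.
VOWEL_RUN = re.compile('[aeiouy]+', re.I)


def count_vowel_runs(word):
    return len(VOWEL_RUN.findall(word))


for word in ("myrtle", "banana", "happy"):
    print(word, count_syllables(word), count_vowel_runs(word))
# myrtle 0 2
# banana 2 3
# happy 1 2
```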
