python - Converting a readability formula into a Python function

Question

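I was given a formula called FRES (the Flesch reading-ease test) that measures the readability of a document:

FRES = 206.835 - 1.015 * (total words / total sentences) - 84.6 * (total syllables / total words)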

My task is to write a Python function that returns the FRES of a text, so I need to turn this formula into a Python function.

I have re-implemented my code based on an answer I received, to show what I have so far and the result it gives me:

import nltk
import collections
nltk.download('punkt')
nltk.download('gutenberg')
nltk.download('brown')
nltk.download('averaged_perceptron_tagger')
nltk.download('universal_tagset')

import re
from itertools import chain
from nltk.corpus import gutenberg
VC = re.compile('[aeiou]+[^aeiou]+', re.I)
def count_syllables(word):
    return len(VC.findall(word))

def compute_fres(text):
    """Return the FRES of a text.
    >>> emma = nltk.corpus.gutenberg.raw('austen-emma.txt')
    >>> compute_fres(emma) # doctest: +ELLIPSIS
    99.40...
    """

    for filename in gutenberg.fileids():
        sents = gutenberg.sents(filename)
        words = gutenberg.words(filename)
        num_sents = len(sents)
        num_words = len(words)
        num_syllables = sum(count_syllables(w) for w in words)
        score = 206.835 - 1.015 * (num_words / num_sents) - 84.6 * (num_syllables / num_words)
    return score

After running the code, this is the result message I get:

Failure

Expected :99.40...

Actual   :92.84866041488623

File "C:/Users/PycharmProjects/a1/a1.py", line 60, in a1.compute_fres
Failed example:
    compute_fres(emma) # doctest: +ELLIPSIS

Expected:
    99.40...
Got:
    92.84866041488623

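My function is supposed to pass the doctest and produce 99.40... I am also not allowed to edit the syllable function, since it was supplied with the assignment: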

import re
VC = re.compile('[aeiou]+[^aeiou]+', re.I)
def count_syllables(word):
    return len(VC.findall(word))

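This problem has been quite tricky. At least the code now gives me a result instead of an error message, but I don't know why the result is different from the expected one.

Any help would be much appreciated. Thank you.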

score 0 · Accepted Answer

By the way, there is the textstat library.

from textstat.textstat import textstat
from nltk.corpus import gutenberg

for filename in gutenberg.fileids():
    # flesch_reading_ease expects the text itself, not the file id
    print(filename, textstat.flesch_reading_ease(gutenberg.raw(filename)))

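If you are set on writing the code yourself, you first have to

decide whether punctuation counts as a word
define how to count the number of syllables in a word.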
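
To make the second decision concrete, here is a small sketch of how the regex from the question behaves on a few tokens. The sample words are chosen purely for illustration. The pattern counts runs of vowels that are followed by at least one non-vowel, so a word ending in a vowel run loses that run, and a punctuation token contributes zero syllables while still adding to the word count if punctuation is treated as a word.

```python
import re

# Syllable counter as supplied in the question (unchanged).
VC = re.compile('[aeiou]+[^aeiou]+', re.I)

def count_syllables(word):
    return len(VC.findall(word))

# Illustrative tokens only; they are not taken from the assignment.
samples = ["the", "Emma", "readability", ","]
counts = {w: count_syllables(w) for w in samples}

for word, n in counts.items():
    print(repr(word), n)
# 'the' 0          (the final vowel has no consonant after it)
# 'Emma' 1         (only "Emm" matches)
# 'readability' 4  ("ead", "ab", "il", "ity")
# ',' 0            (punctuation has no vowels)
```

Because the counter is only a rough approximation, the resulting score depends heavily on which tokens are fed into it.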

If punctuation counts as a word and syllables are counted with the regex from your question, then:

import re
from itertools import chain
from nltk.corpus import gutenberg

def num_syllables_per_word(word):
    return len(re.findall('[aeiou]+[^aeiou]+', word))

for filename in gutenberg.fileids():
    sents = gutenberg.sents(filename)
    words = gutenberg.words(filename) # i.e. list(chain(*sents))
    num_sents = len(sents)
    num_words = len(words)
    num_syllables = sum(num_syllables_per_word(w) for w in words)
    score = 206.835 - 1.015 * (num_words / num_sents) - 84.6 * (num_syllables / num_words)
    print(filename, score)
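
Note that `compute_fres` in the question never uses its `text` argument: it loops over every Gutenberg file and returns the score of the last one, so the doctest for Emma cannot match. The following is a minimal sketch of a function that scores the text it is given. The helpers `simple_sentences` and `simple_words` and the `keep_punct` flag are hypothetical, regex-based stand-ins for a real tokenizer, so this sketch does not claim to reproduce the 99.40 expected by the doctest; that value depends on the tokenization the assignment intends.

```python
import re

# Syllable counter as supplied in the question (unchanged).
VC = re.compile('[aeiou]+[^aeiou]+', re.I)

def count_syllables(word):
    return len(VC.findall(word))

def simple_sentences(text):
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def simple_words(sentence, keep_punct=True):
    # Assumption: a word is a run of word characters; with keep_punct=True
    # every punctuation character is also counted as a word.
    pattern = r"\w+|[^\w\s]" if keep_punct else r"\w+"
    return re.findall(pattern, sentence)

def compute_fres(text, keep_punct=True):
    """Return the FRES of the given text (sketch, not the assignment's reference)."""
    sents = simple_sentences(text)
    words = [w for s in sents for w in simple_words(s, keep_punct)]
    num_syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sents))
            - 84.6 * (num_syllables / len(words)))

sample = "The cat sat. The dog ran."
print(compute_fres(sample))                    # punctuation counted as words
print(compute_fres(sample, keep_punct=False))  # punctuation ignored
```

The two calls give different scores for the same text, which shows why the punctuation decision has to be settled first. To stay closer to the assignment, `nltk.sent_tokenize` and `nltk.word_tokenize` could be swapped in for the two regex helpers.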
