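I am trying to compute the precision and recall of a spam filter using predefined functions.
When I use the predefined functions, I cannot get them to return anything other than 1.0.
I know this is incorrect, because I should be getting a precision of 0.529411764706.
Also, I am using pop because, for some reason, the first entry of each list is not a number, so I cannot use append(int(...
Here are the functions: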
def precision(ref, hyp):
    """Calculates precision.
    Args:
    - ref: a list of 0's and 1's extracted from a reference file
    - hyp: a list of 0's and 1's extracted from a hypothesis file
    Returns:
    - A floating point number indicating the precision of the hypothesis
    """
    (n, np, ntp) = (len(ref), 0.0, 0.0)
    for i in range(n):
        if bool(hyp[i]):
            np += 1
            if bool(ref[i]):
                ntp += 1
    return ntp/np

def recall(ref, hyp):
    """Calculates recall.
    Args:
    - ref: a list of 0's and 1's extracted from a reference file
    - hyp: a list of 0's and 1's extracted from a hypothesis file
    Returns:
    - A floating point number indicating the recall rate of the hypothesis
    """
    (n, nt, ntp) = (len(ref), 0.0, 0.0)
    for i in range(n):
        if bool(ref[i]):
            nt += 1
            if bool(hyp[i]):
                ntp += 1
    return ntp/nt
Here is my code:
import hw10_lib
from hw10_lib import precision
from hw10_lib import recall

actual = []
for line in open("/path/hw10.ref", 'r'):
    actual.append(line.strip().split('\t')[-1])
actual.pop(0)

predicted = []
for line in open("/path/hw10.hyp", 'r'):
    predicted.append(line.strip().split('\t')[-1])
predicted.pop(0)

prec = precision(actual, predicted)
rec = recall(actual, predicted)
print('Precision: ', prec)
print('Recall: ', rec)
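
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the lists built above hold the text of the last column, and in Python any non-empty string is truthy, including "0". If so, every bool(...) test in the functions passes and both ratios come out as 1.0. The snippet below reproduces this with made-up data. The lists actual_text and predicted_text and the header value "label" are assumptions for illustration, not the contents of the real files, and the local precision is a copy of the counting logic shown earlier.

```python
# Hypothetical stand-ins for the last column of each file, read as text.
# The first entry plays the role of the non-numeric header that pop(0) removes.
actual_text = ["label", "1", "0", "1", "0"]
predicted_text = ["label", "1", "1", "0", "0"]


def precision(ref, hyp):
    """Same counting logic as the precision function quoted in the question."""
    n_pos, n_true_pos = 0.0, 0.0
    for r, h in zip(ref, hyp):
        if bool(h):
            n_pos += 1
            if bool(r):
                n_true_pos += 1
    return n_true_pos / n_pos


# Any non-empty string is truthy, so "0" counts as a positive label.
print(bool("0"), bool(0))  # True False

# Skip the header with a slice instead of pop(0).
as_strings = precision(actual_text[1:], predicted_text[1:])

# Converting to int after dropping the header gives real 0/1 labels.
as_ints = precision([int(x) for x in actual_text[1:]],
                    [int(x) for x in predicted_text[1:]])

print(as_strings)  # 1.0
print(as_ints)     # 0.5
```

If this is the cause, dropping the header before converting each remaining entry with int(...) would avoid the error from calling int on the header itself.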