
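Write a program that prompts for a file name, then reads through the file and looks for lines of the form: X-DSPAM-Confidence: 0.8475 When you encounter a line that starts with "X-DSPAM-Confidence:", pull the line apart to extract the floating-point number on it. Count these lines and compute the total of their spam confidence values. When you reach the end of the file, print the average spam confidence.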

Enter the file name: mbox.txt
Average spam confidence: 0.894128046745

Enter the file name: mbox-short.txt
Average spam confidence: 0.750718518519

Test your program on the mbox.txt and mbox-short.txt files.

So far I have:

 fname = raw_input("Enter file name: ")
 fh = open(fname)
 for line in fh:
     pos  = fh.find(':0.750718518519')
     x = float(fh[pos:])
     print x

What is wrong with this code?


2 Answers


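It sounds like they are asking you to average all the "X-DSPAM-Confidence" numbers, not to find 0.750718518519.

Personally, I would look for the text that marks the line, extract the number, put all of those numbers into a list, and average them at the end.

Something like this: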

# Get the filename from the user
filename = raw_input("Enter file name: ")

# An empty list to contain all our floats
spamflts = []

# Open the file to read ('r'), and loop through each line
for line in open(filename, 'r'):

    # If the line starts with the text we want (with all whitespace stripped)
    if line.strip().startswith('X-DSPAM-Confidence'):

        # Then extract the number from the second half of the line
        # "text:number".split(':') will give you ['text', 'number']
        # So you use [1] to get the second half
        # Then we use .strip() to remove whitespace, and convert to a float
        flt = float(line.split(':')[1].strip())

        print flt

        # We then add the number to our list
        spamflts.append(flt)

print spamflts
# At the end of the loop, we work out the average - the sum divided by the length
average = sum(spamflts)/len(spamflts)

print average
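The assignment asks for a count and a running total rather than a list, so here is a minimal sketch of that variant. The function name `average_confidence` and the sample lines are made up for illustration, and it is written to run on both Python 2 and Python 3. It returns `None` when no line matches, to avoid dividing by zero.

```python
def average_confidence(lines, prefix='X-DSPAM-Confidence:'):
    """Return the mean of the floats on lines starting with prefix, or None."""
    count = 0
    total = 0.0
    for line in lines:
        line = line.strip()
        if line.startswith(prefix):
            # Everything after the prefix is the number; float() ignores
            # the leading space
            total += float(line[len(prefix):])
            count += 1
    if count == 0:
        return None  # no matching lines, so avoid dividing by zero
    return total / count


# Illustrative sample lines, not taken from mbox.txt
sample = [
    'X-DSPAM-Confidence: 1\n',
    'Nothing on this line\n',
    'X-DSPAM-Confidence: 4\n',
]
print(average_confidence(sample))  # 2.5
```

Because an open file iterates line by line, the same function should also accept a file object, for example `average_confidence(open(filename))`.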

>>> lines = """X-DSPAM-Confidence: 1
X-DSPAM-Confidence: 5
Nothing on this line
X-DSPAM-Confidence: 4"""

>>> for line in lines.splitlines():
    print line


X-DSPAM-Confidence: 1
X-DSPAM-Confidence: 5
Nothing on this line
X-DSPAM-Confidence: 4

Using find:

>>> for line in lines.splitlines():
    pos = line.find('X-DSPAM-Confidence:')
    print pos

0
0
-1
0

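As we can see, find() only gives us the position of 'X-DSPAM-Confidence:' within each line (or -1 when it is absent), not the position of the number that follows it.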
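If you do want to stay with find(), one possible approach is to add the length of the search text to the returned position, so that the slice starts just after it. This is only a sketch; the variable names are illustrative, and it runs on both Python 2 and Python 3.

```python
prefix = 'X-DSPAM-Confidence:'
line = 'X-DSPAM-Confidence: 0.8475'

pos = line.find(prefix)  # 0 here; -1 if the prefix is absent
if pos != -1:
    # Skip past the prefix itself to reach the number
    value = float(line[pos + len(prefix):])
    print(value)  # 0.8475
```

Note that find() matches the text anywhere in the line, not only at the start, so it is a looser test than the assignment describes.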

It is easier to check whether a line starts with 'X-DSPAM-Confidence:' and then extract the number, like this:

>>> for line in lines.splitlines():
    print line.startswith('X-DSPAM-Confidence')


True
True
False
True

>>> for line in lines.splitlines():
    if line.startswith('X-DSPAM-Confidence'):
        print line.split(':')


['X-DSPAM-Confidence', ' 1']
['X-DSPAM-Confidence', ' 5']
['X-DSPAM-Confidence', ' 4']

>>> for line in lines.splitlines():
    if line.startswith('X-DSPAM-Confidence'):
        print float(line.split(':')[1])


1.0
5.0
4.0
answered 2013-01-24T05:55:43.013

line.find  # ..... so you search the line ......

print pos  # prints help with debugging ;)

float(line[pos+1:])  # the index you get is actually that of the ':', so you need to move one further
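Put together, those hints might look like the sketch below. It assumes the search and the slice are applied to each line rather than to the file object, and it searches for ':' rather than for a specific number. The sample lines are made up for illustration, and the code runs on both Python 2 and Python 3.

```python
# Illustrative sample lines standing in for the file contents
lines = [
    'X-DSPAM-Confidence: 0.8475',
    'Nothing on this line',
    'X-DSPAM-Confidence: 0.6178',
]

values = []
for line in lines:
    if not line.startswith('X-DSPAM-Confidence:'):
        continue  # skip lines that do not carry a confidence value
    pos = line.find(':')                  # index of the colon itself
    values.append(float(line[pos + 1:]))  # +1 moves past the colon

print(values)  # [0.8475, 0.6178]
```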

answered 2013-01-24T05:55:10.370