
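I'd like to improve the readability and formatting of my code. The code below works, but I feel it could be tighter, and I can't seem to get it to work any other way. The idea is to read a .txt file, find the lines that mark an incoming email, and tally the messages by the hour they were sent.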

Here is an example of the line I'm looking for in the text:

From email@emailaddress.com Sat Jan 5 09:14:16 2008

Here is my code as it stands today.

fname = input("Enter file:")
if len(fname) <1 : fname = "mbox-short.txt"
fh = open(fname)
time = list()
hours = list()
hr = dict()

for line in fh:
        if not line.startswith("From "): continue
        words = line.split()
        time.append(words[5])

for i in time:
        i.split(":")
        hours.append(i[:2])

for h in hours:
        hr[h]=hr.get(h, 0)+1

l = list()
for k,v in hr.items():
        l.append((k,v))
l.sort()
for k, v in l:
        print (k,v)

3 Answers


Here is (what I believe to be) functionally equivalent code:

from collections import Counter

fname = input("Enter file: ")
if fname == "":
    fname = "mbox-short.txt"

hour_counts = Counter()
with open(fname) as f:
    for line in f:
        if not line.startswith("From "):
            continue
        words = line.split()
        time = words[5]
        hour = time[:2]
        hour_counts[hour] += 1

for hour, count in sorted(hour_counts.items()):
    print(hour, count)

You might also want to parse the mbox format with an existing Python library rather than doing it yourself.
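
A rough sketch of that suggestion, assuming the input is a well-formed mbox file and using the standard library's mailbox module. The function name hour_counts_from_mbox is made up for illustration, and the field index assumes the envelope line has the same layout as the sample line in the question.

```python
import mailbox
from collections import Counter


def hour_counts_from_mbox(path):
    """Count messages per sending hour, based on each envelope "From " line."""
    counts = Counter()
    box = mailbox.mbox(path, create=False)  # do not create the file if it is missing
    try:
        for message in box:
            # get_from() returns the envelope line without the leading "From ",
            # e.g. "email@emailaddress.com Sat Jan  5 09:14:16 2008".
            fields = message.get_from().split()
            if len(fields) < 5:
                continue  # assumption: skip envelope lines that lack a time field
            counts[fields[4][:2]] += 1  # fields[4] is "HH:MM:SS"
    finally:
        box.close()
    return counts


if __name__ == "__main__":
    for hour, count in sorted(hour_counts_from_mbox("mbox-short.txt").items()):
        print(hour, count)
```

The upside is that message boundaries are found by the library, so only the time extraction is left to hand-written string handling.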

answered 2015-11-19T22:38:48.437

Just a few hints. (Don't try this at home, it is very bad code :D, but it shows some Python constructs worth learning: operator, defaultdict, and list comprehensions.)

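from collections import defaultdict
import operator

fname = input("Enter file: ") or "mbox-short.txt"  # same default as in the question

hr = defaultdict(int)

with open(fname) as fh:
    # Take the time field (index 5) of each "From " line and keep only the hour.
    # [0] selects the hour string; a [:2] slice here would yield an unhashable list.
    hours = [data.split()[5].split(":")[0] for data in fh if data.startswith("From ")]

for h in hours:
    hr[h] += 1

# itemgetter(1) sorts by count; use itemgetter(0) to sort by hour as the question does.
sorted_hr = sorted(hr.items(), key=operator.itemgetter(1))
for k, v in sorted_hr:
    print(k, v)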
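
To make the sort key concrete, here is a small sketch comparing the orderings. The hours list is invented sample data rather than output from the real file, and Counter.most_common is swapped in as an alternative that the answer itself does not use.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical hour strings standing in for the values parsed from the file.
hours = ["18", "09", "18", "16", "18", "09"]
counts = Counter(hours)

by_hour = sorted(counts.items())                       # ordered by hour string
by_count = sorted(counts.items(), key=itemgetter(1))   # ordered by ascending count
most_common = counts.most_common()                     # ordered by descending count

print(by_hour)
print(by_count)
print(most_common)
```

Sorting the plain (hour, count) tuples matches the ordering in the question, because tuples compare on their first element first.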
answered 2015-11-19T21:48:02.280

A regular expression approach would look something like this:

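import re

hours = []
with open("new_file") as textfile:
    for line in textfile:
        # Keep only lines that start with "From <address>".
        if re.search(r"^From [A-Za-z0-9]+[@][a-zA-Z]+[.][a-z]{3}", line):
            # Replace the whole line with the two-digit hour captured before ":MM:SS YYYY".
            hours.append(re.sub(r".*([0-9]{2})[:][0-9]{2}[:][0-9]{2} [0-9]{4}.*", r"\1", line.strip()))

hours.sort()
print(hours)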

For example, if the data below is in the file new_file:

kwejrhkhwr
From johnking@emailaddress.com Sat Jan 5 09:14:16 2008
From JohnPublic@emailaddress.com Sat Dec 31 01:40:16 2015
Something not needed here
Something not needed here
From JohnPublic125@emailaddress.com Sat Oct 25 44:03:10 2015

the output is the list of hours in ascending order:

['01', '09', '44']
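
Note that the pattern accepts any two digits as an hour, which is why '44' appears, and that the result is a list of hours rather than a frequency table. A possible variant is sketched below, assuming lines laid out like the sample data. The names FROM_LINE and count_hours are made up for illustration, and restricting the hour to 00-23 is an added assumption.

```python
import re
from collections import Counter

# "From <address> ... HH:MM:SS YYYY", with the hour restricted to 00-23.
FROM_LINE = re.compile(
    r"^From \S+@\S+ .*\b(?P<hour>[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9] [0-9]{4}\s*$"
)


def count_hours(lines):
    """Return a Counter mapping each valid hour string to its number of lines."""
    counts = Counter()
    for line in lines:
        match = FROM_LINE.match(line)
        if match:
            counts[match.group("hour")] += 1
    return counts


# The same sample data as in new_file, held in a list instead of a file.
sample = [
    "kwejrhkhwr",
    "From johnking@emailaddress.com Sat Jan 5 09:14:16 2008",
    "From JohnPublic@emailaddress.com Sat Dec 31 01:40:16 2015",
    "Something not needed here",
    "From JohnPublic125@emailaddress.com Sat Oct 25 44:03:10 2015",
]

for hour, count in sorted(count_hours(sample).items()):
    print(hour, count)
```

With this sample, the line carrying hour 44 is skipped, so only 01 and 09 are counted, once each.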
answered 2015-11-19T22:52:02.317