python - Find the last occurrence in a file with Python

Question

I have a file containing various IP addresses.

192.168.11.2
192.1268.11.3
192.168.11.3
192.168.11.3
192.168.11.2
192.168.11.5

This is my code so far. It prints each IP and how many times it occurs, but how can I find out on which line each IP last occurs? Is there a simple way to do this?

liste = []

dit = {}
file = open('ip.txt','r')

file = file.readlines()

for line in file:
        liste.append(line.strip())

for element in liste:
        if element in dit:
                dit[element] +=1
        else:
                dit[element] = 1

for key,value in dit.items():
        print("%s occurs %s times, last occurrence at line" % (key, value))

Output:

192.1268.11.3 occurs 1 times, last occurrence at line
192.168.11.3 occurs 2 times, last occurrence at line
192.168.11.2 occurs 2 times, last occurrence at line
192.168.11.5 occurs 1 times, last occurrence at line

score 3 · Accepted Answer

Try this:

liste = []

dit = {}
file = open('ip.txt','r')

file = file.readlines()

for line in file:
        liste.append(line.strip())

for i, element in enumerate(liste, 1):
        if element in dit:
                dit[element][0] += 1
                dit[element][1] =  i
        else:
                dit[element] = [1,i]

for key,value in dit.items():
        # value is [count, last line number]; the parenthesized form runs on Python 2 and 3
        print("%s occurs %d times, last occurrence at line %d" % (key, value[0], value[1]))

score 2 · Accepted Answer

Here is one solution:

from collections import Counter

with open('ip.txt') as input_file:
    lines = input_file.read().splitlines()

    # Find last occurrence, count
    last_line = dict((ip, line_number) for line_number, ip in enumerate(lines, 1))
    ip_count = Counter(lines)

    # Print the stat, sorted by last occurrence
    for ip in sorted(last_line, key=lambda k: last_line[k]):
        print('{} occurs {} times, last occurrence at line {}'.format(
            ip, ip_count[ip], last_line[ip]))

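Discussion

I use the enumerate function to generate line numbers (starting from line 1).
From the sequence of (ip, line_number) pairs it is easy to build the dictionary last_line, whose keys are IP addresses and whose values are the last line on which each one appears, because a later pair overwrites an earlier pair with the same key.
To count occurrences I use the Counter class, which keeps this very simple.
If you want the report sorted by IP address instead, use sorted(last_line). Note that this sorts the addresses as strings.
This solution has a performance cost: it scans the list of IPs twice, once to compute last_line and once to compute ip_count, and it reads the whole file into memory first. If the file is large, this solution may not be ideal.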
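
To make the overwrite behaviour behind last_line concrete, here is a minimal sketch. It assumes Python 3 and uses the sample addresses from the question as an in-memory list instead of reading ip.txt; it is only an illustration of the idea, not the answer's exact code.

```python
# Sample data from the question, held in memory for this sketch.
lines = [
    "192.168.11.2",
    "192.1268.11.3",
    "192.168.11.3",
    "192.168.11.3",
    "192.168.11.2",
    "192.168.11.5",
]

# Each (ip, line_number) pair is stored under the key ip. A later pair
# with the same key overwrites the earlier one, so every IP ends up
# mapped to the number of the last line it appears on.
last_line = {ip: line_number for line_number, ip in enumerate(lines, 1)}

# Report sorted by last occurrence.
for ip in sorted(last_line, key=last_line.get):
    print(ip, last_line[ip])
```

The same dictionary could be built with dict(...) over a generator, as in the answer; the comprehension form is just shorter.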

score 1 · Accepted Answer

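# Replaces the counting loop in the question's code.
# liste is the list of stripped lines built there.
dit = {}
last_line_occurrence = {}
for line_number, element in enumerate(liste, 1):
    if element in dit:
        dit[element] += 1
    else:
        dit[element] = 1
    # Overwritten on every occurrence, so the last one wins.
    last_line_occurrence[element] = line_number

for key, value in dit.items():
    print("%s occurs %s times, last occurrence at line %s" % (key, value, last_line_occurrence[key]))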

score 1 · Accepted Answer

This can easily be done in a single pass, without reading the whole file into memory:

from collections import defaultdict
d = defaultdict(lambda: {"ind":0,"count":0})

with open("in.txt") as f:
    for ind, line in enumerate(f,1):
        ip = line.rstrip()
        d[ip]["ind"] = ind
        d[ip]["count"]  += 1

for ip ,v in d.items():
    print("IP {}  appears {} time(s) and the last occurrence is at  line {}".format(ip,v["count"],v["ind"]))

Output:

IP 192.1268.11.3  appears 1 time(s) and the last occurrence is at line 2
IP 192.168.11.3  appears 2 time(s) and the last occurrence is at line 4
IP 192.168.11.2  appears 2 time(s) and the last occurrence is at line 5
IP 192.168.11.5  appears 1 time(s) and the last occurrence is at line 6

If you want the IPs in the order in which they were first encountered, use an OrderedDict:

from collections import OrderedDict
od = OrderedDict()
with open("in.txt") as f:
    for ind, line in enumerate(f,1):
        ip = line.rstrip()
        od.setdefault(ip, {"ind": 0,"count":0})
        od[ip]["ind"] = ind
        od[ip]["count"] += 1

for ip ,v in od.items():
    print("IP {}  appears {} time(s) and the last occurrence is at  line {}".format(ip,v["count"],v["ind"]))

Output:

IP 192.168.11.2  appears 2 time(s) and the last occurrence is at line 5
IP 192.1268.11.3  appears 1 time(s) and the last occurrence is at line 2
IP 192.168.11.3  appears 2 time(s) and the last occurrence is at line 4
IP 192.168.11.5  appears 1 time(s) and the last occurrence is at line 6
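
On Python 3.7 and later a plain dict also preserves insertion order, so the same first-seen ordering can be had without OrderedDict. The following is a minimal sketch under that assumption; the function name last_occurrences and the in-memory sample are illustrative and not part of the answer above.

```python
def last_occurrences(lines):
    """Map each IP to its count and last line number, in first-seen order."""
    stats = {}  # plain dict: keeps insertion order on Python 3.7+
    for ind, line in enumerate(lines, 1):
        ip = line.rstrip()
        entry = stats.setdefault(ip, {"ind": 0, "count": 0})
        entry["ind"] = ind      # overwritten each time, so the last line wins
        entry["count"] += 1
    return stats


# Sample lines from the question, as they would come from iterating a file.
sample = [
    "192.168.11.2\n",
    "192.1268.11.3\n",
    "192.168.11.3\n",
    "192.168.11.3\n",
    "192.168.11.2\n",
    "192.168.11.5\n",
]

stats = last_occurrences(sample)
for ip, v in stats.items():
    print("IP {} appears {} time(s) and the last occurrence is at line {}".format(
        ip, v["count"], v["ind"]))
```

Passing an open file object instead of the sample list would work the same way, since the function only iterates over its argument.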

score 0 · Accepted Answer

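You can use a second dictionary. In it, you store for each distinct line (IP) the line number of its last occurrence, overwriting the value every time you find another occurrence. At the end, this dictionary holds the line number of the last occurrence of each IP.

Obviously, you need to increment a counter for every line you read, so that you know which line you are currently on.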
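
The approach described above could be sketched roughly as follows. This assumes Python 3; the function name count_and_last_line and the variable names are made up for illustration, and the sample list stands in for the lines of ip.txt.

```python
def count_and_last_line(lines):
    """Return (counts, last_seen) for an iterable of text lines."""
    counts = {}      # ip -> number of occurrences
    last_seen = {}   # ip -> line number of the last occurrence
    line_number = 0
    for line in lines:
        line_number += 1                    # counter incremented for every line read
        ip = line.strip()
        counts[ip] = counts.get(ip, 0) + 1
        last_seen[ip] = line_number         # overwritten on each new occurrence
    return counts, last_seen


# Sample lines from the question; an open file object could be passed instead.
sample = [
    "192.168.11.2\n",
    "192.1268.11.3\n",
    "192.168.11.3\n",
    "192.168.11.3\n",
    "192.168.11.2\n",
    "192.168.11.5\n",
]

counts, last_seen = count_and_last_line(sample)
for ip in counts:
    print("%s occurs %d times, last occurrence at line %d"
          % (ip, counts[ip], last_seen[ip]))
```

Keeping the two dictionaries separate mirrors the description; storing a [count, line] pair per IP in one dictionary, as other answers do, is an equivalent choice.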
