import mmap
import re

with open('out.txt', 'r+') as f:
    data = mmap.mmap(f.fileno(), 0)
    # Takes only the first match, then splits it on commas
    ips = re.findall(rb"(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})", data)[0].split(b",")
    print(ips)

This code opens a file containing a large number of lines, and the regular expression finds the IPs in that file. (I use mmap to avoid memory errors.)

This is the list "ips"; as you can see, it is a single element containing the IPs separated by commas:

[b'41.39.180.122', b'192.28.64.246', b'213.82.176.107', b'3.120.158.39', b'5.189.139.56', b'178.128.36.166', b'203.117.94.11', b'5.79.119.182', b'52.48.41.230', b'81.169.129.6', b'178.114.8.24', b'67.20.116.110', b'205.201.139.164', b'180.215.241.68', etc etc ]

I tried using split(b",") but I get this output. The first IP is printed correctly, but...

['41.39.180.122']
[
b
'

8
0

'
,

etc etc

Edit (fixed):

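import mmap
import re

with open('out.txt', 'r+') as f:
    data = mmap.mmap(f.fileno(), 0)
    # findall returns one bytes object per matched IP
    ips = re.findall(rb"(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})", data)
    ipsn = []
    for ip in ips:
        ip = ip.split(b",")  # bytes separator, since ip is a bytes object
        ipsn.append(ip)
    print(ipsn)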

1 Answer

If you want to split a bytes object, you need to use a bytes separator rather than a string separator. So instead of .split(",") you should use .split(b",").

https://docs.python.org/3/library/stdtypes.html#bytes.split
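
A minimal sketch of the difference, assuming a small in-memory bytes value as a stand-in for the contents of out.txt (the sample data and variable names are illustrative, not taken from the question's file). It also shows that re.findall with a bytes pattern already returns one bytes object per match, which can be decoded if str values are wanted:

```python
import re

# Hypothetical sample data standing in for the contents of out.txt.
data = b"41.39.180.122,192.28.64.246,213.82.176.107"

# bytes.split needs a bytes separator.
parts = data.split(b",")
print(parts)

# A str separator on a bytes object raises TypeError.
try:
    data.split(",")
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# With a bytes pattern, findall returns a list of bytes objects,
# one per match, so no further split is needed.
pattern = rb"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
ips = [m.decode("ascii") for m in re.findall(pattern, data)]
print(ips)
```

Decoding with "ascii" is a reasonable choice here because the pattern only matches digits and dots.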

Answered 2020-04-09T11:54:08.203