python - Python: check a csv for one element, use another to delete from a second file

Question

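I'm trying to get a script working that checks a lookup csv file for the existence of an IP and, if it is there, takes the third element and removes that third element from another (second) file. Here's an extract of what I have: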

for line in fileinput.input(hostsURLFileLoc,inplace =1):
        elements = open(hostsLookFileLoc, 'r').read().split(".").split("\n")
        first = elements[0].strip()
        third = elements[2].strip()
        if first == hostIP:
                if line != third:
                        print line.strip()

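This obviously doesn't work. I've played with a few options, but this is my latest (crazy) attempt.

I think the problem is that two input files are open at once.

Any thoughts welcome,

Cheers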

score 5 · Accepted Answer

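OK, although I haven't had any response to my comment on the question, here's my shot at a general answer. If I've got something wrong, just say so and I'll edit to try to address the errors.

First, here are my assumptions. You have two files, whose names are stored in the HostsLookFileLoc and HostsURLFileLoc variables.

The file at HostsLookFileLoc is a CSV file with an IP address in the third column of each row. Something like this:

HostsLookFile.csv: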

blah,blah,192.168.1.1,whatever,stuff
spam,spam,82.94.164.162,eggs,spam
me,myself,127.0.0.1,and,I
...

The file at HostsURLFileLoc is a flat text file with one IP address per line, like so:

HostsURLFile.txt:

10.1.1.2
10.1.1.3
10.1.2.253
127.0.0.1
8.8.8.8
192.168.1.22
82.94.164.162
64.34.119.12
...

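Your goal is to read and then rewrite the HostsURLFile.txt file, excluding every IP address that appears in the third column of a row in the CSV file. In the example lists above, localhost (127.0.0.1) and python.org (82.94.164.162) would be excluded, but the rest of the IPs in the list would stay.

Here's how I'd do it, in three steps:

1. Read in the CSV file and parse it with the csv module to find the IP addresses. Stick them in a set.
2. Open the flat file and read the IP addresses into a list, then close the file.
3. Reopen the flat file and overwrite it with the loaded list of addresses, skipping any that are in the set from the first step.

The code: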

import csv

def cleanURLFile(HostsLookFileLoc, HostsURLFileLoc):
    """
    Remove IP addresses from file at HostsURLFileLoc if they are in
    the third column of the file at HostsLookFileLoc.
    """
    with open(HostsLookFileLoc, "r") as hostsLookFile:
        reader = csv.reader(hostsLookFile)
        ipsToExclude = set(line[2].strip() for line in reader)

    with open(HostsURLFileLoc, "r") as hostsURLFile:
        ipList = [line.strip() for line in hostsURLFile]

    with open(HostsURLFileLoc, "w") as hostsURLFile: # truncates the file!
        hostsURLFile.write("\n".join(ip for ip in ipList
                                     if ip not in ipsToExclude))

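This code is deliberately simple. There are a few things that could be improved, if they are important to your use case:

If the program crashes during the rewriting step, HostsURLFile.txt may be clobbered. A safer way of rewriting (at least on Unix-style systems) is to write to a temporary file, then, once the writing has finished (and the file has been closed), rename the temporary file over the top of the old file. That way, if the program crashes, you'll still have either the original version or the completely written replacement, but never something in between.
If the checking you need to do is more complicated than set membership, I'd add an extra step between 2 and 3 to do the actual processing, then write the results out without further manipulation (other than adding newlines).
Speaking of newlines, a blank line in the flat file (for example, an extra newline at the end) is passed through as an empty string in the list of IP addresses. That should be fine for this scenario (it won't be in the set of IPs to exclude, unless your CSV file has a messed-up line), but it might cause trouble if you were doing something more complicated.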
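The safer rewrite from the first point could look roughly like the sketch below. It is an illustration only: the function name rewrite_without and its parameters are made up here, it assumes Python 3.3+ (for os.replace), and it also drops blank lines, which the code above does not do.

```python
import os
import tempfile


def rewrite_without(path, ips_to_exclude):
    """Rewrite the file at `path`, dropping lines whose text is in `ips_to_exclude`."""
    with open(path, "r") as src:
        stripped = [line.strip() for line in src]
    # Skip blank lines as well as the excluded addresses.
    kept = [ip for ip in stripped if ip and ip not in ips_to_exclude]

    # Create the temporary file next to the original so that the final
    # rename stays on the same filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write("\n".join(kept))
            if kept:
                tmp.write("\n")
        # Only after the temporary file is closed, move it over the original.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed: the original is untouched, so just clean up.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling rewrite_without(HostsURLFileLoc, ipsToExclude) would stand in for the third with block above. Note that tempfile.mkstemp creates the file readable and writable only by the current user, so this sketch does not preserve the permissions of the original file.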

score 0 · Accepted Answer

In a test file test.csv (note there is an IP address in there):

'aajkwehfawe;fh192.168.0.1awefawrgaer'

(I'm pretty much ignoring that it's CSV for now. I'll just use a regex match.)

import re

# Get the file data
with open('test.csv', 'r') as f:
    data = f.read()

# Look for the IP. re.escape() makes the dots match literally, and the
# lookarounds keep the match from being part of a longer number.
find_ip = '192.168.0.1'
m = re.search(r'(?<![0-9])({})(?![0-9])'.format(re.escape(find_ip)), data)
if m: # found!
    # this is weird, because you already know the value in find_ip, but anyway...
    ip = m.group(1).split('.')
    print('Third ip = ' + ip[2])
else:
    print('Did not find a match for {}'.format(find_ip))

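I don't understand the second part of your question, i.e. removing the third value from a second file. Is there a file with numbers listed line by line, and you want to find the line that contains the number found above and delete that line? If so: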

# Make a new list of lines that omits the matched one
with open('iplist.txt', 'r') as f:
    new_lines = [line for line in f if line.strip() != ip[2]]  # skip the matched line

# Replace the file with the new list of lines
# (each line still ends with its own newline, so no separator is added)
with open('iplist.txt', 'w') as f:
    f.writelines(new_lines)

score 0 · Accepted Answer

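Once you have found, in the first file, the values that need to be removed from the second file, I suggest something like the following pseudocode: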

Load first file into memory
Search string representing first file for matches using a regular expression
    (in Python, use re.findall(regex, string), where regex = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"); with a raw string the backslashes do not need to be doubled)
Build up a list of all matches
Close first file

Load second file into memory
Search string representing second file for the start index and end index of each match
For each match, use the expression string = string[:start_of_match] + string[end_of_match:]
Re-write the string representing the second (now trimmed) file to the second file

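Basically, whenever you find a match, redefine the string as the slices on either side of it, leaving the match out of the new string. Then write your string back to the file.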
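As a rough sketch of that pseudocode in Python, working on strings that have already been read from the two files: the names IP_PATTERN and remove_matches are invented for this example, and the pattern is a simple dotted-quad match that does not check that each part is 255 or less.

```python
import re

# Simple dotted-quad pattern. The lookarounds stop it from matching
# inside a longer run of digits.
IP_PATTERN = re.compile(r"(?<![0-9])[0-9]{1,3}(?:\.[0-9]{1,3}){3}(?![0-9])")


def remove_matches(first_text, second_text):
    """Remove from second_text every IP-like string that occurs in first_text."""
    to_remove = set(IP_PATTERN.findall(first_text))
    result = second_text
    # Walk the matches from last to first so that the start and end
    # indices of the earlier matches stay valid while slicing.
    for match in reversed(list(IP_PATTERN.finditer(second_text))):
        if match.group(0) in to_remove:
            result = result[:match.start()] + result[match.end():]
    return result
```

Slicing out only the match leaves the surrounding newline in place, so a removed address leaves an empty line behind. If that matters, filtering line by line is probably simpler than slicing the whole string.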
