
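The code below is supposed to look up the first column (the key) in the file dict_file and replace the first column of another file, fr, with the value found for that key in dict_file. It also keeps dict_file as an updated dictionary for future lookups.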

Every time the code runs, it initializes a dictionary from dict_file. If it finds a new email address in the other file, it appends it to the bottom of dict_file.

As I understand it, this should work fine: if it does not find an @ sign, it assigns looking_for the value "Dummy@dummy.com", and Dummy@dummy.com should then be appended to the bottom of dict_file.

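But for some reason I keep getting new lines and blank lines appended to the end of dict_file along with the other new emails. I must not write spaces or newlines at the end of dict_file.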

Why is this happening? What is wrong with the code below? My brain is about to explode! Any help would be greatly appreciated!

#!/usr/bin/python

import sys

d = {}
line_list=[]
alist=[]

f = open(sys.argv[3], 'r') # Map file

for line in f:
    alist = line.split()
    key = alist[0]
    value = alist[1]
    d[str(key)] = str(value)
    alist=[]
f.close()

fr = open(sys.argv[1], 'r') # source file

fw = open(sys.argv[2]+"/masked_"+sys.argv[1], 'w') # target file

for line in fr:
    columns = line.split("|")
    looking_for = columns[0] # this is what we need to search
    if looking_for in d:
        # by default, iterating over a dictionary will return keys
        if not looking_for.find("@"):
            looking_for == "Dummy@dummy.com"
            new_line = d[looking_for]+'|'+'|'.join(columns[1:])
            line_list.append(new_line)
        else:
            new_line = d[looking_for]+'|'+'|'.join(columns[1:])
            line_list.append(new_line)
    else:
        new_idx = str(len(d)+1)
        d[looking_for] = new_idx
        kv = open(sys.argv[3], 'a')
        kv.write("\n"+looking_for+" "+new_idx)
        kv.close()
        new_line = d[looking_for]+'|'+'|'.join(columns[1:])
        line_list.append(new_line)
fw.writelines(line_list)

Here is dict_file:

WHATEmail@SIMPLE.COM    223
SamHugan@CR.COM 224
SAMASHER@CATSTATIN.COM  225
FAKEEMAIL@SLOW.com  226
SUPERMANN@MYMY.COM 227

Here is the fr file, whose first column is converted to an id via the dict_file lookup:

WHATEmail@SIMPLE.COM|12|1|GDSP
FAKEEMAIL@SLOW.com|13|7|GDFP
MICKY@FAT.COM|12|1|GDOP
SUPERMANN@MYMY.COM|132|1|GUIP
MONITOR|132|1|GUIP
    |132|1|GUIP
00 |12|34|GUILIGAN

1 Answer


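First, you need to skip blank lines when you initially read the dictionary; otherwise you will get an index out of range error the next time you run this script. Do the same when reading through the fr object so that empty entries are never added. Next, move the email check to the outer scope so that it runs before the dictionary lookup, not only inside the branch for keys that are already in the dictionary. A simple check for "@" with the find method is enough. Then you are good to go.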
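
To make these points concrete, here is a small sketch of the Python behaviours that appear to be behind the symptoms in the question. The sample strings are taken from the question's files; the variable names are only for illustration and are not part of the original script.

```python
# 1. A blank line splits into an empty list, so alist[0] raises IndexError.
blank_fields = "\n".split()
print(blank_fields)                 # []

# 2. str.find returns -1 (which is truthy) when the substring is missing,
#    and the index otherwise, so `not s.find("@")` is True only when "@"
#    sits at index 0.
no_at = "MONITOR".find("@")
with_at = "MICKY@FAT.COM".find("@")
print(no_at, with_at)               # -1 5
print(not "MONITOR".find("@"))      # False: the Dummy branch is not taken

# 3. `==` compares and throws the result away; `=` is needed to assign.
looking_for = "MONITOR"
looking_for == "Dummy@dummy.com"    # has no effect on looking_for
print(looking_for)                  # MONITOR

# A membership test states the intent more directly than find.
has_at = "@" in "MONITOR"
print(has_at)                       # False
```

Because of points 2 and 3, keys such as MONITOR or a run of spaces are never replaced by the dummy address in the original script and end up being written to dict_file as they are.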

Try the code below. This should work:

#!/usr/bin/python

import sys

d = {}
line_list=[]
alist=[]
f = open(sys.argv[3], 'r') # Persisted Dictionary File

for line in f:
    line = line.strip()
    if line =="":
        continue
    alist = line.split()
    key = alist[0]
    value = alist[1]
    d[str(key)] = str(value)
    alist=[]
f.close()

fr = open(sys.argv[1], 'r') # source file
fw = open(sys.argv[2]+"/masked_"+sys.argv[1], 'w') # Target Directory Location

for line in fr:
    line = line.strip()
    if line == "":
        continue
    columns = line.strip().split('|')
    if columns[0].find("@") > 1:
        looking_for = columns[0] # this is what we need to search
    else:
        looking_for = "Dummy@dummy.com"
    if looking_for not in d:
        # Unseen key: assign the next id and persist it to the dictionary file
        new_idx = str(len(d)+1)
        d[looking_for] = new_idx
        kv = open(sys.argv[3], 'a')
        kv.write(looking_for+" "+new_idx+'\n')
        kv.close()
    # line was stripped above, so restore the trailing newline here
    new_line = d[looking_for]+'|'+'|'.join(columns[1:])+'\n'
    line_list.append(new_line)
fw.writelines(line_list)
fr.close()
fw.close()
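
As a variation on the script above, the same logic can be split into small functions that work on lists of lines, which makes it easier to check the behaviour without touching the file system. This is only a sketch under a few assumptions: the names DUMMY, load_mapping, normalize_key and mask_lines are made up for illustration, the key is stripped of surrounding whitespace, and the "@" test uses `in` rather than find.

```python
DUMMY = "Dummy@dummy.com"

def load_mapping(lines):
    """Build {key: id} from 'key id' lines, skipping blank or short ones."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping

def normalize_key(raw):
    """Return the stripped key, or the dummy address if it has no '@'."""
    key = raw.strip()
    return key if "@" in key else DUMMY

def mask_lines(source_lines, mapping):
    """Replace column 1 with its id.

    Returns (masked lines, new 'key id' entries to append to the dictionary file).
    """
    masked, new_entries = [], []
    for line in source_lines:
        line = line.strip()
        if not line:
            continue
        columns = line.split("|")
        key = normalize_key(columns[0])
        if key not in mapping:
            # Same id scheme as the script above: next id is len(mapping) + 1
            mapping[key] = str(len(mapping) + 1)
            new_entries.append(key + " " + mapping[key] + "\n")
        masked.append("|".join([mapping[key]] + columns[1:]) + "\n")
    return masked, new_entries

mapping = load_mapping(["WHATEmail@SIMPLE.COM    223\n", "\n", "SamHugan@CR.COM 224\n"])
masked, new_entries = mask_lines(
    ["WHATEmail@SIMPLE.COM|12|1|GDSP\n", "MONITOR|132|1|GUIP\n", "    |132|1|GUIP\n"],
    mapping,
)
print(masked)
print(new_entries)
```

The caller would then write masked to the target file and append new_entries to the dictionary file, ideally inside with open(...) blocks so that the files are always closed.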
answered 2012-11-29T06:13:49.077