python - Using set difference to get the line numbers of missing values

Question

I have two lists, and I use the following function to assign line numbers (similar to nl in unix):

import fileinput
from math import log10

def nl(inFile):
    # Number each line of inFile, similar to the unix nl command
    numberedLines = []
    for line in fileinput.input(inFile):
        numberedLines.append(str(fileinput.lineno()) + ':  ' + line)
    # Width of the largest line number, used to right-align the numbers
    numberWidth = int(log10(fileinput.lineno())) + 1
    for i, line in enumerate(numberedLines):
        num, rest = line.split(':', 1)
        fnum = str(num).rjust(numberWidth)
        numberedLines[i] = ':'.join([fnum, rest])
    return ''.join(numberedLines)

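This returns a list such as: 1: 12 14, 2: 20 49, 3: 21 28. For the infile I am using, the line numbers are very important. My second list has the same structure, but its line numbers are meaningless. I need to find the list difference against the second file and return the line numbers from the first file. For example, if the second file has 5: 12 14 and 48: 20 49, I only want to return 3, the line number in the first list of the missing value.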

Here is what I have tried:

oldtxt = 'master_list.txt'  # Line numbers are significant
newFile = 'list2compare.txt' # Line numbers don't matter

s = set(nl(oldtxt))
diff = [x for x in (newFile) if x not in s]
print(diff)

This returns: ['12 14\n', '20 49\n', '21 28\n'], which is obviously not what I need. Any ideas?

score 0 · Accepted Answer

How about the following:

f1 = """\
12 14
20 49
21 28
"""

f2 = """\
12 14
20 49
"""

def parse(lines):
  "Take a list of lines, turn into a dict of line number => value pairs"
  # Blank lines are dropped before numbering, so they do not count as lines
  return dict((i + 1, v) for i, v in enumerate(l for l in lines if l))

def diff(a, b):
  """
  Given two dicts from parse(), go through each lineno => value pair in a
  and, if the value is in b's values, discard it; finally, return the
  remaining lineno => value pairs
  """
  bvals = frozenset(b.values())
  return dict((ak, av) for ak, av in a.items() if av not in bvals)

def fmt(d):
  "Turn lineno => value pairs into '  lineno: value' strings"
  if not d:
    return []  # nothing is missing, so there is nothing to format
  nw = len(str(max(d.keys())))
  return ["{0:>{1}}: {2}".format(k, nw, v) for k, v in sorted(d.items())]

d1 = parse(f1.splitlines())
print(d1)
print()
d2 = parse(f2.splitlines())
print(d2)
print()
d = diff(d1, d2)
print(d)
print()
print("\n".join(fmt(d)))

This gives me the output:

{1: '12 14', 2: '20 49', 3: '21 28'}

{1: '12 14', 2: '20 49'}

{3: '21 28'}

3: 21 28
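
One thing to watch in parse() above is that blank lines are filtered out before numbering, so the numbers can drift from the physical line numbers if the master file contains empty lines. A minimal sketch of the same set-based idea that numbers first and filters afterwards might look like this. The function name missing_line_numbers and the sample data are my own, not part of the answer above:

```python
def missing_line_numbers(master_lines, compare_lines):
    """Return (lineno, text) pairs for master lines whose text is absent
    from compare_lines. Line numbers are 1-based physical line numbers."""
    # Set of stripped, non-blank values from the comparison file
    compare_values = {line.strip() for line in compare_lines if line.strip()}
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(master_lines, start=1)
        # Blank master lines are skipped, but they still consume a number
        if line.strip() and line.strip() not in compare_values
    ]


master = ["12 14\n", "20 49\n", "21 28\n"]
compare = ["12 14\n", "20 49\n"]
print(missing_line_numbers(master, compare))  # [(3, '21 28')]
```

Because a file object iterates over its lines, the same function should also accept two open files in place of the lists.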

score 0 · Accepted Answer

You can use difflib for this:

>>> f1 = """1 2 3 4
... test
... 6 7 8 9
... compare
... me
... """
>>> 
>>> f2 = """6 7 8 9
... 10 11 12 13
... me
... """
>>>
>>> import difflib
>>> for line in difflib.ndiff(f1.splitlines(), f2.splitlines()):
...    if line.startswith('-'):
...       print("Second file is missing line: '%s'" % line)
...    if line.startswith('+'):
...       print("Second file contains additional line: '%s'" % line)
... 
Second file is missing line: '- 1 2 3 4'
Second file is missing line: '- test'
Second file is missing line: '- compare'
Second file contains additional line: '+ 10 11 12 13'
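
Note that ndiff reports the missing lines but not where they sit in the first file. If the line numbers are needed, one option is to read them off difflib.SequenceMatcher.get_opcodes(), which gives index ranges into both sequences. This is only a sketch; the helper name missing_with_line_numbers is made up for illustration:

```python
import difflib


def missing_with_line_numbers(first_lines, second_lines):
    """Return (lineno, text) pairs for lines of first_lines that the
    sequence diff reports as deleted or replaced in second_lines."""
    matcher = difflib.SequenceMatcher(None, first_lines, second_lines,
                                      autojunk=False)
    missing = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'delete' and 'replace' both mean first_lines[i1:i2] has no
        # counterpart at this position in second_lines
        if tag in ("delete", "replace"):
            for i in range(i1, i2):
                missing.append((i + 1, first_lines[i]))  # 1-based numbers
    return missing


f1 = ["1 2 3 4", "test", "6 7 8 9", "compare", "me"]
f2 = ["6 7 8 9", "10 11 12 13", "me"]
print(missing_with_line_numbers(f1, f2))
# [(1, '1 2 3 4'), (2, 'test'), (4, 'compare')]
```

Keep in mind that difflib compares sequences, so it is order-sensitive: a line that exists in both files but in a different relative position can still be reported as missing. If order does not matter, a plain set lookup is probably the better fit.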

score 0 · Accepted Answer

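I'll take a stab at this ;) It sounds like you are after the line numbers in the master file whose contents also appear in the compare file. Is that what you are after? In that case, I would suggest...

Master file contents...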

1 2 3 4
test
6 7 8 9
compare
me

Compare file contents...

6 7 8 9
10 11 12 13
me

Code:

# 'file path' is a placeholder; substitute the real paths
with open('file path') as f:
    lines_master = f.read().splitlines()
with open('file path') as f:
    lines_compare = f.read().splitlines()

same_lines = []
for i, line in enumerate(lines_master):
    # i is zero-based, so add 1 to get the line number in the master file
    if line in lines_compare:
        same_lines.append(i + 1)

print(same_lines)

The result is [3, 5]
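
If what is wanted is the opposite (master lines that are missing from the compare file), and the compare file carries nl-style prefixes such as "48: 20 49" as described in the question, the prefixes would need to be stripped before comparing. A rough sketch under those assumptions follows. The helper names are hypothetical, and it assumes the master lines are the raw, unnumbered values:

```python
def strip_number_prefix(line):
    """Drop a leading 'N:' prefix (e.g. '48: 20 49') and return the value."""
    prefix, sep, rest = line.partition(":")
    if sep and prefix.strip().isdigit():
        return rest.strip()
    return line.strip()  # no numeric prefix, keep the whole line


def missing_from_compare(master_lines, compare_lines):
    """Return 1-based master line numbers whose value is not in compare_lines."""
    seen = {strip_number_prefix(line) for line in compare_lines}
    return [
        number
        for number, line in enumerate(master_lines, start=1)
        if line.strip() not in seen
    ]


master = ["12 14", "20 49", "21 28"]
compare = ["5: 12 14", "48: 20 49"]
print(missing_from_compare(master, compare))  # [3]
```

Using a set for the compare values also avoids rescanning the whole compare list for every master line.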
