I have two text files. My goal is to find the lines in First.txt that are not in Second.txt and write them to a third text file, Missing.txt. I have that working:
fn = "Missing.txt"

# Read Second.txt once and keep its stripped lines in a set for fast lookups.
with open('Second.txt', 'r', encoding='utf-8', errors='ignore') as fileSecondary:
    bLines = {thing.strip() for thing in fileSecondary}

# Opening with 'w' creates the file or truncates an existing one,
# so no separate try/except or truncate() call is needed.
with open('First.txt', 'r', encoding='utf-8', errors='ignore') as filePrimary, \
        open(fn, 'w', encoding='utf-8') as fileOutPut:
    for line in filePrimary:
        line = line.strip()
        # Write only the lines that have no exact match in Second.txt.
        if line not in bLines:
            fileOutPut.write(line + '\n')
But after running the script I ran into a problem: some lines are nearly identical but are not treated as matches. Examples:
[PR] Zero One Two Three ft Four
and (no space after the bracket)
[PR]Zero One Two Three ft Four
or
[PR] Zero One Two Three ft Four
and (capital letter F)
[PR] Zero One Two Three Ft Four
I have found SequenceMatcher (from difflib), which does what I need, but how do I use it in this comparison, since I am not comparing two strings but a string against a set of strings?
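One possible way to combine the two is to compare the line against each member of the set in turn and accept it as soon as any similarity ratio reaches a cutoff. The sketch below is only an illustration under assumptions: the helper name is_similar, the 0.9 threshold, and the lowercasing step are not part of the original script and would need tuning against real data.

```python
from difflib import SequenceMatcher


def is_similar(line, candidates, threshold=0.9):
    """Return True if `line` is close enough to any string in `candidates`.

    `threshold` is an assumed cutoff on SequenceMatcher.ratio(), which
    ranges from 0.0 (nothing in common) to 1.0 (identical).
    """
    key = line.lower()  # assumption: case differences should not count
    return any(
        SequenceMatcher(None, key, other.lower()).ratio() >= threshold
        for other in candidates
    )


# Sample data taken from the examples in the question.
bLines = {"[PR] Zero One Two Three ft Four", "Something Else"}

print(is_similar("[PR]Zero One Two Three ft Four", bLines))   # True
print(is_similar("[PR] Zero One Two Three Ft Four", bLines))  # True
print(is_similar("Completely different line", bLines))        # False
```

In the loop over First.txt, the exact test `line not in bLines` could then be followed by `not is_similar(line, bLines)` so that the fuzzy comparison only runs for lines with no exact match. Note that this compares each such line against every entry of the set, so it is much slower than a set lookup on large files. difflib.get_close_matches(word, possibilities, n, cutoff) wraps a similar loop and may be worth a look as well.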