python - Split a text file into two files with non-overlapping entries

Question

Let me explain the problem in detail.

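I have two text files (pool files), CL-0.txt and CL-1.txt, and I have to split each of them into two parts: CL-0.txt into xx_0.txt and yy_0.txt, and CL-1.txt into xx_1.txt and yy_1.txt. The contents of the two files look like this:

CL-0
(apple, orange)
(mango, banana)
(cake, tea)
(coffee, sugar)
(milk, honey)
(cake, biscuit)

CL-1
(orange, mango)
(grapes, coffee)
(car, ice cream)
(table, chair)
(window, milk)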
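For illustration, the entry format shown above could be read into pairs of strings roughly like this. This is only a sketch: the name `parse_entries` and the regular expression are assumptions, not part of the original post, and the pattern assumes that element names contain no parentheses and that the first element contains no comma.

```python
import re

# Assumed pattern: "(" first element "," second element ")", with optional spaces.
ENTRY_PATTERN = re.compile(r'\(\s*([^,()]+?)\s*,\s*([^()]+?)\s*\)')

def parse_entries(text):
    """Return every "(a, b)" entry found in `text` as a tuple of two strings."""
    return ENTRY_PATTERN.findall(text)

sample = "(apple, orange)\n(mango, banana)\n(cake, tea)\n"
print(parse_entries(sample))
# [('apple', 'orange'), ('mango', 'banana'), ('cake', 'tea')]
```

Because `findall` scans the whole string, the same helper works whether the entries sit one per line or several on one line.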

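To be clear about what I mean by entry and entity: an entry is (apple, orange), and an entity is apple. Each entry has two elements, separated by a comma. There should be no duplicate entries or elements:

* If an entry or element already appears in xx_0.txt, it must not appear in yy_0.txt or yy_1.txt.
* If an entry or element already appears in yy_0.txt, it must not appear in xx_0.txt or xx_1.txt.
* If an entry or element already appears in xx_1.txt, it must not appear in yy_0.txt or yy_1.txt.
* If an entry or element already appears in yy_1.txt, it must not appear in xx_0.txt or xx_1.txt.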
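Read this way, the four rules above amount to a single check: before an entry goes to one side (the xx files or the yy files), none of its elements may already have been written to the other side. A minimal sketch of that check follows; the helper name `conflicts` and the set `xx_seen` are hypothetical and only illustrate the rule.

```python
def conflicts(entry, opposite_seen):
    """True if any element of `entry` was already written to the opposite side."""
    return any(element in opposite_seen for element in entry)

# Elements written so far to the xx files, taken from the expected xx_0 output.
xx_seen = {"apple", "orange", "cake", "tea", "milk", "honey"}

print(conflicts(("cake", "biscuit"), xx_seen))  # True: cake is already on the xx side
print(conflicts(("table", "chair"), xx_seen))   # False: neither element was seen
```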

Entries are taken one at a time, and the two files are tried alternately until the entry has been written to a file.

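The expected output is as follows.

Constituent files from CL-0:

* xx_0 file should have: (apple, orange) (cake, tea) (milk, honey)

* yy_0 file should have: (mango, banana) (coffee, sugar). (cake, biscuit) cannot be added because cake already appears in xx_0.

Constituent files from CL-1:

* xx_1 file should have: (orange, mango) [a repeat is OK in this case], (car, ice cream)

* yy_1 file should have: (grapes, coffee) [a repeat is OK in this case], (table, chair). (window, milk) cannot be added here because it would repeat the entity milk, which already appears in the xx_0 file.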

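I attempted half of the problem, figuring that if I could split the CL-0 file into two parts successfully, the rest would be easy to do with minor adjustments.

My attempt is as follows: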

xx_0=open('xx_0.txt','wb') #the file that i want to populate
yy_0=open('yy_0.txt','wb') #the file that i want to populate
file=open('CL-0.txt','r')  # the main file
xx0=set()
xx1=set() # un1 a set against which the desired file has to be checked against for matches
yy0=set()
yy1=set() # un2 a set against which the desired file has to be checked against for matches
for line in file:
    s=line.replace('[,]','')

    s=s.replace('\n','')
    s=s.replace('(','')
    s=s.replace(')','')
    s=s.replace("'",'')

    r=re.split(',',s)
    if L==1:
        for n in r:
            if n not in yy0:
                if n not in yy1:
                    xx0.add(n)
        r1= ', '.join(r)
        xx_0.write(r1)
        xx_0.write('\n')

        L+=1
        continue

    if L==2:
        for n in r:
            if n not in xx_1:
                if n not in yy_1:
                    yy0.add(n)                  
        r2=', '.join(r)
        yy_0.write(r2)
        yy_0.write('\n')
        L=1

score 0 · Accepted Answer

Assuming the lines should be put alternately into two different files:

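import re

input_file = open('CL-0.txt')
out = [open(file_name, 'w') for file_name in ['xx_0.txt', 'yy_0.txt']]
done = set()
for line in input_file:
    match = re.match(r'\s*\(\s*([^,]*?)\s*,\s*([^)]*?)\s*\)\s*', line)
    if match is None:
        continue  # skip lines that are not of the form "(a, b)"
    elements = match.groups()  # tuple of the two element strings
    if elements in done:
        continue  # skip entries that were already written
    out[0].write(', '.join(elements) + '\n')
    done.add(elements)
    out = out[1:] + [out[0]]  # round robin: rotate the output files
input_file.close()
for f in out:
    f.close()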

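But I don't understand what these xx1 and yy1 sets are for. Your code certainly doesn't explain it (it never writes to them), and your text doesn't help enough either. Maybe you'd like to elaborate?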
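If those sets are meant to remember which elements have already landed on the xx side and on the yy side, one possible reading of the question is sketched below. It is only a guess at the intent: the function name `split_alternately` is made up, entries are passed in as already-parsed tuples, and a rejected entry is assumed to be dropped while still using up its turn.

```python
def split_alternately(entries, xx_seen, yy_seen):
    """Deal entries alternately to an xx list and a yy list.

    xx_seen / yy_seen hold the elements already written to any xx / yy file
    and are updated in place, so they can be reused for the next pool file.
    """
    sides = [([], xx_seen, yy_seen), ([], yy_seen, xx_seen)]
    for turn, entry in enumerate(entries):
        accepted, own_seen, opposite_seen = sides[turn % 2]
        if opposite_seen.isdisjoint(entry):
            accepted.append(entry)
            own_seen.update(entry)
        # Assumption: a rejected entry is dropped and still uses up its turn.
    return sides[0][0], sides[1][0]

cl0 = [("apple", "orange"), ("mango", "banana"), ("cake", "tea"),
       ("coffee", "sugar"), ("milk", "honey"), ("cake", "biscuit")]
xx_seen, yy_seen = set(), set()
xx_0, yy_0 = split_alternately(cl0, xx_seen, yy_seen)
print(xx_0)  # [('apple', 'orange'), ('cake', 'tea'), ('milk', 'honey')]
print(yy_0)  # [('mango', 'banana'), ('coffee', 'sugar')]
```

Calling the function again with the CL-1 entries and the same two sets would carry the exclusions over to xx_1 and yy_1. Under this strict reading, (orange, mango) would be rejected for xx_1 because mango is already on the yy side, whereas your expected output allows it, so the exact rule still needs clarifying.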
