I am currently using the code below to parse paragraphs of text out of some txt files:
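import sys

def ParseFile(path, filename):
    content = open(path + filename).read()
    # The part of the file name before the first dot is used as the code.
    code = filename.split('.')[0]
    pattenstart = ''
    pattenend = ''
    for catlog in CATLOG:
        # Slice out the text between the start and end markers.
        i = content.index(pattenstart)
        j = content.index(pattenend)
        info = content[i:j]
        yield (catlog, code, info)
    sys.stdout.write('.')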
Here info is multi-line text.
Now I want to output a csv file like this:
code    info
***     ****
        ****
        ****
***     ****
        ****
        ****
I tested with a script, but it only produces a file like this:
code info
*** ****
***********
**********
My test script is:
import os
import time
from collections import defaultdict

time1 = time.time()
subfix = '_ALL.csv'
d = defaultdict(list)
for path in [PATH1, PATH2]:
    print 'Parsing', path
    filenames = os.listdir(path)
    for filename in filenames:
        print 'Parsing', filename
        # Group the (code, info) pairs by catalog.
        for item in ParseFile(path, filename):
            d[item[0]].append((item[1], item[2]))
        print
for k in d.keys():
    # One output file per catalog.
    out_file = open(DESTFILEPATH + k + subfix, 'w')
    for code, info in sorted(set(d[k])):
        out_file.write(code + '\t' + info + '\n')
    out_file.close()
print 'Done in %0.1f seconds' % (time.time() - time1)
How can I solve this?
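
One possible approach, sketched here under a few assumptions: the output above looks broken because the newlines inside info are written unquoted, so every line of info becomes a separate physical row. Python's standard csv module can quote each field, which keeps a multi-line info value inside a single cell. The sketch below is written for Python 3 rather than the Python 2 used in the question; write_records and the sample data are hypothetical names made up for illustration, and an in-memory buffer stands in for the real output file.

```python
import csv
import io


def write_records(out_file, records):
    """Write (code, info) pairs as CSV rows, one record per row."""
    # QUOTE_ALL wraps every field in double quotes, so newlines inside
    # info stay part of the field instead of starting a new record.
    writer = csv.writer(out_file, quoting=csv.QUOTE_ALL)
    writer.writerow(["code", "info"])
    for code, info in sorted(set(records)):
        writer.writerow([code, info])


# Hypothetical sample data standing in for what ParseFile yields.
sample = [("000001", "line 1\nline 2\nline 3"), ("000002", "a\nb")]

# In-memory stand-in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
write_records(buffer, sample)
csv_text = buffer.getvalue()

# Reading the text back gives one row per record, not one per physical line.
rows = list(csv.reader(io.StringIO(csv_text)))
```

In the test script, the inner write loop could then become a call such as write_records(out_file, d[k]), with the file opened using newline="" in Python 3 (or mode 'wb' in Python 2). If tab-separated output is preferred, csv.writer also accepts delimiter='\t'. Spreadsheet programs should generally show a quoted field with embedded newlines as one multi-line cell.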