I am currently using the code below to parse paragraph text out of some txt files:

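import sys

def ParseFile(path,filename):
    # Read the whole file; the part of the name before the first '.' is the code
    content=open(path+filename).read()
    code=filename.split('.')[0]

    # Start and end markers of the section to extract
    pattenstart = ''
    pattenend = ''

    for catlog in CATLOG:
        i = content.index(pattenstart)
        j = content.index(pattenend)

        info=content[i:j]

        yield (catlog,code,info)
        sys.stdout.write('.')  # progress indicator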

Here info is multi-line text.

Now I want to output a csv file like this:

code    info
***     ****
        ****
        ****
***     ****
        ****
        ****

I tested this with a script, but it only produces a file like this:

code    info 
***     ****
***********
**********

My test script is:

import os
import time
from collections import defaultdict

time1=time.time()

subfix='_ALL.csv'
d = defaultdict(list)
for path in [PATH1,PATH2]:
    print 'Parsing',path
    filenames = os.listdir(path)
    for filename in filenames:
        print 'Parsing',filename
        for item in ParseFile(path,filename):
            d[item[0]].append((item[1],item[2]))
        print

for k in d.keys():
    out_file=open(DESTFILEPATH+k+subfix,'w')
    for code,info in sorted(set(d[k])):
        out_file.write(code+'\t'+info+'\n')
    out_file.close()
print 'Done in %0.1f seconds'%(time.time()-time1)

How can I solve this?

1 Answer

Python has a csv module that makes what you want to do much easier; I suggest you take a look at it.

For example:

import csv
with open('somefile.csv', 'w') as file:
    output = csv.writer(file, delimiter='\t')
    output.writerows([
        ['code', 'info'],
        ['****', '****'],
        [None, '****'],
        [None, '****'],
        [None, '****'],
        ['****', '****'],
        [None, '****']
    ])

Produces:

code    info
****    ****
        ****
        ****
        ****
****    ****
        ****
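
To connect this to the question's (code, info) pairs, where info is multi-line text, one option is to split each info into lines and put the code only on the first row. The following is a minimal sketch for Python 3, not the asker's actual pipeline: the helper names rows_for and write_entries and the sample data are hypothetical, and it writes to an in-memory buffer instead of a real file.

```python
import csv
import io


def rows_for(code, info):
    """Yield one row per line of info; only the first row carries the code."""
    for n, line in enumerate(info.splitlines()):
        yield [code if n == 0 else None, line]


def write_entries(entries, out):
    """Write (code, info) pairs to the file-like object `out` as tab-separated rows."""
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    writer.writerow(['code', 'info'])
    for code, info in entries:
        writer.writerows(rows_for(code, info))


# Hypothetical sample data standing in for the parsed (code, info) pairs.
sample = [('A1', 'first\nsecond\nthird'), ('B2', 'only line')]
buf = io.StringIO()
write_entries(sample, buf)
print(buf.getvalue())
```

csv.writer writes None as an empty field, which is what leaves the code column blank on continuation rows. Note that an empty info string produces no rows at all in this sketch.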

Edit:

If your data doesn't fit this format, you just need to transform it so that it does:

import csv
from itertools import izip_longest
from itertools import chain

data = [("key", ["value", "value"]), ("key", ["value", "value"])]

with open('somefile.csv', 'w') as file:
    output = csv.writer(file, dialect='excel-tab')
    output.writerows(
        chain.from_iterable(
            izip_longest([key], values) for key, values in data
        )
    )

Produces:

key     value
        value
key     value
        value
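
The code above targets Python 2. On Python 3, izip_longest is named zip_longest, so a roughly equivalent sketch might look like the following. The helper name padded_rows is hypothetical, and an in-memory buffer stands in for somefile.csv so the result can be inspected directly.

```python
import csv
import io
from itertools import chain, zip_longest  # Python 3 name for izip_longest

data = [("key", ["value", "value"]), ("key", ["value", "value"])]


def padded_rows(data):
    """Pair each key with its first value and pad the remaining rows with None."""
    return chain.from_iterable(
        zip_longest([key], values) for key, values in data
    )


buf = io.StringIO()
csv.writer(buf, dialect='excel-tab').writerows(padded_rows(data))
print(buf.getvalue())
```

When writing to a real file on Python 3, the csv documentation recommends opening it with newline='' so the writer's own line endings are not translated a second time.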

Answered 2012-04-26T12:28:58.573