I have 2 text files:

1) cities.txt

San Francisco
Los Angeles
Seattle
Dallas

2) master.txt

Atlanta is chill and laid-back.
I love Los Angeles.
Coming to Dallas was the right choice.
New York is so busy!
San Francisco is fun.
Moving to Boston soon!
Go to Seattle in the summer.

I am trying to get output.txt:

<main><beg>I love</beg><key>Los Angeles</key><end></end></main>
<main><beg>Coming to</beg><key>Dallas</key><end>was the right choice</end></main>
<main><beg></beg><key>San Francisco</key><end>is fun</end></main>
<main><beg>Go to</beg><key>Seattle</key><end>in the summer</end></main>

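Every entry in cities.txt is a <key>. The master.txt file is much longer, and every line that does not contain one of the cities should be ignored. The lines are not in order. The output prints each city found as <key>, with the surrounding context (if any) in <beg> and <end>.

This is what I have: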

with open('master.txt') as f:
    master = f.read()
working = []
with open('cities.txt') as f:
    for i in (word.strip() for word in f):
        if i in master:
            print "<key>", i, "</key>"

I know how to check between the two text files (find a city in master.txt), but once I have found the city, I am stuck on how to print the <beg> and <end> context from master.txt!

2 Answers

The following should help you achieve what you want. It works on both Python 2 and Python 3.

#!/usr/bin/python

import os

def parse(line, city):
    start = line.find(city)
    end = start + len(city)
    # Following is a simple implementation. I haven't parsed for spaces
    # and punctuations around tags.
    return '<main><beg>' + line[:start] + '</beg><key>' + city + '</key><end>' \
           + line[end:] + '</end></main>'

master = [line.strip() for line in open(os.getcwd() + '/master.txt', 'r')]
cities = [line.strip() for line in open(os.getcwd() + '/cities.txt', 'r')]
data = []

for line in master:
    for city in cities:
        if city in line:
            data.append(parse(line, city))

# Following would overwrite output.txt file in the current working directory
with open(os.getcwd() + '/output.txt', 'w') as foo:
    for output in data:
        foo.write(output + '\n')
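
The comment in parse() notes that spaces and punctuation around the tags are not handled, so the result differs slightly from the output.txt requested in the question. A minimal sketch of one way to trim them is shown below; the helper name tag_line is hypothetical, and the choice to strip every ASCII punctuation character from the outer edges of the context is an assumption, not something the question specifies.

```python
import string

def tag_line(line, city):
    """Wrap `city` in <key> and the text around it in <beg>/<end>."""
    # partition() splits on the first occurrence of city only.
    before, found, after = line.partition(city)
    if not found:
        return None  # city does not occur in this line
    # Assumption: trim whitespace and ASCII punctuation from both ends
    # of each context fragment.
    trim = string.whitespace + string.punctuation
    return ('<main><beg>' + before.strip(trim) + '</beg><key>' + city
            + '</key><end>' + after.strip(trim) + '</end></main>')

print(tag_line('Coming to Dallas was the right choice.', 'Dallas'))
```

Calling tag_line(line, city) in place of parse(line, city) inside the loop should leave the rest of the script unchanged. Note that a context fragment that legitimately starts or ends with punctuation, such as a quotation mark, would lose it.
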
answered 2013-02-04T22:09:02.657

This should also work; tested with Python 2.6:

cities_dict = {}
with open('master.txt', 'r') as master_in:
    with open('cities.txt') as city_in:
        for city in city_in:
            cities_dict[city.strip()] = '</beg><key>'+city.strip()+'</key><end>'

    for line in master_in:
        for key,val in cities_dict.iteritems():
            if key in line:
                line_out= '<main><beg>'+line.replace(key,val).replace('!','.').replace('.','').strip('\n')+'</end></main>'
                print line_out

Output:

<main><beg>I love </beg><key>Los Angeles</key><end></end></main>
<main><beg>Coming to </beg><key>Dallas</key><end> was the right choice</end></main>
<main><beg></beg><key>San Francisco</key><end> is fun</end></main>
<main><beg>Go to </beg><key>Seattle</key><end> in the summer</end></main>
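
The code above relies on dict.iteritems() and the print statement, both of which exist only in Python 2. A rough Python 3 sketch of the same replace-based idea follows; the function name tag_lines and its list arguments are assumptions made so that the example runs without the two files.

```python
def tag_lines(master_lines, cities):
    """Return tagged versions of the lines that mention a city."""
    # Map each city to the markup that replaces it inside a matching line.
    markup = {c: '</beg><key>' + c + '</key><end>' for c in cities}
    tagged = []
    for line in master_lines:
        for city, replacement in markup.items():  # items() replaces iteritems()
            if city in line:
                # As in the answer, drop every '!' and '.' from the line.
                body = line.replace(city, replacement)
                body = body.replace('!', '').replace('.', '')
                tagged.append('<main><beg>' + body.strip('\n') + '</end></main>')
    return tagged

sample = ['I love Los Angeles.\n', 'New York is so busy!\n']
for tagged_line in tag_lines(sample, ['Los Angeles', 'Dallas']):
    print(tagged_line)
```

With real files, the two arguments could be an open master.txt handle and a list of stripped lines from cities.txt.
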
answered 2013-02-04T23:17:48.403