I need to extract some customer details and save them in a new database. All I have is a txt file covering 5000 or more customers, and each customer is stored like this:

first and last name   
NAME SURNAME            
zip country n. phone number mobile
United Kingdom      +1111111111
e-mail
email@email.email
guest first and last name 1°
NAME SURNAME
guest first and last name 2°
NAME SURNAME
name    address city    province
NAME SURNAME    London  London  
zip
AAAAA
Cancellation of the reservation.

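Since the file always follows this layout, I thought there might be a way to scrape it. I did some research and this is what I came up with, but it is not quite what I need: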

with open('input.txt') as infile, open('output.txt', 'w') as outfile:
    copy = False
    for line in infile:
        # start copying after a "first and last name" header line
        if "first and last name" in line:
            copy = True
        # stop copying at the end-of-record marker
        elif "Cancellation of the reservation." in line:
            copy = False
        elif copy:
            outfile.write(line)

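This code works, but it only reads the file line by line and copies the lines I need. I want the content written out in a different format that I can upload into a database. The format I need is: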

first and last name | zip country n. phone number mobile|e-mail|guest first and last name 1°|name    address city    province|zip

So in this case, I need this:

NAME SURNAME | United Kingdom      +1111111111|email@email.email|NAME SURNAME   London  London  |AAAAA

for every line in output.txt, one line per customer.

Do you think this would be hard to build? Can anyone help me? Any suggestion would be helpful.

1 Answer

Here are some handy scraping techniques for what you want to do:

data = '''first and last name   
        NAME SURNAME            
        zip country n. phone number mobile
        United Kingdom      +1111111111
        e-mail
        email@email.email
        guest first and last name 1
        NAME SURNAME
        guest first and last name 2
        NAME SURNAME
        name    address city    province
        NAME SURNAME    London  London  
        zip
        AAAAA
        Cancellation of the reservation.
        '''
# split on any run of whitespace, convert to a list of words
ldata = data.split()
# strip leading and trailing white space from each item
# (a no-op after split(), kept for clarity)
ldata = [i.strip() for i in ldata]
# split on line break, convert to list
ndata = data.split('\n')
ndata = [i.strip() for i in ndata]
# convert the list of words back to a single string
sdata = ' '.join(ldata)

print(ldata)
print(ndata)
print(sdata)

# two examples of split after, split before
name_surname = sdata.split('first and last name')[1].split('zip')[0]
print(name_surname)

country_phone = sdata.split('mobile')[1].split('e-mail')[0]
print(country_phone)

>>>

['first', 'and', 'last', 'name', 'NAME', 'SURNAME', 'zip', 'country', 'n.', 'phone', 'number', 'mobile', 'United', 'Kingdom', '+1111111111', 'e-mail', 'email@email.email', 'guest', 'first', 'and', 'last', 'name', '1', 'NAME', 'SURNAME', 'guest', 'first', 'and', 'last', 'name', '2', 'NAME', 'SURNAME', 'name', 'address', 'city', 'province', 'NAME', 'SURNAME', 'London', 'London', 'zip', 'AAAAA', 'Cancellation', 'of', 'the', 'reservation.']
['first and last name', 'NAME SURNAME', 'zip country n. phone number mobile', 'United Kingdom      +1111111111', 'e-mail', 'email@email.email', 'guest first and last name 1', 'NAME SURNAME', 'guest first and last name 2', 'NAME SURNAME', 'name    address city    province', 'NAME SURNAME    London  London', 'zip', 'AAAAA', 'Cancellation of the reservation.', '']
first and last name NAME SURNAME zip country n. phone number mobile United Kingdom +1111111111 e-mail email@email.email guest first and last name 1 NAME SURNAME guest first and last name 2 NAME SURNAME name address city province NAME SURNAME London London zip AAAAA Cancellation of the reservation.
 NAME SURNAME 
 United Kingdom +1111111111 
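
Building on the line-based split above, here is a minimal sketch of how the whole file could be turned into one pipe-delimited line per customer. It rests on several assumptions: the file is UTF-8, every record ends with the "Cancellation of the reservation." line, each header line is followed by exactly one value line, and collapsing runs of spaces inside a value is acceptable. The names `HEADERS`, `OUTPUT_FIELDS`, `parse_records`, `to_row` and `convert` are made up for illustration.

```python
# Header labels from the sample record, whitespace-normalized and without
# the trailing degree sign (assumption: the real file uses the same labels).
HEADERS = {
    "first and last name",
    "zip country n. phone number mobile",
    "e-mail",
    "guest first and last name 1",
    "guest first and last name 2",
    "name address city province",
    "zip",
}
END_MARKER = "Cancellation of the reservation."

# Columns of the output line, in the order requested in the question.
OUTPUT_FIELDS = [
    "first and last name",
    "zip country n. phone number mobile",
    "e-mail",
    "guest first and last name 1",
    "name address city province",
    "zip",
]


def normalize(line):
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(line.split())


def parse_records(lines):
    """Yield one dict per customer, mapping header label -> value line."""
    record, pending = {}, None
    for raw in lines:
        line = normalize(raw)
        if not line:
            continue
        key = line.rstrip("°")  # tolerate "1°" as well as "1"
        if line == END_MARKER:
            if record:
                yield record
            record, pending = {}, None
        elif key in HEADERS:
            pending = key  # the next non-header line is this header's value
        elif pending is not None:
            record[pending] = line
            pending = None
    if record:  # last record without a closing marker
        yield record


def to_row(record):
    """Join the selected fields with '|'; missing fields become empty."""
    return "|".join(record.get(field, "") for field in OUTPUT_FIELDS)


def convert(in_path, out_path):
    """Read the raw export and write one pipe-delimited line per customer."""
    with open(in_path, encoding="utf-8") as infile, \
            open(out_path, "w", encoding="utf-8") as outfile:
        for record in parse_records(infile):
            outfile.write(to_row(record) + "\n")
```

Calling `convert('input.txt', 'output.txt')` would then produce the output file. Matching on known header labels rather than on line positions means a record with a missing value line simply gets an empty column instead of shifting every later field.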
Answered 2017-04-14T03:31:00.763