
Please help. I have a text file that looks like this:

ID: 000001
Name: John Smith
Email: jsmith@ibm.com
Company: IBM
blah1: a
blah2: b
blah3: c
ID: 000002
Name: Jane Doe
Email: jdoe@ibm.com
Company: IBM
blah1: a
blah2: b
blah3: c
ID:000003
.
.
.
etc.

Note that each customer's information spans 7 lines. ID: 000002 marks the start of the next customer, ID: 000003 the start of the one after that, and so on.

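I would like my output file to look like this (instead of each customer's data running down successive lines, transpose each 7-line record, starting at its ID, into columns):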

ID: 000001,Name: John Smith,Email: jsmith@ibm.com,Company: IBM,blah1: a,blah2: b,blah3: c
ID: 000002,Name: Jane Doe,Email: jdoe@ibm.com,Company: IBM,blah1: a,blah2: b,blah3: c

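I'm not sure this is the simplest technique. I tried using a list, but that doesn't seem to work for my purposes. I know my code isn't elegant, but this is just to automate some data manipulation for myself and one other person. I don't need anything fancy, as long as it works.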

#!/usr/bin/python
# open file
input = open ("C:\Documents\Customer.csv","r")

#write to a new file
output = open("C:\Documents\Customer1.csv","w")

#Read whole file into data
data = input.readlines()
list = []
for line in data:
    if "User Id:" in line:
        list.append(line)
    if "User Email:" in line:
        list.append(line)
    if "Company:" in line:
        list.append(line)
    if "Contact Id:" in line:
        list.append(line)
    if "Contact Name:" in line:
        list.append(line)
    if "Contact Email:" in line:
        list.append(line)
        print list
        import os
        output.write("\n".join(list))
# Close the file
input.close()
output.close()

My output file contains escape characters, and some customers are added multiple times.


3 Answers


Why do your code and your input file differ? You have "ID:" vs "User Id:", "Email:" vs "User Email:", and so on. Anyway, you can do it like this:

#!/usr/bin/python

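# open file (raw strings keep the backslashes in the Windows path literal)
input = open (r"C:\Documents\Customer.csv","r")

#write to a new file
output = open(r"C:\Documents\Customer1.csv","w")

# read the whole file, split it on 'ID:', and turn the newlines inside each record into commas
lines = [line.rstrip().replace('\n',',') for line in input.read().split('ID:')]
# rejoin with '\nID:'; [1:] drops the leading newline left by the empty first chunk
output.write("\nID:".join(lines)[1:])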

# Close files
input.close()
output.close()

Or, if you want to keep only specific fields in case something else shows up in the file, like this:

#!/usr/bin/python

#import regex module
import re

# open input file
input = open ("C:\Documents\Customer.csv","r")

#open output file
output = open("C:\Documents\Customer1.csv","w")

#create search string
search = re.compile(r"""
                        ID:\s\d+|
                        Name:\s\w+\s\w+|
                        Email:\s\w+\@\w+\.\w+|
                        Company:\s\w+|
                        blah1:\s\w+|
                        blah2:\s\w+|
                        blah3:\s\w+
                        """, re.X)

#write to output joining parts with ',' and adding Newline before IDs
output.write(",".join(search.findall(input.read())).replace(',ID:','\nID:'))

# Close files
input.close()
output.close()

Note that in the last example a record doesn't have to have exactly 7 fields :)
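
To illustrate the point about field counts, here is a small sketch that runs the same pattern over an in-memory string instead of a file. The helper name `records_to_csv` and the sample text are made up for the illustration, and the second record deliberately lacks the blah lines. It assumes Python 3.

```python
import re

# Same alternation as above, compiled in verbose mode.
FIELD = re.compile(r"""
    ID:\s\d+|
    Name:\s\w+\s\w+|
    Email:\s\w+@\w+\.\w+|
    Company:\s\w+|
    blah1:\s\w+|
    blah2:\s\w+|
    blah3:\s\w+
    """, re.X)


def records_to_csv(text):
    """Join every matched field with ',' and start a new line before each ID."""
    return ",".join(FIELD.findall(text)).replace(",ID:", "\nID:")


# Hypothetical sample: the second record has only 4 of the 7 fields.
sample = (
    "ID: 000001\nName: John Smith\nEmail: jsmith@ibm.com\nCompany: IBM\n"
    "blah1: a\nblah2: b\nblah3: c\n"
    "ID: 000002\nName: Jane Doe\nEmail: jdoe@ibm.com\nCompany: IBM\n"
)
result = records_to_csv(sample)
print(result)
```

One caveat with this pattern: because `\s` also matches a newline, a single-word name would make the `Name:` branch swallow the first word of the next line, so it only suits data where every name has two words.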

And now with duplicates removed (the order is not kept, and complete records are compared):

#!/usr/bin/python

#import regex module
import re

# open input file
input = open ("C:\Documents\Customer.csv","r")

#open output file
output = open("C:\Documents\Customer1.csv","w")

#create search string
search = re.compile(r"""
                        ID:\s\d+|
                        Name:\s\w+\s\w+|
                        Email:\s\w+\@\w+\.\w+|
                        Company:\s\w+|
                        blah1:\s\w+|
                        blah2:\s\w+|
                        blah3:\s\w+
                        """, re.X)

# create data joining parts with ',' and adding Newline before IDs    
data = ",".join(search.findall(input.read())).replace(',ID:','\nID:')

# split data into list 
# removing duplicates out of strings with set() and joining result back
# together for the output

output.write("\n".join(set(data.split('\n'))))

# Close files
input.close()
output.close()
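
If the original order matters, the set() step can be swapped for a loop that remembers which records it has already seen. This is a sketch rather than part of the code above; `unique_in_order` is a hypothetical helper name and the sample data is invented.

```python
def unique_in_order(records):
    """Return records with exact duplicates dropped, keeping first occurrences in order."""
    seen = set()
    kept = []
    for record in records:
        if record not in seen:
            seen.add(record)
            kept.append(record)
    return kept


# Invented sample: the record for 000002 appears twice.
data = "ID: 000002,Name: Jane Doe\nID: 000001,Name: John Smith\nID: 000002,Name: Jane Doe"
deduped = "\n".join(unique_in_order(data.split("\n")))
print(deduped)
```

As with the set() version, two records count as duplicates only when the complete lines are identical.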
answered 2013-05-24T01:57:16.180

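Think about what you're trying to accomplish; it's really quite simple.

You have one huge list, and each record takes up 7 lines.

First, I'd read everything into one big list, as you've already done: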

data = input.readlines()

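Count the records:

totalUsers = len(data) // 7 # integer division; the line count SHOULD be divisible by 7

This gives you the number of iterations needed to go through everything. Now it's time to start slicing: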

users = []
start = 0 # because we start on 0
end = 7   # slices exclude the end index, so data[0:7] covers lines 0 through 6
for number in range(totalUsers):
    person = data[start:end]   # slicing, learn about it, it's cool stuff
    start += 7       # move start up 7
    end += 7         # move end up 7
    users.append(person)
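
Put together, the slicing idea might look like the sketch below. It assumes every record really is exactly 7 lines long; the function name `group_records` and its `size` parameter are illustrative, and the final step (joining each group with commas) is added here to produce the output format asked for in the question.

```python
def group_records(lines, size=7):
    """Split a flat list of lines into consecutive groups of `size` lines."""
    if len(lines) % size != 0:
        raise ValueError("line count is not a multiple of %d" % size)
    return [lines[i:i + size] for i in range(0, len(lines), size)]


# Sample lines as readlines() would return them, trailing newlines included.
lines = [
    "ID: 000001\n", "Name: John Smith\n", "Email: jsmith@ibm.com\n",
    "Company: IBM\n", "blah1: a\n", "blah2: b\n", "blah3: c\n",
    "ID: 000002\n", "Name: Jane Doe\n", "Email: jdoe@ibm.com\n",
    "Company: IBM\n", "blah1: a\n", "blah2: b\n", "blah3: c\n",
]
users = group_records(lines)
# One comma-separated row per user, with the trailing newlines stripped.
rows = [",".join(line.strip() for line in user) for user in users]
print("\n".join(rows))
```

Raising an error on a bad line count is a design choice for the sketch: a file with a missing or extra line would otherwise shift every later record silently.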
answered 2013-05-24T00:02:09.453
....
data = input.read()  #read it all in
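# turn the newlines inside each record into commas (rstrip first, so no trailing comma is left)
people = [person.rstrip().replace("\n",",") for person in data.split("ID:")]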
data_new = "\nID:".join(people)

output.write(data_new.strip())

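First, read your whole file in as one big chunk.

Then split your data on "ID:" so that you have a list.

Replace the newlines in each item.

Rejoin your list of "people" with "\nID:" to get one big block of text.

Write it back to your output (strip it so that you get rid of any extra leading \n).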
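
The steps above can be sketched as a single function that works on a string. The name `flatten_records` is hypothetical, the newlines are replaced with commas because that is the output the question asks for, and the sketch assumes plain "\n" line endings.

```python
def flatten_records(text):
    """Turn 'ID:'-delimited multi-line records into one comma-separated line each."""
    chunks = text.split("ID:")
    # Whatever precedes the first 'ID:' is empty here, so blank chunks are skipped.
    people = [chunk.strip().replace("\n", ",") for chunk in chunks if chunk.strip()]
    # Put the 'ID: ' prefix back, since split() removed it from every chunk.
    return "\n".join("ID: " + person for person in people)


# Shortened, invented sample with three lines per record.
text = (
    "ID: 000001\nName: John Smith\nCompany: IBM\n"
    "ID: 000002\nName: Jane Doe\nCompany: IBM\n"
)
flat = flatten_records(text)
print(flat)
```

Unlike the slicing approach, this does not care how many lines each record has, only that every record begins with "ID:".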

answered 2013-05-23T23:52:09.553