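I want to store the following data in a dictionary so that I can easily export it to a CSV file. The problem is that the columns are not always in the same order for each school ID: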

text = """
school id= 28392
name|year|degree|age|race
Susan A. Smith|2007|PhD|27|white
Fred Collins|2006|PhD|26|hispanic
Amber Real|2007|MBA|28|white
Mike Lee|2003|PhD|27|white

school id= 273533123
name|year|age|race|degree
John B. Black|2003|27|hispanic|MBA
Steven Smith|2005|28|black|PhD
Jacob Waters|2003|25|hispanic|MBA

school id = 3452332
name|year|race|age|degree
Peter Hintze|2002|white|27|Bachelors
Ann Graden|2004|black|25|MBA
Bryan Stewart|2004|white|28|PhD
"""

Ultimately, I want to be able to output all of the data to a CSV file with the following header:

school id|year|name|age|race|degree

Can I do this in Python?

1 Answer

This actually looks pretty easy. Parse the file into a data structure, then export it as CSV.

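school = None   # id of the school block currently being read
headers = None  # column names for that block, in file order
data = {}       # school id -> list of row dicts
for line in text.splitlines():
    if line.startswith("school id"):
        # New block: remember its id and wait for its header line.
        school = line.split('=')[1].strip()
        headers = None
        continue
    if school is not None and headers is None:
        # The first line after "school id" names the columns.
        headers = line.split('|')
        continue

    if school is not None and headers is not None and line:
        # Data row: pair each value with its column name, whatever the order.
        if school not in data:
            data[school] = []
        datum = dict(zip(headers, line.split('|')))
        data[school].append(datum)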

In [29]: data
Out[29]: 
{'273533123': [{'age': '27',
                'degree': 'MBA',
                'name': 'John B. Black',
                'race': 'hispanic',
                'year': '2003'},
               {'age': '28',
                'degree': 'PhD',
                'name': 'Steven Smith',
                'race': 'black',
                'year': '2005'},
               {'age': '25',
                'degree': 'MBA',
                'name': 'Jacob Waters',
                'race': 'hispanic',
                'year': '2003'}],
 '28392': [{'age': '27',
            'degree': 'PhD',
            'name': 'Susan A. Smith',
            'race': 'white',
            'year': '2007'},
           {'age': '26',
            'degree': 'PhD',
            'name': 'Fred Collins',
            'race': 'hispanic',
            'year': '2006'},
           {'age': '28',
            'degree': 'MBA',
            'name': 'Amber Real',
            'race': 'white',
            'year': '2007'},
           {'age': '27',
            'degree': 'PhD',
            'name': 'Mike Lee',
            'race': 'white',
            'year': '2003'}],
 '3452332': [{'age': '27',
              'degree': 'Bachelors',
              'name': 'Peter Hintze',
              'race': 'white',
              'year': '2002'},
             {'age': '25',
              'degree': 'MBA',
              'name': 'Ann Graden',
              'race': 'black',
              'year': '2004'},
             {'age': '28',
              'degree': 'PhD',
              'name': 'Bryan Stewart',
              'race': 'white',
              'year': '2004'}]}    
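
The export step is not shown above, so here is a minimal sketch of it, assuming `data` has the shape printed in the output (school id mapped to a list of row dicts). It uses `csv.DictWriter` with a `|` delimiter and the column order requested in the question. The helper name `write_schools` and the shortened `data` literal are made up for this illustration.

```python
import csv
import io

# Abbreviated stand-in for the `data` dict built by the parsing loop
# (school id -> list of row dicts). Key order inside each row does not matter.
data = {
    "28392": [
        {"name": "Susan A. Smith", "year": "2007", "degree": "PhD",
         "age": "27", "race": "white"},
    ],
    "3452332": [
        {"name": "Peter Hintze", "year": "2002", "race": "white",
         "age": "27", "degree": "Bachelors"},
    ],
}

# Column order requested in the question.
FIELDNAMES = ["school id", "year", "name", "age", "race", "degree"]


def write_schools(data, out):
    """Write every row of every school to `out` as pipe-delimited CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES,
                            delimiter="|", lineterminator="\n")
    writer.writeheader()
    for school, rows in data.items():
        for row in rows:
            # Add the school id to a copy of the row; DictWriter then
            # arranges the values according to FIELDNAMES.
            writer.writerow({"school id": school, **row})


# Write to an in-memory buffer so the result can be inspected.
buffer = io.StringIO()
write_schools(data, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead, pass a file object opened with `newline=""`, for example `open("schools.csv", "w", newline="")` (the file name is only a placeholder). Because `DictWriter` looks values up by key, the differing column order between schools no longer matters.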
answered 2011-06-06T14:53:24.240