What I have

I am parsing a .txt file that contains schedule information for the people working on a given day. The .txt file looks like this:

START PAGE 0

XYZ Schedule for:  Saturday, March 30, 2013

Barnes, Michael8:00a10:00aTech

Collins, Jessica8:00a4:00pSupervisor

Hamilton, Patricia8:00a10:00aTech

Smith, Jan8:00a10:00aTech

Park, Kimberly8:00a10:00aTech

Edwards, Terrell10:00a12:00pTech

Green, Harrold12:00p2:00pTech

Tait, Jessica12:00p2:00pTech

Tait, Jessica2:00p4:00pTech

Hernandez, William (Monte)4:00p6:30pSupervisor

Tait, Chioma4:00p6:00pTech

Hernandez, William (Monte)6:30p7:00pSupervisor

Hernandez, William (Monte)7:00p9:00pSupervisor

Tailor, Thomas (Jason)9:00p12:00aSupervisor

Jones, Deslynne10:00p12:00aTech

3/28/2013 2:21:17 PM

END PAGE 0

So the first two lines and the last two lines are irrelevant, but every line in between is one person's schedule entry.

What I want

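I want to parse out the pieces of each line so that I can write them to a .csv file. I can use line.partition(',')[0] to get the last name (the first part of each line), but after that I am at a loss. I need to convey the following to Python: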

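  1. The part after the comma, up to the first digit, is one piece (first name)
  2. The part from the first digit up to an a or a p (AM or PM) is another piece (start time)
  3. The part from the digit after that a or p up to the next a or p is another piece (end time)
  4. Finally, whatever is left is another piece (the type of shift/position)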

A line in my resulting csv file might look like this: Barnes,Michael,8:00a,10:00a,Tech

Caveats

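1) A person can have multiple shifts in one day.
2) Some people have a nickname in parentheses, but some do not.
3) If Python had wildcards, such as # for a digit and * for anything, I can see how I could keep using partition and keep splitting the remaining pieces, like so: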

for line in input:
    name = str(line.partition(',')[0]+','+str(line.partition(',')[2].split(#)[0]))
    output.write("".join(x for x in name))
    output.write("\r\n")

However, Python does not seem to have wildcards like that. Besides, this seems like a very inelegant solution.
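For reference, the standard re module does offer close equivalents of the wildcards described above: \d matches any digit and .* matches any run of characters. A minimal sketch of the partition idea, using re.search to locate the first digit, might look like this. The split_name helper name is made up for illustration, and Python 3 is assumed.

```python
import re

def split_name(line):
    """Split 'Last, First8:00a...' into (last, first, rest)."""
    last, _, remainder = line.partition(',')
    m = re.search(r'\d', remainder)  # \d plays the role of the '#' wildcard
    if m is None:
        # No digit found: treat everything after the comma as the first name
        return last.strip(), remainder.strip(), ''
    first = remainder[:m.start()].strip()
    rest = remainder[m.start():]  # times and position, still unsplit
    return last.strip(), first, rest

print(split_name('Barnes, Michael8:00a10:00aTech'))
```

The remaining string still has to be split into start time, end time, and position, which is where a fuller regular expression becomes more convenient than repeated partitioning.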

2 Answers

This should be enough to get you started:

import re
data = '''Barnes, Michael8:00a10:00aTech
Collins, Jessica8:00a4:00pSupervisor
Hamilton, Patricia8:00a10:00aTech
Smith, Jan8:00a10:00aTech
Park, Kimberly8:00a10:00aTech
Edwards, Terrell10:00a12:00pTech
Green, Harrold12:00p2:00pTech
Tait, Jessica12:00p2:00pTech
Tait, Jessica2:00p4:00pTech
Hernandez, William (Monte)4:00p6:30pSupervisor
Tait, Chioma4:00p6:00pTech
Hernandez, William (Monte)6:30p7:00pSupervisor
Hernandez, William (Monte)7:00p9:00pSupervisor
Tailor, Thomas (Jason)9:00p12:00aSupervisor
Jones, Deslynne10:00p12:00aTech'''

# Four groups per line: name, start time, end time, position
print(re.findall(r'(.*?)(\d{1,2}:\d\d[ap])(\d{1,2}:\d\d[ap])(.*)', data))

prints

[('Barnes, Michael', '8:00a', '10:00a', 'Tech'),
 ('Collins, Jessica', '8:00a', '4:00p', 'Supervisor'),
 ('Hamilton, Patricia', '8:00a', '10:00a', 'Tech'),
 ('Smith, Jan', '8:00a', '10:00a', 'Tech'),
 ('Park, Kimberly', '8:00a', '10:00a', 'Tech'),
 ('Edwards, Terrell', '10:00a', '12:00p', 'Tech'),
 ('Green, Harrold', '12:00p', '2:00p', 'Tech'),
 ('Tait, Jessica', '12:00p', '2:00p', 'Tech'),
 ('Tait, Jessica', '2:00p', '4:00p', 'Tech'),
 ('Hernandez, William (Monte)', '4:00p', '6:30p', 'Supervisor'),
 ('Tait, Chioma', '4:00p', '6:00p', 'Tech'),
 ('Hernandez, William (Monte)', '6:30p', '7:00p', 'Supervisor'),
 ('Hernandez, William (Monte)', '7:00p', '9:00p', 'Supervisor'),
 ('Tailor, Thomas (Jason)', '9:00p', '12:00a', 'Supervisor'),
 ('Jones, Deslynne', '10:00p', '12:00a', 'Tech')]

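Read the documentation for the re module to learn about regular expressions. You can parse the name as a separate step, or extend the regex to be more specific. I recommend using the csv module to write the csv file.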
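As a rough sketch of those two suggestions combined, the pattern below extends the regex with named groups so that the last and first names are captured separately, then writes the rows with csv.writer. The group names, the parse_line helper, and the in-memory io.StringIO target are illustrative assumptions rather than part of the answer above, and Python 3 is assumed.

```python
import csv
import io
import re

# Hypothetical pattern: "Last, First" + start time + end time + position.
LINE_RE = re.compile(
    r'^(?P<last>[^,]+),\s*(?P<first>.*?)'
    r'(?P<start>\d{1,2}:\d\d[ap])'
    r'(?P<end>\d{1,2}:\d\d[ap])'
    r'(?P<position>.*)$'
)

def parse_line(line):
    """Return [last, first, start, end, position], or None if no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # header, footer, or blank line
    return [m.group(name).strip()
            for name in ('last', 'first', 'start', 'end', 'position')]

sample = [
    'XYZ Schedule for:  Saturday, March 30, 2013',
    'Barnes, Michael8:00a10:00aTech',
    'Hernandez, William (Monte)4:00p6:30pSupervisor',
]

out = io.StringIO()  # stand-in for a real file opened with newline=''
writer = csv.writer(out)
for line in sample:
    row = parse_line(line)
    if row is not None:
        writer.writerow(row)

print(out.getvalue())
```

Because lines that do not match the pattern are skipped, the header and footer lines fall away without being removed by position. A line whose name itself contains a time-like string would confuse this pattern, so treat it as a starting point.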

If you get stuck, post a specific question along with your code.

answered 2013-03-29T17:47:49.890

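Assuming you know how to remove the first two and last two lines, and that the remaining lines are in a string named s, I would do what you ask as follows: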

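# Skip empty lines and strip surrounding whitespace from the rest
entries = [x.strip() for x in s.split('\n') if x]

for entry in entries:
    # Indices where a run of digits begins
    ind = [i for i,x in enumerate(entry) if x.isdigit() and not entry[i-1].isdigit()]
    # Everything before the first digit is "Last, First"
    name = entry[0:ind[0]]
    name = name.split(',')

    other = entry[ind[0]:]
    # Indices of each 'a'/'p' that directly follows a digit (the end of a time)
    ind = [-1]+[i for i,x in enumerate(other) if x in ('a', 'p') and other[i-1].isdigit()]
    shifts = []
    for i in range(1, len(ind)):  # range replaces Python 2's xrange
        shifts.append(other[ind[i-1]+1:ind[i]+1])
    # Whatever follows the last time is the position
    position = other[ind[-1]+1:]
    print(name, shifts, position)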

This will work for any number of shift times on a line.

Output:

['Barnes', ' Michael'] ['8:00a', '10:00a'] Tech
['Collins', ' Jessica'] ['8:00a', '4:00p'] Supervisor
['Hamilton', ' Patricia'] ['8:00a', '10:00a'] Tech
['Smith', ' Jan'] ['8:00a', '10:00a'] Tech
['Park', ' Kimberly'] ['8:00a', '10:00a'] Tech
['Edwards', ' Terrell'] ['10:00a', '12:00p'] Tech
['Green', ' Harrold'] ['12:00p', '2:00p'] Tech
['Tait', ' Jessica'] ['12:00p', '2:00p'] Tech
['Tait', ' Jessica'] ['2:00p', '4:00p'] Tech
['Hernandez', ' William (Monte)'] ['4:00p', '6:30p'] Supervisor
['Tait', ' Chioma'] ['4:00p', '6:00p'] Tech
['Hernandez', ' William (Monte)'] ['6:30p', '7:00p'] Supervisor
['Hernandez', ' William (Monte)'] ['7:00p', '9:00p'] Supervisor
['Tailor', ' Thomas (Jason)'] ['9:00p', '12:00a'] Supervisor
['Jones', ' Deslynne'] ['10:00p', '12:00a'] Tech
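
The code above assumes the header and footer lines are already gone. One possible way to get there, sketched under the assumption that every page follows the sample layout (two header lines and two footer lines, with blank lines in between), is to drop the blank lines first and then slice off the first two and last two. The schedule_lines helper name is hypothetical.

```python
def schedule_lines(text):
    """Return only the per-person lines from one page of the schedule.

    Assumes the layout from the question: once blank lines are removed,
    the first two lines (START PAGE, title) and the last two lines
    (timestamp, END PAGE) are not schedule entries.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return lines[2:-2]

page = """START PAGE 0

XYZ Schedule for:  Saturday, March 30, 2013

Barnes, Michael8:00a10:00aTech

Collins, Jessica8:00a4:00pSupervisor

3/28/2013 2:21:17 PM

END PAGE 0
"""

# Build the string named s that the parsing loop expects
s = '\n'.join(schedule_lines(page))
print(s)
```

If a file contains several pages, the same slicing would need to be applied to each page separately.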
answered 2013-03-29T17:57:52.450