tickettypepat = (r'MIS Notes:.*(//p//)?.*')
retype = re.search(tickettypepat,line)
if retype:
  print retype.group(0)
  print retype.group(1)


MIS Notes: //p//

Can anyone tell me why group(0) is

MIS Notes: //p// 

and group(1) comes back as None?

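I originally used a regex because, before I ran into this problem, the matching was more complicated than just matching //p//. Here is the full code. I'm fairly new at this, so please forgive my noobness; I'm sure there are better ways to do most of this, and if anyone wants to point them out that would be great. But aside from the regex for //[pewPEW]// being too greedy, it seems to be functional. I appreciate the help.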


filename = (r'.\4-12_4-26.txt')
import re
import sys
#Clean up output from the web to ensure that you have one category per line
f = open(filename)
w = open('cleantext.txt','w')

origdatepat = (r'(Ticket Date: )([0-9]+/[0-9]+/[0-9]+),( [0-9]+:[0-9]+ [PA]M)')
tickettypepat = (r'MIS Notes:.*(//[pewPEW]//)?.*')

print 'Beginning Blank Line Removal'
for line in f:
    redate = re.search(origdatepat,line)
    retype = re.search(tickettypepat,line)
    if line == ' \n':
        line = ''
        print 'Removing blank Line'
#remove ',' from time and date line    
    elif redate:
        line = redate.group(1) + redate.group(2)+ redate.group(3)+'\n'
        print 'Redating... ' + line

    elif retype:
        print retype.group(0)
        print retype.group(1)
        if retype.group(1) == '//p//':
            line = line + 'Type: Phone\n'
            print 'Setting type for... ' + line
        elif retype.group(1) == '//e//':
            line = line + 'Type: Email\n'
            print 'Setting type for... ' + line
        elif retype.group(1) == '//w//':
            line = line + 'Type: Walk-in\n'
            print 'Setting type for... ' + line
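        # ('' or None) evaluates to None, so test for "no marker" directly
        elif not retype.group(1):
            line = line + 'Type: Ticket\n'
            print 'Setting type for... ' + line

    # write the (possibly modified) line to the cleaned output file
    w.write(line)

print 'Closing Files'
f.close()
w.close()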


Ticket No.: 20100426132 
Ticket Date: 04/26/10, 10:22 AM 
Close Date:  
Primary User: XXX
Branch: XXX
Help Tech: XXX
Status: Pending  
Priority: Medium  
Application: xxx
Description: some issue
Resolution: some resolution
MIS Notes: some random stuff //p// followed by more stuff
Key Words:  

MIS Notes:.*(//p//)?.* works like this, using "MIS Notes: //p//" as the target:

  1. MIS Notes: matches "MIS Notes:", no surprises here.
  2. .* immediately runs to the end of the string (match so far "MIS Notes: //p//").
  3. (//p//)? is optional. Nothing happens.
  4. .* has nothing left to match, we are at the end of the string already. Since the star allows zero matches for the preceding atom, the regex engine stops, reporting the entire string as a match and the sub-group as unmatched (None in Python).
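
The trace above can be reproduced directly. This is a minimal sketch, assuming Python's re module as in the question and the sample line "MIS Notes: //p//"; the variable names whole and marker are only illustrative.

```python
import re

line = 'MIS Notes: //p//'
m = re.search(r'MIS Notes:.*(//p//)?.*', line)

# The first greedy .* swallows the rest of the line, so the optional
# group matches zero times and is left unset.
whole = m.group(0)
marker = m.group(1)

print(repr(whole))   # 'MIS Notes: //p//'
print(repr(marker))  # None
```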

Now when you change the regex to MIS Notes:.*(//p//).*, the behavior changes:

  1. MIS Notes: matches "MIS Notes:", still no surprises here.
  2. .* immediately runs to the end of the string (match so far "MIS Notes: //p//")
  3. (//p//) is required. The engine starts to backtrack character by character in order to fulfill this requirement. (Match so far "MIS Notes: ")
  4. (//p//) can match. Sub-group one is saved and contains "//p//".
  5. .* runs to the end of the string. Hint: If you are not interested in what it matches, it is superfluous and you can remove it.
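
A short sketch of the required-group version, again assuming Python's re module. It also shows the trade-off: once the group is mandatory, a line without the marker no longer matches at all. The second sample line is made up for illustration.

```python
import re

pattern = r'MIS Notes:.*(//p//).*'

# After backtracking, group 1 captures the marker.
m = re.search(pattern, 'MIS Notes: //p//')
marker = m.group(1)
marker_start = m.start(1)  # index where the captured marker begins

# Hypothetical line without a marker: the whole pattern fails.
no_marker = re.search(pattern, 'MIS Notes: nothing special')

print(repr(marker), marker_start)  # '//p//' 11
print(no_marker)                   # None
```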

Now when you change the regex to MIS Notes:.*?//(p)//, the behavior changes again:

  1. MIS Notes: matches "MIS Notes:", and still no surprises here.
  2. .*? is non-greedy: it expands one character at a time and tries the following atom before it proceeds (match so far "MIS Notes: ")
  3. //(p)// can match. Sub-group one is saved and contains "p".
  4. Done. Note that the engine never runs to the end of the string and back, which saves time.
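
As a quick check of the non-greedy version, here is a sketch using the "MIS Notes" line from the sample ticket in the question. Note that the match stops right after the marker, because nothing in the pattern consumes the rest of the line.

```python
import re

line = 'MIS Notes: some random stuff //p// followed by more stuff'
m = re.search(r'MIS Notes:.*?//(p)//', line)

code = m.group(1)      # only the letter between the slashes
matched = m.group(0)   # ends at the closing slashes of the marker

print(repr(code))     # 'p'
print(repr(matched))  # 'MIS Notes: some random stuff //p//'
```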

Now if you know that there can be no / before the //p//, you can use: MIS Notes:[^/]*//(p)//:

  1. MIS Notes: matches "MIS Notes:", you get the idea.
  2. [^/]* can fast-forward to the first slash (this is faster than .*?)
  3. //(p)// can match. Sub-group one is saved and contains "p".
  4. Done. Note that no backtracking occurs, this saves time. This should be faster than version #3.
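
The negated character class can be tried the same way. This sketch also shows what the "no / before the //p//" assumption costs: a single stray slash earlier in the notes makes the pattern fail. The second input line is invented for illustration.

```python
import re

pattern = r'MIS Notes:[^/]*//(p)//'

ok = re.search(pattern, 'MIS Notes: some random stuff //p// followed by more stuff')
code = ok.group(1)

# Hypothetical line with a slash before the marker: [^/]* cannot
# cross it, so there is no match.
stray = re.search(pattern, 'MIS Notes: see a/b //p//')

print(repr(code))  # 'p'
print(stray)       # None
```
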

if line.startswith('MIS Notes:'): # starts with that string
    data = line[len('MIS Notes:'):] # the rest is the interesting part
    if '//p//' in data:
        stuff, sep, rest = data.partition('//p//') # or something like that
        pass # other stuff

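
For anyone who wants to try the no-regex approach, here is one way it could be wrapped up. The helper name split_on_marker and the choice to strip whitespace are my own assumptions, not part of the snippet above.

```python
def split_on_marker(line, marker='//p//'):
    """Return (before, after) around the marker, or None if the line
    is not a MIS Notes line or does not contain the marker."""
    prefix = 'MIS Notes:'
    if not line.startswith(prefix):
        return None
    data = line[len(prefix):]
    before, sep, after = data.partition(marker)
    if not sep:  # partition returns an empty separator when not found
        return None
    return before.strip(), after.strip()

print(split_on_marker('MIS Notes: prefix//p//suffix'))  # ('prefix', 'suffix')
print(split_on_marker('MIS Notes: //p//'))              # ('', '')
print(split_on_marker('MIS Notes: nothing special'))    # None
```
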
import re
lines = ['MIS Notes: //p//',
    'MIS Notes: prefix//p//suffix']

tickettypepat = (r'MIS Notes: (?:(.*)//p//)?(.*)')
for line in lines:
    m = re.search(tickettypepat,line)
    print 'line:', line
    if m: print 'groups:', m.groups()
    else: print 'groups:', m


line: MIS Notes: //p//
groups: ('', '')
line: MIS Notes: prefix//p//suffix
groups: ('prefix', 'suffix')
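
Putting the pieces together for the original script, something along these lines might work. It is only a sketch: the ticket_type helper and the TYPES table are hypothetical names, the type labels are taken from the question's code, and treating //P// the same as //p// is my assumption. The key point is that the lazy .*? and the marker sit together inside one optional group, so the marker is still captured when present and a missing marker falls through to the default.

```python
import re

# Optional non-capturing group: lazy skip, then the marker letter.
TYPE_PAT = re.compile(r'MIS Notes:(?:.*?//([pewPEW])//)?')
TYPES = {'p': 'Phone', 'e': 'Email', 'w': 'Walk-in'}

def ticket_type(line):
    """Return the ticket type for a MIS Notes line, else None."""
    m = TYPE_PAT.search(line)
    if m is None:
        return None          # not a MIS Notes line
    code = m.group(1)
    if code is None:
        return 'Ticket'      # no //x// marker present
    return TYPES[code.lower()]

print(ticket_type('MIS Notes: some random stuff //p// followed by more stuff'))  # Phone
print(ticket_type('MIS Notes: no marker here'))                                  # Ticket
```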
