
This regex should find strings in exactly this format:

201308 - (82608) - MAC 2233-007-Methods of Calculus - Lastname, Lee.txt

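The only caveat is that two parts can be a variable number of letters: the last part between the final hyphen and .txt (the instructor's name) and the course name before it. Everything else has a fixed number of characters in that format: integers separated by an exact number of spaces and hyphens, or a course prefix of exactly that many capital letters.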
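One way to write that format down is an anchored pattern in `re.VERBOSE` mode, with one named group per field. This is a Python 3 sketch, and several things are assumptions:

- the separators are exactly " - " as in the example above;
- the prefix is three capital letters followed by `dddd-ddd`;
- the course name contains only letters and spaces;
- the instructor part is "Last, First".

The group names are made up for illustration.

```python
import re

# Hypothetical pattern for names such as
# "201308 - (82608) - MAC 2233-007-Methods of Calculus - Lastname, Lee.txt".
# In re.VERBOSE mode unescaped spaces are ignored, so literal spaces are
# written as "\ " (spaces inside a character class are kept as-is).
FILENAME_RE = re.compile(r"""
    (?P<term>\d{6})               # six-digit term, e.g. 201308
    \ -\ \(                       # " - ("
    (?P<id>\d{5})                 # five digits in parentheses, e.g. 82608
    \)\ -\                        # ") - "
    (?P<prefix>[A-Z]{3})          # three capital letters, e.g. MAC
    \ (?P<number>\d{4}-\d{3})     # " 2233-007"
    -(?P<course>[A-Za-z ]+)       # course name: letters and spaces
    \ -\                          # " - "
    (?P<last>[A-Za-z]+),\         # instructor last name, then ", "
    (?P<first>[A-Za-z]+)          # instructor first name
    \.txt
""", re.VERBOSE)

name = "201308 - (82608) - MAC 2233-007-Methods of Calculus - Lastname, Lee.txt"
m = FILENAME_RE.fullmatch(name)   # None unless the whole name fits
if m:
    print(m.group("course"), "|", m.group("last"), m.group("first"))
```

Spelling out each fixed-width field separately makes it easy to see which piece fails when a name is rejected.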

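What the regex actually finds is nothing. Before I tried escaping the parentheses it was capturing some files, but now nada. I'm using re.search instead of re.match because the regex obviously isn't finished yet and I'm testing parts of it.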
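The functions differ only in where they anchor:

- `re.search` looks anywhere in the string.
- `re.match` only tries at position 0.
- `re.fullmatch` (Python 3.4+) requires the whole string to match.

A short Python 3 sketch shows the difference. It uses a deliberately partial pattern and one of the "@ "-prefixed names from the test directory.

```python
import re

pattern = r"201308 - \(\d{5}\)"   # only the first part of the format
name = "@ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt"

print(re.search(pattern, name))      # found after the "@ @ " prefix
print(re.match(pattern, name))       # None: name does not start with 201308
print(re.fullmatch(pattern, name))   # None: pattern covers only part of name
print(re.match(pattern, name[4:]))   # matches once the prefix is stripped
```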

import re, os, sys, shutil

def readDir(path1):
    return [ f for f in os.listdir(path1) if os.path.isfile(os.path.join(path1,f)) ]

def files(dir1,term,path1):
    match2 = []; stillWrong = []#; term = str(term)
    for f in dir1:
        result = re.search(term + "\s\b\s\(\d{5}\)\s\b\s\w{3}\s\d{4}\b\d{3}[a-z\A-Z]+\s\b\s[A-z\a-z]+\b\s[A-Z\a-z]+ .txt",f)
        if result: match2.append(f)
        else: stillWrong.append(f)
        #print "split --- ",os.path.split(f)
        ##else: os.rename(path1+'\\'+f, path1+'\\'+'@ '+f); stillWrong.append(f)
        print "f ---- ",f
    return match2, stillWrong

term = "201308"; src = "testdir1"; dest = "testdir2"

print files(readDir(dest),term,dest)

This (obviously) produces the wrong result:

>>> 
f ----  @ @ @ @ @ @ 123 abc - a-1 - b-2.txt
f ----  @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
f ----  @ @ @ @ @ @ 201308 abc 123.txt
f ----  @ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt
f ----  @ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt
f ----  @ @ @ @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
f ----  @ @ @ @ @ @ @ @ @ 201308 abc 123.txt
f ----  @ @ @ @ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt
f ----  @ @ @ @ @ @ @ @ @ @ 123 abc - a-1 - b-2.txt
f ----  @ @ @ @ @ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt
f ----  @ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 123 abc - a-1 - b-2.txt
f ----  @ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 45-12 - xyz - mno - 123-pqr-tuv-456.txt
([], ['@ @ @ @ @ @ 123 abc - a-1 - b-2.txt', '@ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt', '@ @ @ @ @ @ 201308 abc 123.txt', '@ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt', '@ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt', '@ @ @ @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt', '@ @ @ @ @ @ @ @ @ 201308 abc 123.txt', '@ @ @ @ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt', '@ @ @ @ @ @ @ @ @ @ 123 abc - a-1 - b-2.txt', '@ @ @ @ @ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt', '@ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 123 abc - a-1 - b-2.txt', '@ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 45-12 - xyz - mno - 123-pqr-tuv-456.txt'])
>>> 

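As you can see, there is nothing in the match2[] list. If you're interested, every file name ends up in the second list, but the first list is the one that should hold the relevant matches. I'm teaching myself Python and regular expressions, and it isn't going well. I've already tried these (and regex tutorials), but they don't seem to help in this case: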

Escaping regex string in Python

Regex escaping parentheses

How to implement \p{L} in python regex

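All those @ signs come from the os.rename you can see commented out, but it didn't work before I commented it out either. I'm sure any entry-level programmer could do this in a few minutes, but if a pro comes across this and is willing to spend a minute on it, that would be great too.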
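The commented-out line has two problems:

- It builds paths as `path1+'\\'+f`, which hard-codes the Windows separator.
- Its `else` branch prefixes "@ " to every non-matching file on every run, which would explain names like "@ @ @ @ @ @ ...".

Below is a minimal Python 3 sketch of a rename step that uses `os.path.join` and skips names that already carry the prefix. `tag_file` is a hypothetical helper, not something from the question.

```python
import os
import tempfile

def tag_file(path1, f, prefix="@ "):
    """Hypothetical helper: rename path1/f to path1/<prefix>f, at most once."""
    if f.startswith(prefix):
        return f  # already tagged, so don't stack another prefix
    new_name = prefix + f
    # os.path.join uses the right separator for the current OS,
    # unlike the hand-written path1 + '\\' + f
    os.rename(os.path.join(path1, f), os.path.join(path1, new_name))
    return new_name

# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "201308 abc 123.txt"), "w").close()
    first = tag_file(d, "201308 abc 123.txt")
    second = tag_file(d, first)      # no-op: name already starts with "@ "
    print(sorted(os.listdir(d)))     # ['@ 201308 abc 123.txt']
```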

EDIT: the list of file names used (the production list is obviously longer):

201308-(12345) - Abc 2233-007-course Name - last, first.txt
201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
@ @ @ @ @ @ 201308 abc 123.txt
@ @ @ @ @ @ 123 abc - a-1 - b-2.txt
@ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt
@ @ @ @ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt
@ @ @ @ @ @ @ @ @ 201308 abc 123.txt
@ @ @ @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
@ @ @ @ @ @ @ @ @ @ 123 abc - a-1 - b-2.txt
@ @ @ @ @ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt
@ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 123 abc - a-1 - b-2.txt
@ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 45-12 - xyz - mno - 123-pqr-tuv-456.txt
45-12 - xyz - mno - 123-pqr-tuv-456.txt
123 abc - a-1 - b-2.txt
201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
201308 abc 123.txt
201308-(12345) - Abc 2233-007-course Name - last, first.txt
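For comparison, here is a Python 3 sketch of the question's `readDir()`/`files()` logic that compiles the pattern once and passes it in. It is only a sketch, under these assumptions:

- It borrows the pattern from the second answer below and runs `term` through `re.escape`.
- It uses `fullmatch`, assuming only names exactly in the target format should count, so "@ "-prefixed names are rejected.
- It recreates a few of the names listed above in a temporary directory so it can run anywhere.

```python
import os
import re
import tempfile

def read_dir(path1):
    """Return the names of regular files (not subdirectories) in path1."""
    return [f for f in os.listdir(path1)
            if os.path.isfile(os.path.join(path1, f))]

def files(names, pattern):
    """Split names into (matching, not matching) using pattern.fullmatch."""
    match2, still_wrong = [], []
    for f in names:
        if pattern.fullmatch(f):
            match2.append(f)
        else:
            still_wrong.append(f)
    return match2, still_wrong

term = "201308"
pattern = re.compile(re.escape(term) +
                     r"\s-\s\(\d{5}\)\s-\s\w{3}\s\d{4}-\d{3}-[^.]+\.txt")

# A few names from the EDIT list, recreated in a temporary directory
samples = [
    "201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt",
    "201308-(12345) - Abc 2233-007-course Name - last, first.txt",
    "@ @ @ @ @ @ 201308 abc 123.txt",
    "45-12 - xyz - mno - 123-pqr-tuv-456.txt",
]

with tempfile.TemporaryDirectory() as dest:
    for name in samples:
        open(os.path.join(dest, name), "w").close()
    matched, unmatched = files(sorted(read_dir(dest)), pattern)

print(matched)          # only the "201308 - (82608) - ..." name
print(len(unmatched))   # 3
```

Note that "201308-(12345) - ..." is rejected because it has no spaces around its first hyphen.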

2 Answers


Some things seem strange to me:

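  • \s\b\s is odd. \b means "match the empty string, but only at the beginning or end of a word", yet here it sits between two symbols that stand for whitespace, i.e. not at the beginning or end of a word, so that part of the pattern can never match.

  • The backslash in [A-z\a-z] raises an error. I wonder what it is supposed to mean here. Do you want a backslash as a possible character of the set? Then write [A-z\\\\a-z]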
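The first point can be checked interactively, and two related details can be checked the same way:

- The question's pattern is an ordinary (non-raw) string literal, and in such a literal "\b" is already a backspace character before `re` sees it.
- The range `A-z` also covers the ASCII punctuation between `Z` and `a`.

A small Python 3 sketch:

```python
import re

# With a raw string, \b reaches re as a word-boundary assertion. A boundary
# needs a word character on exactly one side, so it can never sit between
# two whitespace characters.
print(re.search(r"\s\b\s", "a  b"))   # None
print(re.search(r"\s\s", "a  b"))     # matches the two spaces

# In a non-raw string literal, "\b" is the backspace character (\x08),
# so re would look for a literal backspace instead of a word boundary.
print("\b" == "\x08")                 # True

# A-z spans ASCII 65..122, which also includes [ \ ] ^ _ and the backtick.
print(re.findall(r"[A-z]", "a_[^`Z"))  # ['a', '_', '[', '^', '`', 'Z']
```

Under either reading of "\b", the question's pattern cannot match these file names. Writing patterns as raw strings (r"...") avoids the second surprise.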

This regex matches your example string:

r = re.compile(term +
               (r"\s-\s"
                r"\(\d{5}\)"
                r"\s-\s"
                r"\w{3}\s\d{4}-\d{3}-"
                r"[a-zA-Z ]+"
                r"\s-\s"
                r"[A-Za-z]+,\s"
                r"[A-Za-z]+ *\.txt"))
answered 2013-09-01T18:44:44.420

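\d{6}\s-\s\(\d{5}\)\s-\s\w{3}\s\d{4}-\d{3}-[^\.]+\.txt matches the string you gave as an example. If the leading value isn't fixed, term + r'\s-\s\(\d{5}\)\s-\s\w{3}\s\d{4}-\d{3}-[^\.]+\.txt' should do it (provided term contains nothing that has a special meaning in a regex).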
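The "nothing special in term" caveat can be removed by passing `term` through `re.escape`, which backslash-escapes regex metacharacters. Below is a Python 3 sketch with a made-up term containing a dot (hypothetical: the real terms here are plain digits, which `re.escape` leaves unchanged).

```python
import re

tail = r"\s-\s\(\d{5}\)\s-\s\w{3}\s\d{4}-\d{3}-[^.]+\.txt"
name = "2013x08 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt"

term = "2013.08"  # hypothetical term containing the metacharacter "."
print(re.search(term + tail, name) is not None)             # True: "." matched "x"
print(re.search(re.escape(term) + tail, name) is not None)  # False: "." is literal now
print(re.escape("201308") == "201308")                      # True: digits untouched
```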

Here is an example test run:

>>> term = '201308'
>>> f = '201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt'
>>> re.search(term + r'\s-\s\(\d{5}\)\s-\s\w{3}\s\d{4}-\d{3}-[^\.]+\.txt', f).group(0)
'201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt'

It also works when the name is part of a full path:

>>> f = '/somefolder/somefolder2/201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt'
>>> re.search(term + r'\s-\s\(\d{5}\)\s-\s\w{3}\s\d{4}-\d{3}-[^\.]+\.txt', f).group(0)
'201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt'

>>> f = 'c:\\somefolder\\somefolder2\\201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt'
>>> re.search(term + r'\s-\s\(\d{5}\)\s-\s\w{3}\s\d{4}-\d{3}-[^\.]+\.txt', f).group(0)
'201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt'
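Because re.search accepts a match anywhere in the string, the same pattern also accepts names with extra text in front, such as the "@ "-prefixed files in the question. If the file name itself must be exactly in this format, one option is to strip the directory first and then use `fullmatch`. This Python 3 sketch is not part of the answer above. It uses `ntpath.basename` because that function understands both "/" and "\\" separators on any OS.

```python
import ntpath
import re

term = "201308"
pattern = re.compile(re.escape(term) +
                     r"\s-\s\(\d{5}\)\s-\s\w{3}\s\d{4}-\d{3}-[^.]+\.txt")

paths = [
    "/somefolder/somefolder2/201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt",
    "c:\\somefolder\\somefolder2\\201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt",
    "c:\\somefolder\\@ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt",
]

for p in paths:
    name = ntpath.basename(p)  # drop the directory part, "/" or "\\"
    print(bool(pattern.fullmatch(name)), bool(pattern.search(p)))
# The prefixed third name fails fullmatch, although search still finds it.
```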
answered 2013-09-01T18:22:48.373