python - Regex not escaping parentheses in an alphanumeric string that contains symbols and a variable passed to a method

Question

This regex is supposed to find strings in exactly this format:

201308 - (82608) - MAC 2233-007-Methods of Calculus - Lastname, Lee.txt

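The only parts that vary are the last part, between the final hyphen and .txt, and the course name before it; both can be a variable number of letters (the instructor's name and the course name). Everything else has exactly the number of characters shown in that format (integers separated by exactly that many spaces and hyphens, or an exact course prefix in all capital letters).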

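What the regex actually does is find nothing. Before I tried to escape the parentheses it was catching some files, but now nada. I'm using re.search rather than re.match because the regex obviously isn't finished yet and I'm testing parts of it.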

import re, os, sys, shutil

def readDir(path1):
    return [ f for f in os.listdir(path1) if os.path.isfile(os.path.join(path1,f)) ]

def files(dir1,term,path1):
    match2 = []; stillWrong = []#; term = str(term)
    for f in dir1:
        result = re.search(term + "\s\b\s\(\d{5}\)\s\b\s\w{3}\s\d{4}\b\d{3}[a-z\A-Z]+\s\b\s[A-z\a-z]+\b\s[A-Z\a-z]+ .txt",f)
        if result: match2.append(f)
        else: stillWrong.append(f)
        #print "split --- ",os.path.split(f)
        ##else: os.rename(path1+'\\'+f, path1+'\\'+'@ '+f); stillWrong.append(f)
        print "f ---- ",f
    return match2, stillWrong

term = "201308"; src = "testdir1"; dest = "testdir2"

print files(readDir(dest),term,dest)

This produces the (obviously) wrong result:

    >>> 
f ----  @ @ @ @ @ @ 123 abc - a-1 - b-2.txt
f ----  @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
f ----  @ @ @ @ @ @ 201308 abc 123.txt
f ----  @ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt
f ----  @ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt
f ----  @ @ @ @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
f ----  @ @ @ @ @ @ @ @ @ 201308 abc 123.txt
f ----  @ @ @ @ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt
f ----  @ @ @ @ @ @ @ @ @ @ 123 abc - a-1 - b-2.txt
f ----  @ @ @ @ @ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt
f ----  @ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 123 abc - a-1 - b-2.txt
f ----  @ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 45-12 - xyz - mno - 123-pqr-tuv-456.txt
([], ['@ @ @ @ @ @ 123 abc - a-1 - b-2.txt', '@ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt', '@ @ @ @ @ @ 201308 abc 123.txt', '@ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt', '@ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt', '@ @ @ @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt', '@ @ @ @ @ @ @ @ @ 201308 abc 123.txt', '@ @ @ @ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt', '@ @ @ @ @ @ @ @ @ @ 123 abc - a-1 - b-2.txt', '@ @ @ @ @ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt', '@ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 123 abc - a-1 - b-2.txt', '@ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 45-12 - xyz - mno - 123-pqr-tuv-456.txt'])
>>>

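As you can see, the match2 list is empty (in case you're interested, these filenames all ended up in the second list, but the first list is supposed to contain the relevant matches). I'm teaching myself Python and regex, and it isn't going well. I've already tried these (and regex tutorials), but they don't seem to help in this case:

Escaping regex string in Python

Regex to escape parentheses

How to implement \p{L} in python regex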

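All those @ characters come from the os.rename call you can see commented out, but it wasn't working before it was commented out either. I'm sure any entry-level programmer could get this working in a few minutes, but if a professional comes across this and is willing to spend a minute, that would be great too.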

Edit: the list of filenames used (the production list is obviously much longer):

201308-(12345) - Abc 2233-007-course Name - last, first.txt
201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
@ @ @ @ @ @ 201308 abc 123.txt
@ @ @ @ @ @ 123 abc - a-1 - b-2.txt
@ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt
@ @ @ @ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt
@ @ @ @ @ @ @ @ @ 201308 abc 123.txt
@ @ @ @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
@ @ @ @ @ @ @ @ @ @ 123 abc - a-1 - b-2.txt
@ @ @ @ @ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt
@ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 123 abc - a-1 - b-2.txt
@ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 45-12 - xyz - mno - 123-pqr-tuv-456.txt
45-12 - xyz - mno - 123-pqr-tuv-456.txt
123 abc - a-1 - b-2.txt
201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
201308 abc 123.txt
201308-(12345) - Abc 2233-007-course Name - last, first.txt

score 2 · Accepted Answer

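A few things look strange to me:

\s\b\s is odd, because \b means "match the empty string, but only at the beginning or end of a word", yet here it sits between two \s tokens, which match whitespace, so it is not at the beginning or end of a word.
The backslash in [A-z\a-z] causes an error. I wonder what it is supposed to mean here. Do you want a backslash as one of the possible characters in the set? Then write [A-z\\\\a-z]

This regex matches your example string: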
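
A related detail that may also explain the empty result: in a normal (non-raw) Python string literal, "\b" is the backspace character, so the pattern in the question never reaches the regex engine as a word boundary at all. Here is a minimal sketch of both readings, written for Python 3 and using the sample filename from the question; the variable names are mine:

```python
import re

name = "201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt"

# In a normal string literal, "\b" is the backspace character (0x08).
assert "\b" == "\x08"
# In a raw string it stays backslash + "b", which re reads as a word boundary.
assert r"\b" == "\\b"

# Backspace reading: needs a literal backspace between two whitespace characters.
backspace_hit = re.search("201308\\s\b\\s", name)
# Word-boundary reading: there is never a boundary between two whitespace characters.
boundary_hit = re.search(r"201308\s\b\s", name)
# Matching the hyphen explicitly is what the filename format calls for.
hyphen_hit = re.search(r"201308\s-\s", name)

print(backspace_hit, boundary_hit, hyphen_hit.group(0))
```

Either way the separator is a hyphen, which is not a word character, so it has to be matched literally with "-".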

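r = re.compile(term +
               (r"\s-\s"                # " - " after the term
                r"\(\d{5}\)"            # five digits in literal parentheses
                r"\s-\s"
                r"\w{3}\s\d{4}-\d{3}-"  # course prefix, number, section
                r"[a-zA-Z ]+"           # course name
                r"\s-\s"
                r"[A-Za-z]+,\s"         # last name, comma
                r"[A-Za-z]+ *\.txt"))   # first name, literal ".txt"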

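To see how a pattern like this sorts the filenames from the question, here is a rough sketch for Python 3. The helper name build is made up for illustration, the pattern is written with raw strings and an escaped dot, and term is assumed to contain no regex metacharacters:

```python
import re

def build(term):
    # Hypothetical helper: the pattern from this answer, as raw strings.
    # Assumes term contains no regex metacharacters.
    return re.compile(term + r"\s-\s\(\d{5}\)\s-\s\w{3}\s\d{4}-\d{3}-"
                             r"[a-zA-Z ]+\s-\s[A-Za-z]+,\s[A-Za-z]+ *\.txt")

# A few of the filenames listed in the question.
names = [
    "201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt",
    "201308-(12345) - Abc 2233-007-course Name - last, first.txt",
    "201308 abc 123.txt",
    "@ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt",
]

pattern = build("201308")
# search() accepts the pattern anywhere in the name, including after the "@ " prefixes.
found_anywhere = [n for n in names if pattern.search(n)]
# match() only accepts names that begin with the pattern.
found_at_start = [n for n in names if pattern.match(n)]

print(found_anywhere)
print(found_at_start)
```

The second filename is rejected in both cases because it has no spaces around the first hyphen, and only match() rejects the renamed "@ " file.
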
score 1 · Accepted Answer

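\d{6}\s-\s\(\d{5}\)\s-\s\w{3}\s\d{4}-\d{3}-[^\.]+\.txt matches the string you gave as an example. If the leading value is only known at run time, term + '\s-\s\(\d{5}\)\s-\s\w{3}\s\d{4}-\d{3}-[^\.]+\.txt' should do it (provided term is safe to use inside a regex).

Adding a sample test run: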

>>> term = '201308'
>>> f = '201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt'
>>> re.search(term + '\s-\s\(\d{5}\)\s-\s\w{3}\s\d{4}-\d{3}-[^\.]+\.txt', f).group(0)
'201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt'

And also:

>>> f = '/somefolder/somefolder2/201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt'
>>> re.search(term + '\s-\s\(\d{5}\)\s-\s\w{3}\s\d{4}-\d{3}-[^\.]+\.txt', f).group(0)
'201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt'

>>> f = 'c:\\somefolder\\somefolder2\\201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt'
>>> re.search(term + '\s-\s\(\d{5}\)\s-\s\w{3}\s\d{4}-\d{3}-[^\.]+\.txt', f).group(0)
'201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt'
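
On the "provided term is safe to use inside a regex" caveat: if term might ever contain characters such as "." or "(", one option is to pass it through re.escape first. This is only a sketch for Python 3; build_pattern and the term "2013.8" are invented for illustration:

```python
import re

# Same tail as in the examples above, written as a raw string.
tail = r"\s-\s\(\d{5}\)\s-\s\w{3}\s\d{4}-\d{3}-[^.]+\.txt"

def build_pattern(term):
    # Hypothetical helper: re.escape makes every character of term match literally.
    return re.compile(re.escape(term) + tail)

f = "201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt"

# A plain numeric term behaves exactly as before.
plain_hit = build_pattern("201308").search(f)

# Invented term containing a metacharacter.
odd_term = "2013.8"
# Unescaped, "." matches any character, so "2013.8" also matches "201308".
unescaped_hit = re.search(odd_term + tail, f)
# Escaped, only a literal "2013.8" would match, and f does not contain one.
escaped_hit = build_pattern(odd_term).search(f)

print(plain_hit.group(0) == f, unescaped_hit is not None, escaped_hit)
```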
