This regular expression is supposed to find strings in exactly this format:
201308 - (82608) - MAC 2233-007-Methods of Calculus - Lastname, Lee.txt
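The only caveat is that the last part, between the final hyphen and .txt (the instructor name), and the course name before it can each be a variable number of letters. Everything else has a fixed number of characters in that format (integers separated by exactly the spaces and hyphens shown, or an exact course prefix in all capital letters).
What the regular expression actually does is find nothing. Before I tried escaping the parentheses it was capturing some files, but now nada. I'm using re.search rather than re.match because the regex obviously isn't finished yet and I'm testing parts of it.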
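To illustrate the distinction relied on here: re.match only tries the pattern at the very start of the string, while re.search scans for the first position where it matches. A minimal sketch, using one of the "@ "-prefixed filenames from the list at the end of this post as the sample (the variable names are only for illustration):

```python
import re

name = "@ @ @ @ @ @ 201308 abc 123.txt"

# re.match anchors at position 0, so the leading "@ " prefix makes it fail.
at_start = re.match(r"201308", name)

# re.search scans forward and finds the term wherever it first occurs.
anywhere = re.search(r"201308", name)

print(at_start)
print(anywhere.group())
```

So with re.search a partial pattern can still hit in the middle of a name, which is convenient for testing pieces of a pattern but also means the prefixed files are not excluded by position alone.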
import re, os, sys, shutil

def readDir(path1):
    return [ f for f in os.listdir(path1) if os.path.isfile(os.path.join(path1,f)) ]

def files(dir1,term,path1):
    match2 = []; stillWrong = []#; term = str(term)
    for f in dir1:
        result = re.search(term + "\s\b\s\(\d{5}\)\s\b\s\w{3}\s\d{4}\b\d{3}[a-z\A-Z]+\s\b\s[A-z\a-z]+\b\s[A-Z\a-z]+ .txt",f)
        if result: match2.append(f)
        else: stillWrong.append(f)
        #print "split --- ",os.path.split(f)
        ##else: os.rename(path1+'\\'+f, path1+'\\'+'@ '+f); stillWrong.append(f)
        print "f ---- ",f
    return match2, stillWrong

term = "201308"; src = "testdir1"; dest = "testdir2"
print files(readDir(dest),term,dest)
This produces the (obviously) wrong result:
>>>
f ---- @ @ @ @ @ @ 123 abc - a-1 - b-2.txt
f ---- @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
f ---- @ @ @ @ @ @ 201308 abc 123.txt
f ---- @ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt
f ---- @ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt
f ---- @ @ @ @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
f ---- @ @ @ @ @ @ @ @ @ 201308 abc 123.txt
f ---- @ @ @ @ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt
f ---- @ @ @ @ @ @ @ @ @ @ 123 abc - a-1 - b-2.txt
f ---- @ @ @ @ @ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt
f ---- @ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 123 abc - a-1 - b-2.txt
f ---- @ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 45-12 - xyz - mno - 123-pqr-tuv-456.txt
([], ['@ @ @ @ @ @ 123 abc - a-1 - b-2.txt', '@ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt', '@ @ @ @ @ @ 201308 abc 123.txt', '@ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt', '@ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt', '@ @ @ @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt', '@ @ @ @ @ @ @ @ @ 201308 abc 123.txt', '@ @ @ @ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt', '@ @ @ @ @ @ @ @ @ @ 123 abc - a-1 - b-2.txt', '@ @ @ @ @ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt', '@ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 123 abc - a-1 - b-2.txt', '@ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 45-12 - xyz - mno - 123-pqr-tuv-456.txt'])
>>>
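As you can see, the match2[] list has nothing in it (in case you're interested, those are the filenames in the second list, but the first list is the one that should contain the relevant matches). I'm teaching myself Python and regular expressions, and it isn't going well. I have already tried other resources (and regex tutorials), but they don't seem to help in this case.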
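One thing worth checking in the pattern above is the use of \b where a hyphen is expected. In a regular (non-raw) Python string literal, "\b" is a single backspace character, and in a raw string the regex engine reads \b as a word boundary; neither stands for a literal hyphen. A small sketch of that behaviour, using the start of the sample filename (the variable names are illustrative):

```python
import re

plain = "\b"    # regular string literal: one backspace character (0x08)
raw = r"\b"     # raw string: backslash + "b", a word boundary to the re module

sample = "201308 - (82608)"

# \b between a space and a hyphen is not a word boundary, so this fails.
hyphen_as_boundary = re.search(r"201308\s\b\s\(", sample)

# Writing the hyphen literally lets the same prefix match.
hyphen_literal = re.search(r"201308\s-\s\(\d{5}\)", sample)

print(repr(plain), repr(raw))
print(hyphen_as_boundary)
print(hyphen_literal.group())
```

Writing patterns as raw strings (r"...") keeps backslash sequences such as \d and \( intact for the regex engine instead of relying on Python leaving them alone.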
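All those @ characters come from the os.rename call you see commented out, although it wasn't working before it was commented out anyway. I'm sure any entry-level programmer could get this done in a few minutes, but if a professional comes across this and is willing to spend a minute, that would be great too.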
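For reference, the renaming step can be sketched with os.path.join instead of concatenating '\\' by hand. The helper names mark_file and demo_mark below are hypothetical, and the demo runs in a temporary directory rather than in testdir2. Note that calling such a helper on every unmatched file on every run would add another "@ " each time, which is presumably how the long runs of @ in the listing built up.

```python
import os
import tempfile

def mark_file(directory, filename, prefix="@ "):
    """Rename directory/filename to directory/(prefix + filename); return the new name."""
    new_name = prefix + filename
    os.rename(os.path.join(directory, filename),
              os.path.join(directory, new_name))
    return new_name

def demo_mark():
    """Create one empty file in a temporary directory, mark it, list the result."""
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "201308 abc 123.txt"), "w").close()
        mark_file(tmp, "201308 abc 123.txt")
        return sorted(os.listdir(tmp))

print(demo_mark())
```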
EDIT: the list of filenames used (the production list is obviously much longer):
201308-(12345) - Abc 2233-007-course Name - last, first.txt
201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
@ @ @ @ @ @ 201308 abc 123.txt
@ @ @ @ @ @ 123 abc - a-1 - b-2.txt
@ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt
@ @ @ @ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt
@ @ @ @ @ @ @ @ @ 201308 abc 123.txt
@ @ @ @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
@ @ @ @ @ @ @ @ @ @ 123 abc - a-1 - b-2.txt
@ @ @ @ @ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt
@ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 123 abc - a-1 - b-2.txt
@ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 45-12 - xyz - mno - 123-pqr-tuv-456.txt
45-12 - xyz - mno - 123-pqr-tuv-456.txt
123 abc - a-1 - b-2.txt
201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
201308 abc 123.txt
201308-(12345) - Abc 2233-007-course Name - last, first.txt
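As a rough sketch of what a pattern for the described format might look like, assuming the hyphens and spaces appear exactly as in the sample name and that course and instructor names contain only letters (plus spaces in the course name): the function name split_names is hypothetical, and it uses re.search as in the code above, so "@ "-prefixed names in the right format still count as matches.

```python
import re

def split_names(names, term):
    """Return (matched, unmatched) for the filename format described above."""
    pattern = re.compile(
        re.escape(term)
        + r" - \(\d{5}\)"             # " - (82608)": five digits in parentheses
        + r" - [A-Z]{3} \d{4}-\d{3}"  # " - MAC 2233-007": prefix, course, section
        + r"-[A-Za-z ]+"              # "-Methods of Calculus": variable course name
        + r" - [A-Za-z]+, [A-Za-z]+"  # " - Lastname, Lee": variable instructor name
        + r"\.txt$"                   # literal ".txt" at the end of the name
    )
    matched, unmatched = [], []
    for name in names:
        (matched if pattern.search(name) else unmatched).append(name)
    return matched, unmatched

names = [
    "201308-(12345) - Abc 2233-007-course Name - last, first.txt",
    "201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt",
    "@ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt",
    "201308 abc 123.txt",
    "123 abc - a-1 - b-2.txt",
]
matched, unmatched = split_names(names, "201308")
print(matched)
print(unmatched)
```

Switching pattern.search to pattern.match would anchor the pattern at the start of the name and so leave the "@ "-prefixed copy unmatched.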