python - Isolating a string that is also contained in other strings

Question

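I am setting up a script to merge PDFs based on text contained in their file names. My problem is that "Violin I" is also contained in "Violin II", and "Alto Saxophone I" is also contained in "Alto Saxophone II". How can I set this up so that tempList contains only the entries for "Violin I" and excludes those for "Violin II", and vice versa?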

pdfList = ["01 Violin I.pdf", "02 Violin I.pdf","01 Violin II.pdf", "02 Violin II.pdf",  ]
instruments = ["Soprano", "Tenor", "Violin I", "Violin II", "Viola", "Cello", "Contrabass", "Alto Saxophone I", "Alto Saxophone II", "Tenor Saxophone", "Baritone Saxophone"]


# create arrays for each instrument that can be used for merging/organization
def organizer():
    for fileName in pdfList:
        for instrument in instruments:
            tempList = []
            if instrument in fileName:
                tempList.append(fileName)
        print tempList


print pdfList
organizer()

score 3 · Accepted Answer

One way to avoid matching a name that is merely a substring of another is to use a regular expression, for example:

import re

pdfList = ["01 Violin I.pdf", "02 Violin I.pdf", "01 Violin II.pdf", "02 Violin II.pdf"]
instruments = ["Soprano", "Tenor", "Violin I", "Violin II", "Viola", "Cello", "Contrabass", "Alto Saxophone I", "Alto Saxophone II", "Tenor Saxophone", "Baritone Saxophone"]

# create arrays for each instrument that can be used for merging/organization
def organizer():
    for fileName in pdfList:
        tempList = []
        for instrument in instruments:
            # \b requires a word boundary on both sides of the instrument name,
            # so "Violin I" does not match inside "Violin II"
            if re.search(r'\b{}\b'.format(instrument), fileName):
                tempList.append(fileName)
        print(tempList)

print(pdfList)
organizer()

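This wraps your search term in \b so that it matches only when its start and end fall on word boundaries. Also, perhaps obvious but worth pointing out: this makes your instrument names part of the regular expression, so if any of the characters you use are also regex metacharacters, they will be interpreted as such (at the moment, none of them are). A more general solution would need some code to find and properly escape those characters.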
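
For the escaping step mentioned above, the standard library's re.escape can do the work. The following is only a minimal sketch, not the answerer's code: the function name group_by_instrument, the shortened lists, and the instrument "Clarinet (Bb) I" are assumptions added for illustration. Without escaping, the parentheses in that name would be read as a regex group and the file would not match.

```python
import re

# "Clarinet (Bb) I" is a hypothetical name, added only to show a metacharacter case.
instruments = ["Violin I", "Violin II", "Clarinet (Bb) I"]
pdf_list = [
    "01 Violin I.pdf", "02 Violin I.pdf",
    "01 Violin II.pdf", "02 Violin II.pdf",
    "01 Clarinet (Bb) I.pdf",
]

def group_by_instrument(file_names, instrument_names):
    """Map each instrument name to the file names that contain it as a whole term."""
    groups = {}
    for instrument in instrument_names:
        # re.escape makes every character of the name match literally.
        pattern = re.compile(r'\b' + re.escape(instrument) + r'\b')
        groups[instrument] = [name for name in file_names if pattern.search(name)]
    return groups

groups = group_by_instrument(pdf_list, instruments)
for instrument in instruments:
    print(instrument, groups[instrument])
```

Two limits are worth keeping in mind. \b only exists between a word character and a non-word character, so a name that begins or ends with punctuation would need a different delimiter. And a space counts as a boundary, so a short name such as "Tenor" would still match a file named for "Tenor Saxophone".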

score 1 · Accepted Answer

Try making this change:

...
if instrument+'.pdf' in fileName:
...

Does this cover all cases?
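
One way to check is to run the test against every instrument in the list. This is a rough sketch under assumptions: the file names are hypothetical, generated as "01 <instrument>.pdf", and matches_by_suffix is a helper name invented here. It uses Python 3 print calls.

```python
instruments = [
    "Soprano", "Tenor", "Violin I", "Violin II", "Viola", "Cello",
    "Contrabass", "Alto Saxophone I", "Alto Saxophone II",
    "Tenor Saxophone", "Baritone Saxophone",
]

# Hypothetical file names: one "01 <instrument>.pdf" per instrument.
pdf_list = ["01 {}.pdf".format(name) for name in instruments]

def matches_by_suffix(file_name):
    """Return every instrument whose name is directly followed by '.pdf' in file_name."""
    return [name for name in instruments if name + ".pdf" in file_name]

for file_name in pdf_list:
    # Each file should be claimed by exactly one instrument.
    print(file_name, matches_by_suffix(file_name))
```

For this instrument list, each generated file name is matched by exactly one instrument, including "Tenor" versus "Tenor Saxophone". The check depends on the instrument name sitting immediately before ".pdf", so a file with extra text after the name would not be matched.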

Answered 2013-03-23T16:25:50.630
