python - Comparing two lists of strings for similarity in Python

Question

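I'm new to Python, but I thought it would be fun to write a program that sorts all my downloads, and I've run into some trouble. When a destination name is a single word it works perfectly, but when the destination has two or more words things go wrong and the program gets stuck in a loop. Does anyone have a better idea than mine for comparing the lists?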

>>>for i in dstdir:
>>>    print i.split()

['CALIFORNICATION']
['THAT', "'70S", 'SHOW']
['THE', 'BIG', 'BANG', 'THEORY']
['THE', 'OFFICE']
['DEXTER']
['SPAWN']
['SCRUBS']
['BETTER', 'OF', 'TED']

>>>for i in srcdir:
>>>    print i.split()
['Brooklyn.Nine-Nine.S01E16.REAL.HDTV.x264-EXCELLENCE.mp4']
['Revolution', '2012', 'S02E12', 'HDTV', 'x264-LOL[ettv]']
['Inequality', 'for', 'All', '(2013)', '[1080p]']

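These are examples of the list output.

I have a destination directory that contains only folders, and a downloads directory. I want to write a program that automatically looks at each source file name and then at the destination names. If a destination name occurs in the source name, I can go ahead and copy the downloaded file so that it is sorted into my collection.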

destination = '/media/mediacenter/SAMSUNG/SERIES/'
source = '/home/mediacenter/Downloads/'
dstdir = os.listdir(destination)
srcdir = os.listdir(source)

for i in srcdir:
    source = list(i.split())
    for j in dstdir:
        count = 0
        succes = 0
        destination = list(j.split())
        if len(destination) == 1:
            while (count < len(source)):
                if destination[0].upper() == source[count].upper():
                    print 'succes ', destination, ' ', source
                count = count + 1
        elif len(destination) == 2:
            while(count < len(source)):
                if (destination[0].upper() == source[count].upper()):
                    succes = succes + 1
                    count = len(source)
            count = 0
            while(count < len(source)):
                if (destination[1].upper() == source[count].upper()):
                    succes = succes + 1
                    count = len(source)
            count = 0
            if succes == 2:
                print 'succes ', destination, ' ', source

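For now I'm happy with just 'succes' as output. I'll figure out how to copy the files later, since that will be a completely different problem for me in the near future.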

score 2 · Accepted Answer

Maybe something like this. It checks whether every word of the destination folder name occurs in the file name.

dstdir = ['The Big Bang Theory', 'Dexter', 'Spawn' ]

srcdir = ['the.big.bang.theory s1e1', 'the.big.bang.theory s1e2', 'dexter s2e01']

for source in srcdir:
    for destination in dstdir:
        destinationWords = destination.split()

        if all(word.lower() in source.lower() for word in destinationWords):
            print 'succes ', destination, ' ', source

Output:

succes  The Big Bang Theory   the.big.bang.theory s1e1
succes  The Big Bang Theory   the.big.bang.theory s1e2
succes  Dexter   dexter s2e01
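
One caveat with the substring test above: `word in source` also matches inside longer words, so 'the' is found in 'theory'. A stricter variant could compare whole words instead. The following is only a minimal sketch, assuming Python 3 and that names can be split on anything that is not a letter or digit; `tokens` and `match_destination` are hypothetical helper names, not part of the answer above.

```python
import re


def tokens(name):
    """Return the set of upper-case alphanumeric words in a name."""
    return set(re.findall(r"[A-Z0-9]+", name.upper()))


def match_destination(filename, folders):
    """Return the folders whose every word appears as a whole word in filename."""
    file_tokens = tokens(filename)
    matches = []
    for folder in folders:
        folder_tokens = tokens(folder)
        # An empty token set would be a subset of anything, so skip it.
        if folder_tokens and folder_tokens <= file_tokens:
            matches.append(folder)
    return matches


folders = ['THE BIG BANG THEORY', 'THE OFFICE', 'DEXTER']
print(match_destination('The.Big.Bang.Theory.S01E01.HDTV.mp4', folders))
# prints ['THE BIG BANG THEORY']
```

Because the comparison works on sets, word order and separators ('.', '-', spaces) do not matter, and 'THE OFFICE' is not matched just because 'THE' is present.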

score 2 · Accepted Answer

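My personal favorite for fuzzy string comparison in Python is fuzzywuzzy, which comes with plenty of good examples and a very liberal license.

Some examples that may be relevant to you: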

> from fuzzywuzzy import fuzz, process
> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
> process.extract("new york jets", choices, limit=2)
  [('New York Jets', 100), ('New York Giants', 78)]
> process.extractOne("cowboys", choices)
  ("Dallas Cowboys", 90)

Or token_sort_ratio, for cases where word order does not matter.

> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
  90
> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
  100
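
If installing a third-party package is not an option, the idea behind token_sort_ratio can be approximated with the standard library. This is only a sketch, assuming Python 3; `simple_token_sort_ratio` is a hypothetical helper, not fuzzywuzzy's implementation, and its scores will generally differ from fuzzywuzzy's.

```python
from difflib import SequenceMatcher


def simple_token_sort_ratio(a, b):
    """Order-insensitive similarity from 0 to 100, using difflib only."""
    # Lower-case, split on whitespace, and sort the words so that
    # word order no longer influences the comparison.
    sorted_a = " ".join(sorted(a.lower().split()))
    sorted_b = " ".join(sorted(b.lower().split()))
    return round(100 * SequenceMatcher(None, sorted_a, sorted_b).ratio())


print(simple_token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))
# prints 100
```

Both inputs sort to the same string, so the ratio is exactly 1.0. For file names you would probably want to replace '.' and '-' with spaces first, since this sketch only splits on whitespace.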

score 0 · Accepted Answer

With the simple script suggested below, you can move files from the source to the destination.

import os

src = "/home/mediacenter/Downloads"
dst = "/media/mediacenter/SAMSUNG/SERIES"
source = os.listdir(src)
destination = os.listdir(dst)

for filename in source:
    file_src = src + "/" + str(filename)
    file_dst = dst + "/" + str(filename)

    if filename not in destination and not os.path.isdir(file_src):
        # regular file: move it to the same name under dst
        os.system("mv %s %s" % (file_src, file_dst))
    elif filename not in destination and os.path.isdir(file_src):
        # directory: move it into dst
        os.system("mv %s %s" % (file_src, dst))

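This looks like what you're after. You only need to check that the file name is not already in the destination list, and then move it. Does that work for you?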
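
Since the paths are pasted into a shell command unquoted, names containing spaces or other special characters will break the `mv` call. One way around that is to avoid the shell entirely. The following is a minimal sketch using `shutil.move` from the standard library; `move_new_entries` is a hypothetical helper name, and the sketch assumes that "already present" means an entry with the same name exists directly under the destination.

```python
import os
import shutil


def move_new_entries(src, dst):
    """Move every file or folder in src that dst lacks; return the moved names."""
    existing = set(os.listdir(dst))
    moved = []
    for name in sorted(os.listdir(src)):
        if name in existing:
            continue  # something with this name is already in dst
        # shutil.move handles files and directories alike, and no shell
        # is involved, so spaces and quotes in names need no escaping.
        shutil.move(os.path.join(src, name), os.path.join(dst, name))
        moved.append(name)
    return moved
```

Usage would then be a single call such as `move_new_entries("/home/mediacenter/Downloads", "/media/mediacenter/SAMSUNG/SERIES")`.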

score 0 · Accepted Answer

Building on the previous answer, I found a possible fix using re.sub. Replace this block:

# ...
import re

source =  os.listdir(src)
destination = os.listdir(dst)

with

source = [re.sub(' ', '\\\\ ', w) for w in os.listdir(src)]
destination = [re.sub(' ', '\\\\ ', w) for w in os.listdir(dst)]

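This makes it possible to move folders whose names contain spaces.

I think you should look into regular expressions, rather than plain string comparison, to handle special characters. I tried something like the following (applied to both source and destination), but without success.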

# this snippet does not work; it is only meant to illustrate the idea

pattern = "[a-zA-Z0-9]"
for i,w in enumerate(source):
    for ch in w:

        if not re.match(pattern, ch) :
            print source , ch

            source[i] = re.sub( ch,r"\\" + ch, source[i])
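
A likely reason the loop above misbehaves is that `re.sub(ch, ...)` treats `ch` itself as a regular expression, so a character such as '.' matches everything and '(' raises an error, and a character that occurs twice gets escaped twice. Here is a hedged sketch of two alternatives, assuming Python 3 and a POSIX shell; `backslash_escape` and `mv_command` are hypothetical helper names.

```python
import re
import shlex


def backslash_escape(name):
    """Put a backslash before every character that is not an ASCII letter or digit."""
    # One pass with a character class, so nothing is escaped twice and no
    # special character is ever used as a pattern on its own.
    return re.sub(r"([^A-Za-z0-9])", r"\\\1", name)


def mv_command(file_src, file_dst):
    """Build an mv command line with both paths quoted for a POSIX shell."""
    return "mv %s %s" % (shlex.quote(file_src), shlex.quote(file_dst))


print(backslash_escape("Big Bang (2013)"))
# prints Big\ Bang\ \(2013\)
print(mv_command("/a/The Office", "/b"))
# prints mv '/a/The Office' /b
```

Of the two, `shlex.quote` is probably the safer choice, because it is designed for building shell command lines, while the backslash version is just a direct repair of the attempt above.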

There is a similar question at this link.
