I have a list of folders like this:

u'Magazines/testfolder1',
u'Magazines/testfolder1/folder1/folder2/folder3',
u'Magazines/testfolder1/folder1/',
u'Magazines/testfolder1/folder1/folder2/',
u'Magazines/testfolder2',
u'Magazines/testfolder2/folder1/folder2/folder3',
u'Magazines/testfolder2/folder1/',
u'Magazines/testfolder2/folder1/folder2/',
u'Magazines/testfolder3',
u'Magazines/testfolder3/folder1/folder2/folder3',
u'Magazines/testfolder3/folder1/',
u'Magazines/testfolder3/folder1/folder2/',

What I want now is a list containing only the unique parent folders.

That is, for the example above, I want to reduce it to:

u'Magazines/testfolder1',
u'Magazines/testfolder2',
u'Magazines/testfolder3',

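because they already contain all the subfolders.

I add folders to my database recursively, so if I have testfolder1, the script automatically recurses into its subfolders. I therefore don't need a subfolder in the list if its parent is also in the list.

How can I do this?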

4 Answers

2

Use a set:

>>> list_of_folders = [
...     u'Magazines/testfolder1',
...     u'Magazines/testfolder1/folder1/folder2/folder3',
...     u'Magazines/testfolder1/folder1/',
...     u'Magazines/testfolder1/folder1/folder2/',
...     u'Magazines/testfolder2',
...     u'Magazines/testfolder2/folder1/folder2/folder3',
...     u'Magazines/testfolder2/folder1/',
...     u'Magazines/testfolder2/folder1/folder2/',
...     u'Magazines/testfolder3',
...     u'Magazines/testfolder3/folder1/folder2/folder3',
...     u'Magazines/testfolder3/folder1/',
...     u'Magazines/testfolder3/folder1/folder2/',
... ]
>>> result = set()
>>> for folder in list_of_folders:
...     for parent in result:
...         if folder.startswith(parent):
...             break
...     else:
...         result.add(folder)
... 
>>> result
{'Magazines/testfolder3', 'Magazines/testfolder2', 'Magazines/testfolder1'}

Update:

list_of_folders = [
    ...
]
result = set()
for folder in list_of_folders:
    if all(not folder.startswith(parent) for parent in result):
        result.add(folder)
print(result)
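
One caveat with the loop above: it relies on every parent appearing in the list before its children, and a plain `startswith` would also treat `Magazines/testfolder10` as a child of `Magazines/testfolder1`. A minimal sketch that tries to address both, assuming `/` is always the separator (the function name `top_level_folders` is made up for illustration):

```python
def top_level_folders(folders):
    """Return the folders that are not inside another listed folder."""
    result = []
    # Strip trailing slashes and sort, so a parent always comes before its children.
    for folder in sorted({f.rstrip('/') for f in folders}):
        # Compare against parent + '/' so 'a/b1' is not treated as inside 'a/b'.
        if not any(folder.startswith(parent + '/') for parent in result):
            result.append(folder)
    return result
```

Sorting works here because a string that is a prefix of another always sorts first, so by the time a subfolder is examined its parent has already been kept.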
answered 2013-07-25T04:23:45.307
0

How about using a regular expression:

import re

l = [
    u'Magazines/testfolder1',
    u'Magazines/testfolder1/folder1/folder2/folder3',
    u'Magazines/testfolder1/folder1/',
    u'Magazines/testfolder1/folder1/folder2/',
    u'Magazines/testfolder2',
    u'Magazines/testfolder2/folder1/folder2/folder3',
    u'Magazines/testfolder2/folder1/',
    u'Magazines/testfolder2/folder1/folder2/',
    u'Magazines/testfolder3',
    u'Magazines/testfolder3/folder1/folder2/folder3',
    u'Magazines/testfolder3/folder1/',
    u'Magazines/testfolder3/folder1/folder2/',
]

expect = [
    u'Magazines/testfolder1',
    u'Magazines/testfolder2',
    u'Magazines/testfolder3', 
]

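# Keep only paths with exactly two components (a single '/'). A list
# comprehension yields a list on both Python 2 and 3, so the assert holds.
result = [x for x in l if re.match(r'^[^/]+/[^/]+$', x)]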

assert expect == result
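
The pattern above fixes the depth at two path components. If the top-level entries can sit at different depths, one alternative is to drop every folder that has a listed ancestor. This is only a sketch, assuming Python 3.4+ (for `pathlib`) and POSIX-style separators; `drop_nested` is a hypothetical name:

```python
from pathlib import PurePosixPath

def drop_nested(folders):
    """Keep only the folders none of whose ancestors is also listed."""
    # PurePosixPath drops trailing slashes, so 'a/b/' and 'a/b' compare equal.
    paths = {PurePosixPath(f) for f in folders}
    # p.parents yields every ancestor of p, from the nearest up to '.'.
    kept = [p for p in paths if not any(a in paths for a in p.parents)]
    return sorted(str(p) for p in kept)
```

Because the ancestors are looked up in a set, the order of the input list does not matter.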
answered 2013-07-25T07:42:25.543
0

I believe the following is the solution you are looking for, mate:

lst = [
u'Magazines/testfolder1',
u'Magazines/testfolder1/folder1/folder2/folder3',
u'Magazines/testfolder1/folder1/',
u'Magazines/testfolder1/folder1/folder2/',
u'Magazines/testfolder2',
u'Magazines/testfolder2/folder1/folder2/folder3',
u'Magazines/testfolder2/folder1/',
u'Magazines/testfolder2/folder1/folder2/',
u'Magazines/testfolder3',
u'Magazines/testfolder3/folder1/folder2/folder3',
u'Magazines/testfolder3/folder1/',
u'Magazines/testfolder3/folder1/folder2/'
 ]

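# Iterate over copies so that removing items does not disturb the loops.
for x in lst[:]:
    for y in lst[:]:
        # y is a subfolder of x if it starts with x followed by '/'.
        if y != x and y.startswith(x.rstrip('/') + '/'):
            lst.remove(y)
print(lst)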

Output:

[u'Magazines/testfolder1', u'Magazines/testfolder2', u'Magazines/testfolder3']

The program iteratively removes the subfolders from your list, leaving only the parent folders.

answered 2013-07-25T22:15:10.730
0
l =[u'Magazines/testfolder1',
    u'Magazines/testfolder1/folder1/folder2/folder3',
    u'Magazines/testfolder1/folder1/',
    u'Magazines/testfolder1/folder1/folder2/',
    u'Magazines/testfolder2',
    u'Magazines/testfolder2/folder1/folder2/folder3',
    u'Magazines/testfolder2/folder1/',
    u'Magazines/testfolder2/folder1/folder2/',
    u'Magazines/testfolder3',
    u'Magazines/testfolder3/folder1/folder2/folder3',
    u'Magazines/testfolder3/folder1/',
    u'Magazines/testfolder3/folder1/folder2/', ]

mincount = min(s.count('/') for s in l)
[d for d in sorted(l) if d.count('/') <= mincount]
#=> [u'Magazines/testfolder1', u'Magazines/testfolder2', u'Magazines/testfolder3']

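It's not overly clever, but it works when the paths share a common root, so that all the top-level folders sit at the same depth.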
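
To make that caveat concrete, here is a small sketch with a made-up list whose top-level entries sit at different depths; the function name `shallowest` and the sample paths are hypothetical. The depth count keeps only the shallowest entries, so a deeper folder with no listed parent is dropped as well:

```python
def shallowest(folders):
    """The depth-based filter from this answer, wrapped in a function."""
    mincount = min(s.count('/') for s in folders)
    return [d for d in sorted(folders) if d.count('/') <= mincount]

mixed = [
    'Magazines/testfolder1',
    'Magazines/testfolder1/folder1',
    'Books/2013/July',  # no parent in the list, but deeper than the minimum
]
print(shallowest(mixed))  # 'Books/2013/July' is missing from the result
```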

answered 2013-07-26T01:42:07.857