python - Reading files in order of serial number rather than name

Question

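I have thousands of images in a folder. The images are named 0.png, 1.png, 2.png, ...

I wrote the following code to generate an average image for the positive samples, and similarly for the negative samples.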

file_list = glob.glob(trainDir)
n = len(file_list)
label = np.load('labels_v2.dat')
positive = np.zeros((300,400,4))  # running sums must start at zero; np.empty leaves memory uninitialized
negative = np.zeros((300,400,4))
labels = np.empty(n)
count_p = 0
count_n = 0

for i in range(1000):
    img = imread(file_list[i])
    lbl = label[i]
    if (lbl == 1):
        positive +=  img
        count_p += 1
        print(file_list[i])

However, this reads the files in the order 1, 10, 100, 1000, 10000, 10001, ... whereas my labels are in the order 0, 1, 2, 3, ... How can I make it read the files in the correct order?

score 3 · Accepted Answer

import os

file_list = os.listdir(trainDir)
# sort by the integer value of the file name without its extension
file_list.sort(key=lambda s: int(os.path.splitext(s)[0]))

Alternatively, to skip the O(n lg n) cost of sorting, do this inside the loop:

img = imread("%d.EXT" % i)

where EXT is the appropriate extension (e.g. jpg).
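Both approaches from this answer can be put into one small self-contained sketch. The helper names `numeric_file_list` and `path_for_index` are hypothetical, and the `.png` default is taken from the file names in the question. One detail worth noting: `os.listdir` returns bare file names, so the directory has to be joined back on before the paths are handed to `imread`.

```python
import os

def numeric_file_list(train_dir, ext=".png"):
    """Return full paths of files named <number><ext>, in numeric order."""
    names = [
        name for name in os.listdir(train_dir)
        # keep only names such as "12.png"; skip anything else in the folder
        if name.endswith(ext) and os.path.splitext(name)[0].isdigit()
    ]
    names.sort(key=lambda name: int(os.path.splitext(name)[0]))
    return [os.path.join(train_dir, name) for name in names]

def path_for_index(train_dir, i, ext=".png"):
    """Build the path of image number i directly, without listing or sorting."""
    return os.path.join(train_dir, "%d%s" % (i, ext))
```

The second helper is probably the simpler choice when the numbering has no gaps, since the loop index then matches the label index by construction.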

score 1 · Accepted Answer

It seems you want numeric order rather than lexicographic (dictionary) sort order. My first idea was:

import locale
l=["11", "01", "3", "20", "0", "5"]
l.sort(key=locale.strxfrm)    # strcoll would have to repeat the transform
print(l)

But this only helps if your locale actually sorts numbers that way, and I don't know what to set for that.

Meanwhile, one workaround is to pick out the digits in the sort key function.

def numfromstr(s):
  s2=''.join(c for c in s if c.isdigit())
  return int(s2)
l.sort(key=numfromstr)

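But on its own this has the drawback of sorting by the digits only. You can compensate by splitting on number boundaries and sorting the resulting tuples... and it keeps getting more complicated.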

import re
e=re.compile('([0-9]+|[^0-9]+)')
def sorttup(s):
  parts=[]
  for part in e.findall(s):
    try:
      parts.append(int(part))
    except ValueError:
      parts.append(part)
  return tuple(parts)
l.sort(key=sorttup)

Well, that is at least a bit closer, but it is neither pretty nor fast.
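One caveat with the tuple key above: under Python 3 an `int` and a `str` can no longer be compared, so a list that mixes names like `1.png` and `img1.png` would raise a `TypeError`. A minimal sketch of a variant that avoids this is shown here; the name `natural_key` and the sample file names are illustrative assumptions. It relies on `re.split` with a capturing group, which always alternates text and digit runs, so the same position in two keys always holds the same type.

```python
import re

_DIGITS = re.compile(r"(\d+)")

def natural_key(s):
    """Split s into alternating text and digit runs; digit runs become ints."""
    parts = _DIGITS.split(s)  # e.g. "img10.png" -> ["img", "10", ".png"]
    # odd positions hold the captured digit runs; convert them to integers
    parts[1::2] = [int(p) for p in parts[1::2]]
    return parts

names = ["10.png", "1.png", "img2.png", "img10.png", "0.png", "2.png"]
print(sorted(names, key=natural_key))
# ['0.png', '1.png', '2.png', '10.png', 'img2.png', 'img10.png']
```

For file names that are purely numeric, as in the question, the simpler `int(os.path.splitext(s)[0])` key is enough; this variant only matters when names also carry text prefixes.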

There are more answers under similar questions.
