python - How to naturally sort files given as "WindowsPath" objects

Question

I'm using Path().glob() to iterate over the files in a directory, but it does not iterate in the correct natural order. For example, it iterates like this:

[WindowsPath('C:/Users/HP/Desktop/P1/dataP1/SAMPLED_NORMALIZED/P1_Cor.csv'),
 WindowsPath('C:/Users/HP/Desktop/P10/dataP10/SAMPLED_NORMALIZED/P10_Cor.csv'),
 WindowsPath('C:/Users/HP/Desktop/P11/dataP11/SAMPLED_NORMALIZED/P11_Cor.csv'),
 WindowsPath('C:/Users/HP/Desktop/P12/dataP12/SAMPLED_NORMALIZED/P12_Cor.csv'),
# ...and so on from P1 to P30

Instead, I want it to iterate in this order: P1, P2, P3, and so on.

I tried the code below, but it gives me an error:

from pathlib import Path

file_path = r'C:/Users/HP/Desktop'

files = Path(file_path).glob(file)  # file holds the glob pattern (not shown)
sorted(files, key=lambda name: int(name[10:]))  # raises the TypeError below

The 10 is just an arbitrary number I used while trying the code.

Error:

TypeError: 'WindowsPath' object is not subscriptable
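A Path object is not a string, so it cannot be sliced; its .name and .stem attributes, however, are plain strings. A minimal sketch of that fix, assuming every file name looks like P<digits>_Cor.csv (the helper number_from_name and the sample paths are hypothetical; PureWindowsPath is used so it runs on any OS without touching the disk):

```python
from pathlib import PureWindowsPath

def number_from_name(path):
    # path.stem is a plain str such as 'P10_Cor', so it can be split and sliced.
    prefix = path.stem.split("_")[0]  # 'P10'
    return int(prefix[1:])            # drop the leading 'P' -> 10

paths = [
    PureWindowsPath("C:/Users/HP/Desktop/P10/dataP10/SAMPLED_NORMALIZED/P10_Cor.csv"),
    PureWindowsPath("C:/Users/HP/Desktop/P2/dataP2/SAMPLED_NORMALIZED/P2_Cor.csv"),
    PureWindowsPath("C:/Users/HP/Desktop/P1/dataP1/SAMPLED_NORMALIZED/P1_Cor.csv"),
]
ordered = sorted(paths, key=number_from_name)
print([p.name for p in ordered])  # ['P1_Cor.csv', 'P2_Cor.csv', 'P10_Cor.csv']
```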

Ultimately, what I want is to iterate over the files and do something with each one:

from pathlib import Path

for i, fl in enumerate(Path(file_path).glob(file)):
    # do something
    pass
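If the file names do not all follow one fixed pattern, a general natural-sort key can be built with only the standard library: split the string into text and digit runs and compare the digit runs as integers. A minimal sketch (natural_key and the sample paths are hypothetical; in the real loop the iterable would be Path(file_path).glob(file)):

```python
import re
from pathlib import PureWindowsPath

_DIGITS = re.compile(r"(\d+)")

def natural_key(path):
    # 'p10_cor.csv' -> ['p', 10, '_cor.csv']: re.split with a capturing group
    # puts the digit runs at the odd indices, so only those become ints.
    parts = _DIGITS.split(str(path).lower())
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]

# Hypothetical sample paths standing in for the glob results.
paths = [
    PureWindowsPath(f"C:/Users/HP/Desktop/P{n}/dataP{n}/SAMPLED_NORMALIZED/P{n}_Cor.csv")
    for n in (1, 10, 11, 2, 3)
]

for i, fl in enumerate(sorted(paths, key=natural_key)):
    print(i, fl.name)  # P1, P2, P3, P10, P11 in that order
```

Because the text and number chunks always alternate starting with text, every position compares str with str or int with int, so sorted() never mixes types.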

I even tried the natsort library, but it still did not sort the files correctly during iteration. I tried:

from natsort import natsort_keygen, ns
natsort_key1 = natsort_keygen(key=lambda y: y.lower())

from natsort import natsort_keygen, ns
natsort_key2 = natsort_keygen(alg=ns.IGNORECASE)

Both snippets above still give me P1, P10, P11, and so on.

Any help would be greatly appreciated.

score 3 · Accepted Answer

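Sorting this data with natsort works, but you have to tell it how to extract a string from the Path objects (for performance reasons, it does not do this by default).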

In [2]: from pathlib import Path

In [3]: import natsort

In [4]: a = [Path('C:/Users/HP/Desktop/P1/dataP1/SAMPLED_NORMALIZED/P1_Cor.csv'),
             Path('C:/Users/HP/Desktop/P10/dataP10/SAMPLED_NORMALIZED/P10_Cor.csv'),
             Path('C:/Users/HP/Desktop/P2/dataP2/SAMPLED_NORMALIZED/P2_Cor.csv')]

In [5]: natsort.natsorted(a, key=str)
Out[5]: 
[PosixPath('C:/Users/HP/Desktop/P1/dataP1/SAMPLED_NORMALIZED/P1_Cor.csv'),
 PosixPath('C:/Users/HP/Desktop/P2/dataP2/SAMPLED_NORMALIZED/P2_Cor.csv'),
 PosixPath('C:/Users/HP/Desktop/P10/dataP10/SAMPLED_NORMALIZED/P10_Cor.csv')]

In [6]: natsort.natsorted(a, alg=natsort.PATH)
Out[6]: 
[PosixPath('C:/Users/HP/Desktop/P1/dataP1/SAMPLED_NORMALIZED/P1_Cor.csv'),
 PosixPath('C:/Users/HP/Desktop/P2/dataP2/SAMPLED_NORMALIZED/P2_Cor.csv'),
 PosixPath('C:/Users/HP/Desktop/P10/dataP10/SAMPLED_NORMALIZED/P10_Cor.csv')]

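The first option converts every Path object to a string, which natsort knows how to handle. This works for your data.

The second option turns on natsort's PATH algorithm, which handles Path objects correctly on its own and also adds more robust handling of corner cases that are common in file-system paths.

Full disclosure: I am the author of natsort.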
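The answer does not spell out which corner cases PATH covers. One plausible example is that a flat string comparison lets a separator character decide the order: '-' sorts before '/', so a folder named P1-old lands ahead of P1. The sketch below compares paths component by component using only the standard library; it illustrates that idea under that assumption and is not natsort's implementation (path_key and _natural_chunks are hypothetical helpers).

```python
import re
from pathlib import PurePosixPath

_DIGITS = re.compile(r"(\d+)")

def _natural_chunks(text):
    # re.split with a capturing group leaves the digit runs at odd indices.
    parts = _DIGITS.split(text)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))

def path_key(path):
    # Compare each path component separately instead of the whole string.
    return tuple(_natural_chunks(part) for part in path.parts)

paths = [PurePosixPath("data/P1-old/x.csv"), PurePosixPath("data/P1/y.csv")]

# Natural key on the flat string: 'P1-old/...' wins because '-' < '/'.
flat = sorted(paths, key=lambda p: _natural_chunks(str(p)))
# Component-wise key: 'P1' wins because it is a prefix of 'P1-old'.
by_part = sorted(paths, key=path_key)
```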

score 2 · Accepted Answer

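If you want to sort by the number in the file name, you can use the Path.name attribute together with a regular expression that extracts the number.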

from pathlib import Path
import re

file_path = r'C:/Users/HP/Desktop'

def _p_file_sort_key(path):
    """Given a file named P<digits>_Cor.csv, return the digits as an int."""
    return int(re.match(r"P(\d+)", path.name).group(1))

# Search every P<n>/dataP<n>/SAMPLED_NORMALIZED folder, not just P1's.
files = sorted(Path(file_path).glob("P*/dataP*/SAMPLED_NORMALIZED/P*_Cor.csv"), key=_p_file_sort_key)
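To check a regex key like this end to end without the real Desktop folders, the sketch below builds a throwaway directory tree that mimics the layout shown in the question (assumed: P<n>/dataP<n>/SAMPLED_NORMALIZED/P<n>_Cor.csv, with empty hypothetical files) and globs across all the P* folders at once. The helper name p_number is hypothetical.

```python
import re
import tempfile
from pathlib import Path

def p_number(path):
    # 'P10_Cor.csv' -> 10; assumes every matched name starts with P<digits>.
    return int(re.match(r"P(\d+)", path.name).group(1))

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Mimic the question's layout for a few sample numbers (empty files).
    for n in (1, 2, 10, 11, 3):
        folder = root / f"P{n}" / f"dataP{n}" / "SAMPLED_NORMALIZED"
        folder.mkdir(parents=True)
        (folder / f"P{n}_Cor.csv").write_text("")
    # One pattern that reaches into every P<n> folder, not just P1.
    pattern = "P*/dataP*/SAMPLED_NORMALIZED/P*_Cor.csv"
    ordered = [p.name for p in sorted(root.glob(pattern), key=p_number)]

print(ordered)  # ['P1_Cor.csv', 'P2_Cor.csv', 'P3_Cor.csv', 'P10_Cor.csv', 'P11_Cor.csv']
```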

score 0 · Accepted Answer

You can call str() on the Path object, or use as_posix().

from pathlib import Path

# Note: sorted() on plain strings is lexicographic, so 'P10' still sorts before 'P2'.
for fn in sorted([str(p) for p in Path(file_path).glob('*.csv')]):
    # do something with fn
    pass

for fn in sorted([p.as_posix() for p in Path(file_path).glob('*.csv')]):
    # do something with fn
    pass
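Converting to strings removes the TypeError, but on its own it does not give natural order. A small sketch contrasting plain string sorting with a natural key on the same strings (the natural_key helper and the sample paths are hypothetical, not part of either answer):

```python
import re
from pathlib import PureWindowsPath

paths = [PureWindowsPath(f"C:/Users/HP/Desktop/P{n}/P{n}_Cor.csv") for n in (1, 2, 10)]

# Plain string sort compares character by character: '1' < '2', so P10 precedes P2.
as_text = sorted(p.as_posix() for p in paths)

def natural_key(text):
    # Digit runs land at odd indices after re.split with a capturing group.
    parts = re.split(r"(\d+)", text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

# Same strings, but numeric runs compared as integers: P1, P2, P10.
as_natural = sorted((p.as_posix() for p in paths), key=natural_key)
```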
