python - Python file operations

Question

Suppose I have a folder structure like this:

  rootfolder
      | 
     / \ \
    01 02 03 ....
    |
  13_itemname.xml

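So under my root folder, each directory represents a month (01, 02, 03, and so on). Inside those directories I have files named by creation hour and item name, e.g. 16_item1.xml, 24_item1.xml. As you may guess, there are several items, and one XML file is created per item every hour.

Now I want to do two things:

I need to generate the list of item names for a month, i.e. for 01 I have item1, item2 and item3 inside.
I need to filter by item, e.g. for item1 I want to read every file from 01_item1.xml to 24_item1.xml.

How can I achieve these in Python in a simple way?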

score 5 · Accepted Answer

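Here are two ways of doing what you're asking (if I understood it correctly). One uses a regular expression, one does not. Pick whichever you prefer ;)

The "setdefault" line may look like magic. See the documentation for an explanation. I leave it as an "exercise for the reader" to understand how it works ;)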

from os import listdir
from os.path import join

DATA_ROOT = "testdata"

def folder_items_no_regex(month_name):

   # dict holding the items (assuming ordering is irrelevant)
   items = {}

   # 1. Loop through all filenames in said folder
   for file in listdir( join( DATA_ROOT, month_name ) ):
      # skip files that cannot be split on "_"
      # (unpacking file.split( "_", 1 ) would raise ValueError for them)
      date, sep, name = file.partition( "_" )
      if not sep or not date or not name:
         continue

      # ignore non-.xml files
      if not name.endswith(".xml"):
         continue

      # cut off the ".xml" extension
      name = name[0:-4]

      # keep a set of filenames per item
      items.setdefault( name, set() ).add( file )

   return items

def folder_items_regex(month_name):

   import re

   # The pattern:
   # 1. match the beginning of line "^"
   # 2. capture 1 or more digits ( \d+ )
   # 3. match the "_"
   # 4. capture any character (as few as possible ): (.*?)
   # 5. match ".xml"
   # 6. match the end of line "$"
   pattern = re.compile( r"^(\d+)_(.*?)\.xml$" )

   # dict holding the items (assuming ordering is irrelevant)
   items = {}

   # 1. Loop through all filenames in said folder
   for file in listdir( join( DATA_ROOT, month_name ) ):

      match = pattern.match( file )
      if not match:
         continue

      date, name = match.groups()

      # keep a set of filenames per item
      items.setdefault( name, set() ).add( file )

   return items

if __name__ == "__main__":
   from pprint import pprint

   data = folder_items_no_regex( "02" )

   print( "--- The dict ---------------" )
   pprint( data )

   print( "--- The items --------------" )
   pprint( sorted( data.keys() ) )

   print( "--- The files for item1 ---- " )
   pprint( sorted( data["item1"] ) )


   data = folder_items_regex( "02" )

   print( "--- The dict ---------------" )
   pprint( data )

   print( "--- The items --------------" )
   pprint( sorted( data.keys() ) )

   print( "--- The files for item1 ---- " )
   pprint( sorted( data["item1"] ) )
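For readers who would rather not work out the "setdefault" line on their own, here is a minimal sketch of what it does, shown next to the longer explicit form. The filenames in the list are made up for illustration and are not read from disk.

```python
# Made-up sample filenames (assumption: same "<hour>_<item>.xml" scheme as the question).
files = ["01_item1.xml", "02_item1.xml", "01_item2.xml"]

# Version 1: dict.setdefault
items = {}
for filename in files:
    # Everything after the first "_" minus the ".xml" extension is the item name.
    name = filename.partition("_")[2][:-len(".xml")]
    # setdefault returns items[name] if the key already exists; otherwise it
    # stores the given default (an empty set) under that key and returns it.
    items.setdefault(name, set()).add(filename)

# Version 2: the same logic written out explicitly
explicit = {}
for filename in files:
    name = filename.partition("_")[2][:-len(".xml")]
    if name not in explicit:
        explicit[name] = set()
    explicit[name].add(filename)

print(sorted(items))           # the item names
print(sorted(items["item1"]))  # the files belonging to item1
```

Both loops build the same dictionary; setdefault only saves the "is the key there yet?" check. collections.defaultdict(set) from the standard library would be another way to get the same effect.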

score 0 · Accepted Answer

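Assuming the file names have a fixed-length prefix and suffix (i.e. a 3-character prefix such as '01_' and a 4-character suffix '.xml'), you can solve the first part of the problem like this: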

import os

names = set(name[3:-4] for name in os.listdir('01') if name.endswith('.xml'))

This gives you the unique item names.

To filter by item, just look for the files that end with that item's name and sort them if needed.

item_suffix = '_item2.xml'
filtered = sorted(name for name in os.listdir('01') if name.endswith(item_suffix))
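If the fixed 3-character prefix cannot be relied on, the same idea might be wrapped in two small functions that split on the first underscore instead of slicing at fixed positions. This is only a sketch: the function names item_names and files_for_item are hypothetical, and it assumes every relevant file is named "<digits>_<item>.xml".

```python
import os

def item_names(month_dir):
    """Return the set of unique item names found in month_dir."""
    names = set()
    for filename in os.listdir(month_dir):
        prefix, sep, rest = filename.partition("_")
        # Keep only names shaped like "<digits>_<item>.xml".
        if sep and prefix.isdigit() and rest.endswith(".xml"):
            names.add(rest[:-len(".xml")])
    return names

def files_for_item(month_dir, item):
    """Return the sorted filenames in month_dir that belong to exactly this item."""
    wanted = item + ".xml"
    # Compare everything after the first "_" so that, for example,
    # "01_big_item1.xml" is not mistaken for a file of "item1".
    return sorted(f for f in os.listdir(month_dir)
                  if f.partition("_")[2] == wanted)
```

Because the hour prefix is zero-padded, a plain sorted() already returns the files in chronological order.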

score 0 · Accepted Answer

Not sure exactly what you want to do, but here are some pointers that may be useful.

Creating file names ("%02d" means pad with zeros):

foldernames = ["%02d"%i for i in range(1,13)]

filenames = ["%02d"%i for i in range(1,25)]  # 01 .. 24; the upper bound of range is exclusive

Use os.path.join to build complex paths instead of string concatenation:

os.path.join(foldername,filename)

Use os.path.exists to check first whether a file exists:

if os.path.exists(newname):
    print("file already exists")

To list directory contents, use glob:

from glob import glob
xmlfiles = glob("*.xml")

Use shutil for higher-level operations such as copying directory trees or moving and renaming files:

shutil.move(oldname,newname)

Use basename to get the file name from a full path:

filename = os.path.basename(fullpath)
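Put together, these pointers could be combined roughly as follows for the layout in the question. This is a sketch under assumptions: the helper names hourly_paths and existing_paths are hypothetical, months are passed as integers, and item names are assumed to contain no glob special characters.

```python
import os
from glob import glob

def hourly_paths(root, month, item, hours=range(1, 25)):
    """Build the expected path of each hourly file for one item in one month."""
    month_dir = os.path.join(root, "%02d" % month)
    return [os.path.join(month_dir, "%02d_%s.xml" % (hour, item)) for hour in hours]

def existing_paths(root, month, item):
    """List the files for one item that are actually on disk, in sorted order."""
    # "[0-9][0-9]" restricts the match to a two-digit hour prefix.
    pattern = os.path.join(root, "%02d" % month, "[0-9][0-9]_%s.xml" % item)
    return sorted(glob(pattern))

def missing_paths(root, month, item):
    """Return the expected hourly files that do not exist yet."""
    return [p for p in hourly_paths(root, month, item) if not os.path.exists(p)]
```

For example, existing_paths("rootfolder", 1, "item1") would return the full paths of whichever 01_item1.xml to 24_item1.xml files are present under rootfolder/01, and os.path.basename can then be applied to each path to recover the bare file name.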
