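So I created this folder, C:\TempFiles, to test-run the following code snippet.

In this folder I have two files, nd1.txt and nd2.txt, and one subfolder, C:\TempFiles\Temp2, which contains a single file, nd3.txt.

Now, when I execute this code: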

import os,file,storage
database = file.dictionary()
tools = storage.misc()
lui = -1                           # last used file index
fileIndex = 1

def sendWord(wrd, findex):                  # where findex is the file index
    global lui
    if findex!=lui:
        tools.refreshRecentList()
        lui = findex
    if tools.mustIgnore(wrd)==0 and tools.toRecentList(wrd)==1:
        database.addWord(wrd,findex)        # else there's no point adding the word to the database, because it's either trivial, or has recently been added

def showPostingsList():
    print("\nPOSTING's LIST")
    database.display()

def parseFile(nfile, findex):
    for line in nfile:
        pl = line.split()
        for word in pl:
            sendWord(word.lower(),findex)

def parseDirectory(dirname):
    global fileIndex
    for root,dirs,files in os.walk(dirname):
        for name in dirs:
            parseDirectory(os.path.join(root,name))
        for filename in files:
            nf = open(os.path.join(root,filename),'r')
            parseFile(nf,fileIndex)
            print(" --> "+ nf.name)
            fileIndex+=1
            nf.close()

def main():
    dirname = input("Enter the base directory :-\n")
    print("\nParsing Files...")
    parseDirectory(dirname)
    print("\nPostings List has Been successfully created.\n",database.entries()," word(s) sent to database")
    choice = ""
    while choice!='y' and choice!='n':
        choice = str(input("View List?\n(Y)es\n(N)o\n -> ")).lower()
        if choice!='y' and choice!='n':
            print("Invalid Entry. Re-enter\n")
    if choice=='y':
        showPostingsList()

main()

I should be going through these three files only once, so I added a print(filename) to check, but apparently I am traversing the inner folder twice:

Enter the base directory :-
C:\TempFiles

Parsing Files...
 --> C:\TempFiles\Temp2\nd3.txt
 --> C:\TempFiles\nd1.txt
 --> C:\TempFiles\nd2.txt
 --> C:\TempFiles\Temp2\nd3.txt

Postings List has Been successfully created.
 34  word(s) sent to database
View List?
 (Y)es
 (N)o
-> n

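Can anyone tell me how to modify the os.walk() call to avoid this? It's not that my output is incorrect, but that it goes over one entire folder twice, which is not very efficient.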

1 Answer

Your issue isn't specific to Python 3; it's how os.walk() works. Iterating over it already recurses into subfolders, so you can take out your recursive call:

def parseDirectory(dirname):
    global fileIndex
    for root,dirs,files in os.walk(dirname):
        for filename in files:
            nf = open(os.path.join(root,filename),'r')
            parseFile(nf,fileIndex)
            print(" --> "+ nf.name)
            fileIndex+=1
            nf.close()

By calling parseDirectory() on each entry in dirs, you were starting another, independent walk of your only subfolder.
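To see the difference in isolation, here is a minimal sketch that rebuilds the folder layout from the question inside a temporary directory and counts the files each approach visits. The helper names (list_files, list_files_recursive, build_sample_tree) are made up for this illustration and are not part of the original code:

```python
import os
import tempfile


def list_files(dirname):
    """Collect every file path under dirname with a single os.walk() pass."""
    found = []
    for root, dirs, files in os.walk(dirname):
        for filename in files:
            found.append(os.path.join(root, filename))
    return found


def list_files_recursive(dirname):
    """Same loop, plus the redundant recursive call from the question."""
    found = []
    for root, dirs, files in os.walk(dirname):
        for name in dirs:
            # os.walk() will descend into this folder anyway, so this
            # starts a second, independent walk of the same subtree.
            found.extend(list_files_recursive(os.path.join(root, name)))
        for filename in files:
            found.append(os.path.join(root, filename))
    return found


def build_sample_tree(base):
    """Recreate the question's layout: nd1.txt, nd2.txt, Temp2/nd3.txt."""
    os.mkdir(os.path.join(base, "Temp2"))
    for rel in ("nd1.txt", "nd2.txt", os.path.join("Temp2", "nd3.txt")):
        with open(os.path.join(base, rel), "w") as fh:
            fh.write("sample text\n")


with tempfile.TemporaryDirectory() as base:
    build_sample_tree(base)
    once = [os.path.relpath(p, base) for p in list_files(base)]
    twice = [os.path.relpath(p, base) for p in list_files_recursive(base)]

print(len(once), len(twice))  # prints "3 4"
```

The plain loop sees each of the three files once, while the version with the extra recursive call reports Temp2/nd3.txt twice, matching the output in the question.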

answered 2013-09-29T17:13:26.777