python - python3: having problems with os.walk

Question

I have the following directory/file setup (this is simplified):

Ce  
+---top.txt
+---X0.0        
|     |  
|     +---Y0.0  
|     |     |
|     |     +---X0.0Y0.0Z0.0.dat
|     |     +---X0.0Y0.0Z0.05.dat   
|     +---Y0.05
|           |
|           +---X0.0Y0.05Z0.0.dat
|           +---X0.0Y0.05Z0.05.dat
+---X0.05
      |  
      +---Y0.0  
      |     |
      |     +---X0.0Y0.0Z0.0.dat
      |     +---X0.0Y0.0Z0.05.dat   
      +---Y0.05
            |
            +---X0.0Y0.05Z0.0.dat
            +---X0.0Y0.05Z0.05.dat

Within each Y directory, I need to make a 'psub' file, which consists of a copy of the file 'top.txt' with a list of the .dat files appended.

I am attempting to do this using the os.walk function in Python 3; however, I am running into two problems:
1. The new psub file appears in the X0.0 directory
2. The code does not list any filenames (presumably because it is not finding any) and then gives the error that it is unable to find the X0.05 directory.

Thus far, I have the following code:

import os

with open('top.txt', 'r') as reader:
    data=reader.read()
    for root, dirs, files in os.walk('.'):
        for folder in dirs:
            os.chdir(os.path.join(root, folder))
            with open('psub', 'a') as writer:
                writer.write(data)
                for names in files:
                    if names.endswith('.dat'):
                        print('gulp <' + names + '> ', end='', file=writer)
                        print(names.rsplit('.',1)[0], end='', file=writer)
                        print('.out', file=writer)

The resulting psub file should be:

#!/bin/bash
#MOAB -l walltime=48:00:0
#MOAB -j oe
#MOAB -N GULP-job
cd "$PBS_O_WORKDIR"
module load apps/gulp
#!/bin/bash
gulp <X0.00Y0.00Z0.00.dat> X0.00Y0.00Z0.00.out
gulp <X0.00Y0.00Z0.05.dat> X0.00Y0.00Z0.05.out

of which the first seven lines are in top.txt.

Any (simple) pointers as to where I am going wrong would be much appreciated. Cheers

score 4 · Accepted Answer

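os.walk already recurses through all the directories by itself, so there is no need to iterate through the subdirectories yourself. On each iteration, files lists the files directly inside root, not those inside the entries of dirs, so your inner loop writes a psub into X0.0 using the file list of the top-level directory (which contains no .dat files). The os.chdir call also changes the directory that the relative paths produced by os.walk('.') are resolved against, which is why X0.05 cannot be found afterwards. Get rid of the for folder in dirs: loop and the os.chdir call, and open the file at os.path.join(root, 'psub') instead.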
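To make this concrete, here is a minimal sketch that builds a small throwaway tree and records what os.walk reports for each directory. The helper name walk_listing and the file names are only illustrative, chosen to mirror the layout in the question.

```python
import os
import tempfile

def walk_listing(top):
    """Return {directory relative to top: sorted file names} as reported by os.walk."""
    listing = {}
    for root, dirs, files in os.walk(top):
        # `files` holds the files directly inside `root`, never those of its subfolders
        listing[os.path.relpath(root, top)] = sorted(files)
    return listing

# Build a throwaway tree shaped like the one in the question (names are illustrative)
with tempfile.TemporaryDirectory() as top:
    ydir = os.path.join(top, 'X0.0', 'Y0.0')
    os.makedirs(ydir)
    for name in ('X0.0Y0.0Z0.0.dat', 'X0.0Y0.0Z0.05.dat'):
        open(os.path.join(ydir, name), 'w').close()
    open(os.path.join(top, 'top.txt'), 'w').close()

    listing = walk_listing(top)
    for directory, names in sorted(listing.items()):
        print(directory, names)
```

The .dat files only show up on the iteration where root is the Y directory itself, which is the point at which the psub file should be written.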

I'd also recommend replacing the multiple print statements with a single string format command, like so:

print('gulp <{0}.dat> {0}.out'.format(names.rsplit('.',1)[0]), file=writer)
#or use writer.write(), to make it more transparent

You should also rename some of the variables so that they better represent what they are used for; for instance calling the iterated variable in the last loop names is misleading, because it is not in fact a collection of names, but a single filename.

EDIT: To avoid creating extra empty files, you need to check whether the directory needs a file created. You can do this by iterating through the filenames and setting a flag if any of them is a .dat file that needs processing.

EDIT 2: Factored code, and renamed a bunch of stuff, so now it is a lot clearer.

The sum of these modifications results in:

import os

def isDat(filename): return filename.endswith('.dat')

def hasDat(filenames): #this function checks if the list contains a name ending with '.dat'
    for filename in filenames:
        if isDat(filename): return True
    return False

def datFiles(filenames):
    for filename in filenames:
        if isDat(filename): yield filename

def gulpLines(filenames):
    for filename in datFiles(filenames):
        #{0} is used twice so that the single argument fills both placeholders
        yield 'gulp <{0}.dat> {0}.out\n'.format(filename.rsplit('.',1)[0])

def makePsub(root, filenames, prologue):
    #use the built-in open(); os.open() returns a raw file descriptor and cannot be used in a with block
    with open(os.path.join(root, 'psub'), 'a') as writer:
        #try putting 'psub.sh' instead, perhaps it is getting confused with no extension
        writer.write(prologue)
        for gulpLine in gulpLines(filenames):
            writer.write(gulpLine)


with open('top.txt') as reader:
    prologue=reader.read() #this is a much more descriptive name
#you can close it as soon as you have its contents

for root, dirs, filenames in os.walk('.'):
    if hasDat(filenames):
        makePsub(root, filenames, prologue)

Python is meant to be made up of numerous small functions, not big long monstrosities nested twenty deep.

Additionally, by factoring out the code, you give yourself the ability to determine exactly what part of the code is causing the problems, because you can then test each function individually. If you suspect (for instance) that the problem is not being able to correctly recognize .dat files, you can test the isDat function by itself until you have fixed it. If you suspect that the issue is with writing to the psub file, you can check that by replacing gulpLines with a dummy implementation (where it uses a predetermined list of results, or has the user do it manually, rather than actually generating them).
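As a rough sketch of that kind of isolated testing, the helpers can be exercised with plain assert statements and no file system at all. The helper definitions are repeated here only so that the snippet runs on its own (using an indexed placeholder so one argument fills both slots), and the sample file names are made up.

```python
def isDat(filename):
    return filename.endswith('.dat')

def datFiles(filenames):
    for filename in filenames:
        if isDat(filename):
            yield filename

def gulpLines(filenames):
    for filename in datFiles(filenames):
        # {0} is reused so the single stem fills both placeholders
        yield 'gulp <{0}.dat> {0}.out\n'.format(filename.rsplit('.', 1)[0])

# A made-up directory listing: one .dat file and two files that must be ignored
sample = ['X0.0Y0.0Z0.0.dat', 'notes.txt', 'psub']

assert isDat('X0.0Y0.0Z0.0.dat')
assert not isDat('top.txt')
assert list(datFiles(sample)) == ['X0.0Y0.0Z0.0.dat']
assert list(gulpLines(sample)) == ['gulp <X0.0Y0.0Z0.0.dat> X0.0Y0.0Z0.0.out\n']
print('all helper checks passed')
```

If one of these assertions fails, the fault is in that helper and not in the directory walk or the file writing.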

The other thing you can do is insert debug code inside the loops. For instance, insert a statement that outputs the contents of writer after you (supposedly) write data to it, to check if you actually have what you expect to have there. Another debug option is to run partial code; for instance, run a program that only does the walk loop, or only writes data to the files, or only parses the filenames (printing what it would write to the files to the console, for inspection purposes).
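One way to run "partial code" is a dry run: walk the tree and collect the text that would be written, without creating any psub file. The sketch below assumes a hypothetical helper named planPsubs (not part of the code above) and tries it on a throwaway directory with a made-up prologue.

```python
import os
import tempfile

def planPsubs(top, prologue):
    """Return {psub path: text that would be written}, without writing anything."""
    plan = {}
    for root, dirs, filenames in os.walk(top):
        dats = sorted(name for name in filenames if name.endswith('.dat'))
        if not dats:
            continue  # skip directories that contain no .dat files
        lines = ['gulp <{0}.dat> {0}.out\n'.format(name.rsplit('.', 1)[0])
                 for name in dats]
        plan[os.path.join(root, 'psub')] = prologue + ''.join(lines)
    return plan

# Try it on a throwaway tree (directory and file names are illustrative)
with tempfile.TemporaryDirectory() as top:
    ydir = os.path.join(top, 'X0.0', 'Y0.0')
    os.makedirs(ydir)
    open(os.path.join(ydir, 'X0.0Y0.0Z0.0.dat'), 'w').close()

    plan = planPsubs(top, '#!/bin/bash\n')
    for path, text in sorted(plan.items()):
        print('---', os.path.relpath(path, top))
        print(text, end='')

    # The dry run must not have created the file it describes
    wrote_nothing = not os.path.exists(os.path.join(ydir, 'psub'))
```

Once the printed plan looks right, the same loop can be switched over to actually writing the files.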
