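I am writing a Perl script to organize a folder that contains all of our order documents. The script mostly works, except for a curve ball someone threw at me the other day.

The problem arises when we have an order that we recently redid. We are land surveyors: sometimes we do a survey and then, a couple of years later, we do what is called a "fly by", where we go back out and "append" the order in a separate file, either recording changes to the land or simply stating that everything is fine and nothing has changed.

Where this causes me trouble is that the new file we produce has the same order number/file number as the old one. For example, we might have a document called CF145323, and that document will then have several single-page PDF files named CF145323.pdf, *_1.pdf, *_2.pdf, and so on.

What I am looking for is a way to modify my script so that it counts the files it finds and determines the next file number. So if *_1.pdf through *_3.pdf already exist, I want Perl to take the mismatched file and make it *_4.pdf. Follow me?

Another issue is that files are sometimes located in a folder that does not match the first digits of the file name. That part I seem to have figured out; the only thing I am missing is the computed numbering.

I am also working on Windows, so I cannot use any Linux commands.

Here is the script in the last state I left it: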

#!/usr/bin/perl
use strict;
use warnings;

# Root folder for Order Documents
my $orders_root = "C:\\Users\\Ian\\Desktop\\Order_docs";

# Keep track of how many files are processed
my $files_counter = 0;

# Keep track of how many junk files are processed
my $junk_counter = 0;

# Store a list of folders that match the 3 number naming scheme
my @matched_folders;

# Create a place to move junk files into
if (! -d "$orders_root\\Junk") {

    mkdir "$orders_root\\Junk" or die "Cannot create Junk folder: $!";

}

# Clear the screen
system "cls";

print "Processing files, please wait...\n\n";

# Open $orders_root
opendir(ORDERS_ROOT, "$orders_root") or die $!;

# Collect a list of all sub folders
my @folders = readdir(ORDERS_ROOT);

# Close $orders_root
closedir(ORDERS_ROOT);

# Remove the directories "." and ".." from the output
# (readdir does not guarantee they are the first two entries)
@folders = grep { !/^\.\.?$/ } @folders;

foreach my $folder (@folders) {

    # Filter out all directories that don't match the numbering system
    if ($folder =~ / \d{3} /xm) {

        # If the folder matches the expression above, add it to the list of matched folders
        push @matched_folders, $folder;
    
        # Open each folder inside of the Order Documents root
        opendir(CURRENT_FOLDER, "$orders_root\\$folder");

        # Foreach folder opened, collect a list of files in the folder for sorting
        my @files = readdir(CURRENT_FOLDER);

        # Close the current folder
        closedir(CURRENT_FOLDER);

        # Remove the directories "." and ".." from the output
        @files = grep { !/^\.\.?$/ } @files;

        foreach my $file (@files) {

            # Match each file to the standard naming scheme
            if ($file =~ /^ (C[AFL]|ME) \d{3} \d{3} ([_\-] \d+)? \. pdf $/xmi) {

                ++$files_counter;
            
            # If that file does not match, move it to a junk folder
            } else {
            
                ++$junk_counter;

                rename ("$orders_root\\$folder\\$file", "$orders_root\\Junk\\$file");

            } # End pdf match           

        } # End foreach $file
        
    } # End folder match

} # End foreach $folder



foreach my $folder (@matched_folders) {
    
    # Open $folder
    opendir(CURRENT_FOLDER, "$orders_root\\$folder") or die $!;

    # Collect a list of all files in the folder
    my @files = readdir(CURRENT_FOLDER);

    # Close $folder
    closedir(CURRENT_FOLDER);

    # Remove the directories "." and ".." from the output
    @files = grep { !/^\.\.?$/ } @files;
    
    foreach my $file (@files) {
        
        if ($file =~ /^ (?<office> (C[AFL]|ME)) (?<folder_num> \d{3}) (?<file_num> \d{3}([_\-] \d+)?) \. (?<file_ext> pdf) $/xmi) {
        
            my $office = uc($+{office});
            my $folder_num = $+{folder_num};
            my $file_num = $+{file_num};
            my $file_ext = lc($+{file_ext});
            
            # Change hyphens to a underscore
            $file_num =~ s/\-/_/;
            
            my $file_name = "$office" . "$folder_num" . "$file_num" . "\." . "$file_ext";
            my $fly_by_name = "$office" . "$folder_num" . "$file_num" . "_FB" . "\." . "$file_ext";
            
            # Check if the current file belongs in the current folder
            if ($folder ne $folder_num) {

                # If the folder does not exist create the folder
                if (! -e "$orders_root\\$folder_num") {
                
                    mkdir "$orders_root\\$folder_num" or die "Cannot create $orders_root\\$folder_num: $!";
                    
                }
                
                # Check to see if the file already exists
                if (! -e "$orders_root\\$folder_num\\$file_name") {
                
                    # Moves the file to correct place, these are mismatched files
                    rename ("$orders_root\\$folder\\$file", "$orders_root\\$folder_num\\$file_name");
                
                } else {
                
                    # Appends the file with a "_#" where # is equal to the 1+ the last file number, these files are fly bys
                    rename ("$orders_root\\$folder\\$file", "$orders_root\\$folder_num\\$fly_by_name");
                
                }
            
            # Files are in the correct place, the file name will be corrected only
            } else {
            
                rename ("$orders_root\\$folder\\$file", "$orders_root\\$folder_num\\$file_name");
            
            }
        
        } # End $file match
        
    } # End foreach $file

} # End foreach $folder



# Show statistics after processing
print "Done!\n\n";
print scalar(@folders), " folders processed\n";
print "$files_counter files processed\n";
print "$junk_counter junk files removed\n";

1 Answer

Your script is rather large, but I would suggest a different approach.

The first, and most obvious, is something like this:

my $base = "CF145323";
my $num = 1;
$num++ while -f "${base}_$num.pdf";

my $filename = "${base}_$num.pdf";
print "$filename\n";

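In other words, check whether the file already exists. You will have to modify this to test the various directories where you keep files, and it will not work if there are gaps in the numbering sequence: the loop stops at the first missing number rather than after the highest one.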
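If gaps are a concern, one possible variation is to scan the folder listing and take the highest existing suffix plus one. The following is only a sketch: the helper name next_suffix is made up for illustration, and the folder passed to it assumes the layout from the question (a folder named after the first three digits of the order number). It uses only opendir/readdir, so it does not rely on any Linux commands.

```perl
use strict;
use warnings;

# Hypothetical helper: return the next free "_N" suffix for $base,
# looking in every directory in @dirs and tolerating gaps in the sequence.
sub next_suffix {
    my ($base, @dirs) = @_;
    my $max = 0;
    for my $dir (@dirs) {
        opendir(my $dh, $dir) or next;    # skip folders that cannot be read
        for my $file (readdir $dh) {
            # Accept "_" or "-" before the number, as the question's naming scheme does
            if ($file =~ /^\Q$base\E[_-](\d+)\.pdf$/i) {
                $max = $1 if $1 > $max;
            }
        }
        closedir $dh;
    }
    return $max + 1;
}

# Assumed layout: the order folder is named after the first three digits
my $orders_root = "C:\\Users\\Ian\\Desktop\\Order_docs";
my $n = next_suffix("CF145323", "$orders_root\\145");
print "CF145323_$n.pdf\n";
```

Passing several directories covers the case where files for one order are spread across folders; the highest suffix found in any of them wins.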

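It may be easier to keep a record of each file and its latest generation. Typically this would be a hash, using, for example, "CF145323" as the key and the latest version number as its value. The hash can be saved and restored with the Storable module (very easy to use, and part of the Perl core).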
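A minimal sketch of that idea, using Storable's store and retrieve functions. The index file name order_index.sto and the helper claim_next are assumptions made for illustration only; they are not part of the original script.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Hypothetical location for the saved index; adjust to taste
my $index_file = "order_index.sto";

# Load the index if it exists, otherwise start with an empty hash
my $latest = {};
$latest = retrieve($index_file) if -e $index_file;

# Hypothetical helper: bump and return the next suffix for an order number
sub claim_next {
    my ($index, $base) = @_;
    return ++$index->{$base};
}

my $n = claim_next($latest, "CF145323");
print "CF145323_$n.pdf\n";

# Persist the updated index for the next run
store($latest, $index_file);
```

The index is only as accurate as the runs that update it, so files added or renamed by hand would need a rescan of the folders to bring it back in sync.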

answered 2011-06-29T08:36:59.877