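Inspired by Nima Soroush's answer, here's a generalized recursive globbing function similar to Bash 4's `globstar` option, which allows matching across all levels of a subtree with `**`.
Examples: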
# Match all *.txt and *.bak files located anywhere in the current
# directory's subtree.
globex '**/{*.txt,*.bak}'
# Find all *.pm files anywhere in the subtrees of the directories in the
# module search path, @INC; follow symlinks.
globex '{' . (join ',', @INC) . '}/**/*.pm', { follow => 1 }
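Note: While this function, which combines File::Find with the built-in glob function, probably works mostly as you'd expect if you're familiar with glob's behavior, there are many subtleties around sorting and symlink behavior - please see the comments at the bottom.
One notable deviation from glob() is that whitespace in a given pattern argument is considered part of the pattern. To specify multiple patterns, pass them as separate pattern arguments or use a brace expression, as in the example above.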
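To make the whitespace difference concrete, here is a minimal sketch that compares the built-in glob() with bsd_glob() called with the same flags the function uses. The scratch directory and file names are made up for illustration.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Glob qw(bsd_glob GLOB_BRACE GLOB_NOMAGIC GLOB_QUOTE GLOB_TILDE GLOB_ALPHASORT);
use File::Temp qw(tempdir);

# Scratch directory with three files; one name contains a space (illustrative names).
my $origdir = getcwd();
my $dir     = tempdir(CLEANUP => 1);
for my $name ('a.txt', 'b.md', 'my notes.txt') {
    open my $fh, '>', "$dir/$name" or die "Cannot create $name: $!";
    close $fh;
}
chdir $dir or die "Cannot chdir to $dir: $!";

# The built-in glob() splits its argument on whitespace into two patterns.
my @split_on_space = glob('*.txt *.md');

# bsd_glob() treats the whole string, whitespace included, as a single pattern.
my $flags = GLOB_BRACE | GLOB_NOMAGIC | GLOB_QUOTE | GLOB_TILDE | GLOB_ALPHASORT;
my @one_pattern = bsd_glob('my *.txt', $flags);

# A brace expression specifies several patterns without relying on whitespace.
my @braces = bsd_glob('{*.txt,*.md}', $flags);

# Leave the scratch directory so that it can be cleaned up at exit.
chdir $origdir or die "Cannot chdir back to $origdir: $!";
```

Here the first call yields all three files, the second only the file with the space in its name, and the brace form again all three.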
Source code
sub globex {

  use File::Find;
  use File::Spec;
  use File::Basename;
  use File::Glob qw/bsd_glob GLOB_BRACE GLOB_NOMAGIC GLOB_QUOTE GLOB_TILDE GLOB_ALPHASORT/;

  my @patterns = @_;
  # Set the flags to use with bsd_glob() to emulate default glob() behavior.
  my $globflags = GLOB_BRACE | GLOB_NOMAGIC | GLOB_QUOTE | GLOB_TILDE | GLOB_ALPHASORT;
  my $followsymlinks;
  my $includehiddendirs;
  if (ref($patterns[-1]) eq 'HASH') {
    my $opthash = pop @patterns;
    $followsymlinks = $opthash->{follow};
    $includehiddendirs = $opthash->{hiddendirs};
  }
  unless (@patterns) { return };

  my @matches;
  my $ensuredot;
  my $removedot;
  # Use fc(), the casefolding function for case-insensitive comparison, if available.
  my $cmpfunc = defined &CORE::fc ? \&CORE::fc : \&CORE::lc;

  for (@patterns) {
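    # Split at '**'. A LIMIT of -1 keeps a trailing empty field; without it,
    # split() drops that field and a pattern ending in '**' would not be
    # recognized as recursive.
    my ($startdir, $anywhereglob) = split '(?:^|/)\*\*(?:/|$)', $_, -1;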
    if (defined $anywhereglob) {  # recursive glob
      if ($startdir) {
        $ensuredot = 1 if m'\./'; # if pattern starts with '.', ensure it is prepended to all results
      } elsif (m'^/') { # pattern starts with root dir, '/'
        $startdir = '/';
      } else { # pattern starts with '**'; must start recursion with '.', but remove it from results
        $removedot = 1;
        $startdir = '.';
      }
      unless ($anywhereglob) { $anywhereglob = '*'; }
      my $terminator = m'/$' ? '/' : '';
      # Apply glob() to the start dir. as well, as it may be a pattern itself.
      my @startdirs = bsd_glob $startdir, $globflags or next;
      find({
          wanted => sub {
            # Ignore symlinks, unless told otherwise.
            unless ($followsymlinks) { -l $File::Find::name and return; }
            # Ignore non-directories and '..'; we only operate on
            # subdirectories, where we do our own globbing.
            ($_ ne '..' and -d) or return;
            # Skip hidden dirs., unless told otherwise.
            unless ($includehiddendirs) { return if basename($_) =~ m'^\..'; }
            my $globraw;
            # Glob without './', if it wasn't part of the input pattern.
            if ($removedot and m'^\./(.+)$') {
              $_ = $1;
            }
            $globraw = File::Spec->catfile($_, $anywhereglob);
            # Ensure a './' prefix, if the input pattern had it.
            # Note that File::Spec->catfile() removes it.
            if($ensuredot) {
              $globraw = './' . $globraw if $globraw !~ m'\./';
            }
            push @matches, bsd_glob $globraw . $terminator, $globflags;
          },
          no_chdir => 1,
          follow_fast => $followsymlinks, follow_skip => 2,
          # Pre-sort the items case-insensitively so that subdirs. are processed in sort order.
          # NOTE: Unfortunately, the preprocess sub is only called if follow_fast (or follow) are FALSE.
          preprocess => sub { return sort { &$cmpfunc($a) cmp &$cmpfunc($b) } @_; }
        },
        @startdirs);
    } else {  # simple glob
      push @matches, bsd_glob($_, $globflags);
    }
  }
  return @matches;
}
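The function divides each pattern at `**` into a start directory and a glob to apply in every subdirectory. A minimal sketch of that decomposition follows; `split_recursive_pattern` is a hypothetical helper, not part of the function above, and the LIMIT of -1 is an assumption made here so that a trailing `**` still produces an (empty) second field.

```perl
use strict;
use warnings;

# Hypothetical helper: split a pattern at '**' into (start dir, per-directory glob).
# The second element is undef if the pattern contains no '**' component.
sub split_recursive_pattern {
    my ($pattern) = @_;
    # LIMIT -1 preserves trailing empty fields, e.g. for 'src/**'.
    my ($startdir, $anywhereglob) = split '(?:^|/)\*\*(?:/|$)', $pattern, -1;
    return ($startdir, $anywhereglob);
}

for my $pattern ('lib/**/*.pm', '**/*.txt', 'src/**', '*.txt') {
    my ($startdir, $anywhereglob) = split_recursive_pattern($pattern);
    printf "%-12s start dir: %-8s glob: %s\n",
        $pattern,
        "'$startdir'",
        defined $anywhereglob ? "'$anywhereglob'" : '(none; plain glob)';
}
```

An empty start directory corresponds to a pattern that begins with `**` (or with `/**`), and an empty glob corresponds to a pattern that ends with `**`, which the function then treats as `*`.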
Comments
SYNOPSIS
globex PATTERNLIST[, \%options]
DESCRIPTION
Extends the standard glob() function with support for recursive globbing.
Prepend '**/' to the part of the pattern that should match anywhere in the
subtree or end the pattern with '**' to match all files and dirs. in the
subtree, similar to Bash's `globstar` option.
A pattern that doesn't contain '**' is passed to the regular glob()
function.
While you can use brace expressions such as {a,b}, using '**' INSIDE
such an expression is NOT supported, and will be treated as just '*'.
Unlike with glob(), whitespace in a pattern is considered part of that
pattern; use separate pattern arguments or a brace expression to specify
multiple patterns.
To also follow directory symlinks, set 'follow' to 1 in the options hash
passed as the optional last argument.
Note that this changes the sort order - see below.
Traversal:
For recursive patterns, any given directory examined will have its matches
listed first, before descending depth-first into the subdirectories.
Hidden directories:
These are skipped by default, unless you set 'hiddendirs' to 1 in the
options hash passed as the optional last argument.
Sorting:
A given directory's matching items will always be sorted
case-insensitively, as with glob(), but sorting across directories
is only ensured if the option to follow symlinks is NOT specified.
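The cross-directory ordering relies on File::Find's `preprocess` callback, which can reorder a directory's entries before they are visited. A minimal sketch of that mechanism, using a scratch directory with made-up subdirectory names and plain lc() for case folding:

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Find;
use File::Temp qw(tempdir);

# Scratch tree with three subdirectories whose names differ in case (illustrative).
my $root = tempdir(CLEANUP => 1);
for my $sub (qw(c B a)) {
    mkdir "$root/$sub" or die "Cannot create $sub: $!";
}

my @visited;
find({
    # Record each subdirectory (but not the root itself) in visiting order.
    wanted     => sub { push @visited, basename($_) if -d $_ and $_ ne $root },
    no_chdir   => 1,
    # Sort each directory's entries case-insensitively before they are visited.
    preprocess => sub { return sort { lc($a) cmp lc($b) } @_ },
}, $root);

print "@visited\n";
```

With the case-insensitive sort, the subdirectories are visited as a, B, c rather than in the case-sensitive order B, a, c.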
Duplicates:
Following symlinks only prevents cycles, so if both a symlink and its
target are matched, they will both be reported.
(Under the hood, following symlinks activates the following
File::Find::find() options: `follow_fast`, with `follow_skip` set to 2.)
Since the default glob() function is at the heart of this function, its
rules - and quirks - apply here too:
- If literal components of your patterns contain pattern metacharacters
(* ? { } [ ]), you must make sure that they're \-escaped to be treated
as literals; here's an expression that works on both Unix and Windows
systems: s/[][{}\-~*?]/\\$&/gr
- Unlike with glob(), however, whitespace in a pattern is considered part
of the pattern; to specify multiple patterns, use either a brace
expression (e.g., '{*.txt,*.md}'), or pass each pattern as a separate
argument.
- A pattern ending in '/' restricts matches to directories and symlinks
to directories, but, strangely, also includes symlinks to *files*.
- Hidden files and directories are NOT matched by default; use a separate
pattern starting with '.' to include them; e.g., globex '**/{.*,*}'
matches all files and directories, including hidden ones, in the
current dir.'s subtree.
Note: As with glob(), .* also matches '.' and '..'
- Tilde expansion is supported; escape it as '\~' to treat a tilde that is
the first char. as a literal.
- A literal path (with no pattern chars. at all) is echoed as-is,
even if it doesn't refer to an existing filesystem item.
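Two of the rules above, escaping metacharacters and the echoing of literal paths, can be sketched as follows. `escape_glob_literal` is a hypothetical helper that wraps the substitution shown above (without the /r flag), and the file names are made up for illustration.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Glob qw(bsd_glob GLOB_BRACE GLOB_NOMAGIC GLOB_QUOTE GLOB_TILDE GLOB_ALPHASORT);
use File::Temp qw(tempdir);

my $flags = GLOB_BRACE | GLOB_NOMAGIC | GLOB_QUOTE | GLOB_TILDE | GLOB_ALPHASORT;

# Hypothetical helper: backslash-escape glob metacharacters in a literal name.
sub escape_glob_literal {
    my ($literal) = @_;
    (my $escaped = $literal) =~ s/[][{}\-~*?]/\\$&/g;
    return $escaped;
}

# Scratch directory with two similarly named files (illustrative names).
my $origdir = getcwd();
my $dir     = tempdir(CLEANUP => 1);
for my $name ('report[1].txt', 'report1.txt') {
    open my $fh, '>', "$dir/$name" or die "Cannot create $name: $!";
    close $fh;
}
chdir $dir or die "Cannot chdir to $dir: $!";

# Unescaped, '[1]' is a character class, so this matches 'report1.txt'.
my @unescaped = bsd_glob('report[1].txt', $flags);

# Escaped, the brackets are literal, so this matches 'report[1].txt'.
my @escaped = bsd_glob(escape_glob_literal('report[1].txt'), $flags);

# A path without pattern characters is echoed even though no such file exists.
my @literal = bsd_glob('no-such-file.txt', $flags);

chdir $origdir or die "Cannot chdir back to $origdir: $!";
```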
COMPATIBILITY NOTES
Requires Perl v5.6.0+
'/' must be used as the path separator on all platforms, even on Windows.
EXAMPLES
# Find all *.txt files in the subtree of a dir stored in $mydir, including
# in hidden subdirs.
globex "$mydir/**/*.txt", { hiddendirs => 1 };
# Find all *.txt and *.bak files in the current subtree.
globex '**/*.txt', '**/*.bak';
# Ditto, though with different output ordering:
# Unlike above, where you get all *.txt files across all subdirs. first,
# then all *.bak files, here you'll get *.txt files, then *.bak files
# per subdirectory encountered.
globex '**/{*.txt,*.bak}';
# Find all *.pm files anywhere in the subtrees of the directories in the
# module search path, @INC; follow symlinks.
# Note: The assumption is that no directory in @INC has embedded spaces
# or contains pattern metacharacters.
globex '{' . (join ',', @INC) . '}/**/*.pm', { follow => 1 };