One solution using perl with the help of the HTML::TokeParser parser:
#!/usr/bin/env perl
use warnings;
use strict;
use HTML::TokeParser;
use File::Spec;

my ($newfile, $currentfile);

## Give as arguments the html files to process, like *.html
for ( @ARGV ) {
    my $p = HTML::TokeParser->new( $_ ) or die qq|Cannot open $_: $!\n|;

    ## Search for a "div" tag whose "class" attribute has the value "top".
    while ( my $info = $p->get_tag( 'div' ) ) {
        ## Default to an empty string so a div without a class does not warn.
        if ( ( $info->[1]{class} // '' ) eq 'top' ) {
            $newfile = $p->get_text;

            ## Skip the next two tokens (</div>, whitespace) and keep the third,
            ## which should be the following "a" tag.
            $info = $p->get_token for 1 .. 3;

            ## If the token is a start 'a' tag, extract the file name from its href attribute.
            if ( $info && $info->[0] eq 'S' &&
                 $info->[1] eq 'a' ) {
                $currentfile = ( File::Spec->splitpath( $info->[2]{href} ) )[2];
                ## Append the extension of the current file (dot included).
                $newfile .= join q||, (split /(\.)/, $currentfile)[-2 .. -1];
            }
            last;
        }
    }

    ## Rename file, reporting a failure instead of ignoring it.
    if ( $newfile && $currentfile ) {
        printf STDERR qq|Renaming --> %s <-- to --> %s <--\n|, $currentfile, $newfile;
        rename $currentfile, $newfile
            or warn qq|Could not rename $currentfile: $!\n|;
    }
    $newfile = $currentfile = undef;
}
Run it like this:
perl-5.14.2 script.pl *.html
In one of my tests, the output looked something like this:
Renaming --> 15d705df3.txt <-- to --> SomethingFile1.txt <--
Renaming --> 15d705dg6.txt <-- to --> SomethingFile2.txt <--
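
The least obvious step in the script is how the new name gets its extension, so here is a minimal sketch of that step on its own. The helper name build_new_name and the sample path files/15d705df3.txt are made up for illustration; the script does the same thing inline. It assumes the linked file name contains at least one dot.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of the script above): takes the text found
# in the div and the href of the link, returns (current name, new name).
sub build_new_name {
    my ($title, $href) = @_;

    # Keep only the file part of the path.
    my $file = ( File::Spec->splitpath($href) )[2];

    # Splitting on a captured dot keeps the dots in the result list,
    # so the last two elements are "." and the extension.
    my @parts = split /(\.)/, $file;

    return ( $file, $title . join q||, @parts[-2 .. -1] );
}

# Sample values are assumptions chosen to match the output shown above.
my ($old, $new) = build_new_name( 'SomethingFile1', 'files/15d705df3.txt' );
print "$old -> $new\n";
```

Because only the last two pieces of the split are used, a name such as archive.tar.gz would keep just .gz.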