perl - Using Perl to strip HTML from a directory of .txt files

Question

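I am a complete n00b. I have read many other posts on this site, but I haven't been able to find a solution to this relatively simple problem. Basically, I have a directory of text files marked up with HTML. I want to strip the HTML from every file in this directory and then write each one out to a new text file (preferably with an _out.txt suffix). Here is what I have tried so far: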

use strict;
use warnings;
use File::Find;
use HTML::FormatText;


my $root_path=qq{C:\\Filings\\test}; #Declare your input path
# Recursively process all subdirectories of $root_path
find(\&process_multiple_dir, $root_path);
sub process_multiple_dir
{
    if (-f && $File::Find::name =~ m{\.txt$}) # Process .txt files only
    {
          undef $/; # Input Record separator
          # Files Handling process
          open (FIN, "<$File::Find::name") || die "Cannot Open the Input file";
          my $file=<FIN>; # Slurp the whole file's contents into a scalar
          #print $file;

          my $string = HTML::FormatText->format_file($file,leftmargin => 0, rightmargin => 50);
          #print $string;
          close (FIN);
          # Change the file name for the output file creation purpose
          $File::Find::name=~ s{\.txt}{_Out.txt};


          # Print the $file contents to new file
          open (FOUT, ">$File::Find::name") || die "Cannot Create the Output file";
          print FOUT $string;
          close (FOUT);
      }
}

This code does write a file with the new name (ending in _Out.txt), but the newly created file contains no text...

Thanks!

score 1 · Accepted Answer

I don't use HTML::FormatText myself, but I think the correct syntax is:

my $string = HTML::FormatText->format_file($File::Find::name,leftmargin => 0, rightmargin => 50);

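format_file expects a file name, not the file's contents, so there is no need to open the file and read it into $file.

(PS: use some indentation in your code; it makes it more readable :))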
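For reference, here is a minimal sketch of how the whole script might look with that change applied. It is untested against the asker's data. The helper names strip_html_file and strip_html_in_dir are made up for illustration, and the fallback path is simply the one from the question. The output name is built in a copy of the path rather than by modifying $File::Find::name, and files already ending in _out.txt are skipped so a second run does not reprocess its own output.

```perl
use strict;
use warnings;
use File::Find;
use HTML::FormatText;

# Hypothetical helper: convert one HTML-marked-up .txt file to plain text
# and write the result next to it as <name>_out.txt.
sub strip_html_file {
    my ($in_path) = @_;

    # format_file takes the file *name*; it opens and parses the file itself.
    my $text = HTML::FormatText->format_file(
        $in_path,
        leftmargin  => 0,
        rightmargin => 50,
    );
    die "Could not format $in_path\n" unless defined $text;

    # Derive the output name in a copy of the input path.
    (my $out_path = $in_path) =~ s{\.txt$}{_out.txt};

    open(my $out, '>', $out_path) or die "Cannot create $out_path: $!";
    print {$out} $text;
    close($out) or die "Cannot close $out_path: $!";

    return $out_path;
}

# Hypothetical helper: walk $root recursively and convert every .txt file.
sub strip_html_in_dir {
    my ($root) = @_;
    find(
        {
            no_chdir => 1,    # keep full paths usable inside the callback
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path && $path =~ /\.txt$/;
                return if $path =~ /_out\.txt$/i;    # skip earlier output
                strip_html_file($path);
            },
        },
        $root,
    );
    return;
}

# Use the directory given on the command line, else the question's path.
strip_html_in_dir(@ARGV ? $ARGV[0] : 'C:\\Filings\\test');
```

If the file contents are already in a scalar for some other reason, HTML::FormatText->format_string($html, leftmargin => 0, rightmargin => 50) should be the matching call for a string instead of a file name. The empty output in the question is presumably because the module tried to open a file named after the entire HTML text, found nothing, and formatted an empty document.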
