html - How can I read a file's contents into a Perl scalar?

Question

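What I am trying to do is get the contents of a file from another server. Since I'm not familiar with Perl and don't know its modules and functions, I've gone about it this way: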

 my $fileContents;
 if( $md5Con =~ m/\.php$/g ) {
     my $ftp = Net::FTP->new($DB_ftpserver, Debug => 0) or die "Cannot connect to some.host.name: $@";
     $ftp->login($DB_ftpuser, $DB_ftppass) or die "Cannot login ", $ftp->message;
     $ftp->get("/" . $root . $webpage, "c:/perlscripts/" . md5_hex($md5Con) . "-code.php") or die $ftp->message;
     open FILE, ">>c:/perlscripts/" . md5_hex($md5Con) . "-code.php" or die $!;
     $fileContents = <FILE>;
     close(FILE);
     unlink("c:/perlscripts/" . md5_hex($md5Con) . "-code.php");
     $ftp->quit;
 }

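What I thought I'd do is get the file from the server, put it on my local machine, edit the content, upload it wherever it needs to go, and then delete the temp file.

However, I can't seem to figure out how to get the contents of the file;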

open FILE, ">>c:/perlscripts/" . md5_hex($md5Con) . "-code.php" or die $!;
$fileContents = <FILE>;
close(FILE);

keeps failing with;

Use of uninitialized value $fileContents

I'm guessing this means it isn't returning a value.

Any help is much appreciated.

>>>>>>>>>> EDIT <<<<<<<<<<

my $fileContents;
if( $md5Con =~ m/\.php$/g ) {
    my $ftp = Net::FTP->new($DB_ftpserver, Debug => 0) or die "Cannot connect to some.host.name: $@";
    $ftp->login($DB_ftpuser, $DB_ftppass) or die "Cannot login ", $ftp->message;
    $ftp->get("/" . $root . $webpage, "c:/perlscripts/" . md5_hex($md5Con) . "-code.php") or die $ftp->message;
    my $file = "c:/perlscripts/" . md5_hex($md5Con) . "-code.php";
    {
        local( $/ ); # undefine the record seperator
        open FILE, "<", $file or die "Cannot open:$!\n";
        my $fileContents = <FILE>;
        #print $fileContents;
        my $bodyContents;
        my $headContents;

        if( $fileContents =~ m/<\s*body[^>]*>.*$/gi ) {
            print $0 . $1 . "\n";
            $bodyContents = $dbh->quote($1);    
        }
        if( $fileContents =~ m/^.*<\/head>/gi ) {
            print $0 . $1 . "\n";
            $headContents = $dbh->quote($1);    
        }

        $bodyTable = $dbh->quote($bodyTable);
        $headerTable = $dbh->quote($headerTable);
        $dbh->do($createBodyTable) or die " error: Couldn't create body table: " . DBI->errstr;
        $dbh->do($createHeadTable) or die " error: Couldn't create header table: " . DBI->errstr;
        $dbh->do("INSERT INTO $headerTable ( headData, headDataOutput ) VALUES ( $headContents, $headContents )") or die " error: Couldn't connect to database: " . DBI->errstr;
        $dbh->do("INSERT INTO $bodyTable ( bodyData, bodyDataOutput ) VALUES ( $bodyContents, $bodyContents )") or die " error: Couldn't connect to database: " . DBI->errstr;
        $dbh->do("INSERT INTO page_names (linkFromRoot, linkTrue, page_name, table_name, navigation, location) VALUES ( $linkFromRoot, $linkTrue, $page_name, $table_name, $navigation, $location )") or die " error: Couldn't connect to database: " . DBI->errstr;

        unlink("c:/perlscripts/" . md5_hex($md5Con) . "-code.php");
    }
    $ftp->quit;
}

The above, using print, will print the whole file. However, for some reason the two regular expressions return false. Any idea why?

if( $fileContents =~ m/<\s*body[^>]*>.*$/gi ) {
    print $0 . $1 . "\n";
    $bodyContents = $dbh->quote($1);
}
if( $fileContents =~ m/^.*<\/head>/gi ) {
    print $0 . $1 . "\n";
    $headContents = $dbh->quote($1);
}

score 10 · Accepted Answer

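This is covered in section 5 of the Perl FAQ included with the standard distribution.

How can I read in an entire file all at once?

You can do it in one step with the slurp method of Path::Class::File.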

use Path::Class;
$all_of_it = file($filename)->slurp; # entire file in scalar
@all_lines = file($filename)->slurp; # one line per element

The customary Perl approach for processing all the lines in a file is to do so one line at a time:

open (INPUT, $file) || die "can't open $file: $!";
while (<INPUT>) {
    chomp;
    # do something with $_
}
close(INPUT)        || die "can't close $file: $!";

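This is far more efficient than reading the entire file into memory as an array of lines and then processing it one element at a time, which is often, if not almost always, the wrong approach. Whenever you see someone do this: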

@lines = <INPUT>;

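you should think long and hard about why you need everything loaded at once. It is not a scalable solution. You might also find it more fun to use the standard Tie::File module, or the DB_File module's $DB_RECNO bindings, which allow you to tie an array to a file so that accessing an element of the array actually accesses the corresponding line in the file.

You can read the entire filehandle contents into a scalar.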

{
    local(*INPUT, $/);
    open (INPUT, $file) || die "can't open $file: $!";
    $var = <INPUT>;
}

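That temporarily undefines your record separator, and will automatically close the file at block exit. If the file is already open, just use this: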

$var = do { local $/; <INPUT> };

For ordinary files you can also use the read function.

read( INPUT, $var, -s INPUT );

The third argument tests the byte size of the data on the INPUT filehandle and reads that many bytes into the buffer $var.
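The slurp idiom described above can also be written with a lexical filehandle and three-argument open. The following is only a minimal sketch: the helper name slurp_file and the temporary file are assumptions made for illustration and do not come from the FAQ.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file so the example is self-contained.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "line one\nline two\n";
close $tmp_fh or die "can't close $path: $!";

# Hypothetical helper: return the whole file as a single string.
sub slurp_file {
    my ($file) = @_;
    open my $fh, '<', $file or die "can't open $file: $!";
    local $/;                  # undefined record separator: read everything at once
    my $content = <$fh>;
    close $fh or die "can't close $file: $!";
    return $content;
}

my $all = slurp_file($path);
print length($all), " characters read\n";
```

Because $/ is localized inside the subroutine, the record separator is restored as soon as slurp_file returns, so line-by-line reads elsewhere in the program are unaffected.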

score 8 · Accepted Answer

If you want to read all of the file's contents in one go, use Path::Class::File::slurp.

More importantly, though, use an HTML parser to parse HTML.

score 3 · Accepted Answer

open FILE, "c:/perlscripts/" . md5_hex($md5Con) . "-code.php" or die $!;
while (<FILE>) {
    # each line is in $_
}
close(FILE);

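will open the file and allow you to process it line by line (if that's what you want; otherwise investigate binmode). I think the problem is that you are prepending the filename with >>, which opens the file for appending rather than reading, so nothing can be read from it. See this tutorial for more information.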
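To make the difference between the two open modes concrete, here is a small sketch that reads the same file once through a handle opened with '>>' and once through a handle opened with '<'. The temporary file and the variable names are assumptions for illustration only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file standing in for the downloaded page.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "<?php echo 'hi'; ?>\n";
close $tmp_fh or die "can't close $path: $!";

# '>>' opens the file for appending (write-only), so a read yields undef.
my $from_append;
{
    no warnings 'io';    # silence "Filehandle opened only for output"
    open my $out, '>>', $path or die "can't open $path: $!";
    $from_append = <$out>;
    close $out;
}

# '<' opens the file for reading, so the first line comes back.
open my $in, '<', $path or die "can't open $path: $!";
my $from_read = <$in>;
close $in;

print defined $from_append ? "append mode: got data\n" : "append mode: undef\n";
print "read mode: $from_read";
```

The undefined value from the append-mode handle is what leads to the "Use of uninitialized value" warning reported in the question.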

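I note you are also using regular expressions to parse HTML. Generally I would recommend using a parser for this (for example, see HTML::Parser). Regular expressions are not suited to HTML because HTML lacks regularity, and they won't work reliably in the general case.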

score 2 · Accepted Answer

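Also, if you need to edit the contents of the file, take a look at the CPAN module Tie::File.
This module saves you from having to create a temporary file to edit the content and write it back to the same file.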
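As a rough sketch of how Tie::File can be used for in-place edits, the example below ties an array to a throwaway file, changes one line, and appends another. The file contents and variable names are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# Create a throwaway file so the example is self-contained.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "alpha\nbeta\ngamma\n";
close $tmp_fh or die "can't close $path: $!";

# Each array element is one line of the file, without its line ending.
tie my @lines, 'Tie::File', $path or die "can't tie $path: $!";
$lines[1] = 'BETA';      # rewrites the second line on disk
push @lines, 'delta';    # appends a new line to the file
untie @lines;

# Read the file back to show the changes were written.
open my $fh, '<', $path or die "can't open $path: $!";
my $result = do { local $/; <$fh> };
close $fh;
print $result;
```

Tie::File writes changes through to the file as the array is modified, so no separate temporary copy of the content is needed.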

Edit:
What you are looking at is a way to slurp the file. You may have to undefine the record separator variable $/.

The code below works fine for me:

use strict;
use warnings;
my $file = "test.txt";
{
    local( $/ ); # undefine the record separator
    open FILE, "<", $file or die "Cannot open:$!\n";
    my $lines = <FILE>;
    close FILE;
    print $lines;
}

Also see the "Traditional Slurping" section of this article.

score 1 · Accepted Answer

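However, for some reason the two regular expressions return false. Any idea why?

In a regular expression, . by default matches any character except a newline. Presumably you have newlines before the </head> tag and after the <body> tag. To make . match any character including newlines, use the /s flag.

I'm not sure what your print $0 . $1 ... code is about; you are not capturing anything in your match to be stored in $1, and $0 is not a variable used for regular expression captures. It is something very different: it holds the name of the running program.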
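The two points above (the /s flag and the missing capture groups) can be illustrated with a short sketch. The sample HTML is an assumption for illustration, and the /g flag from the question is dropped here because it is not needed for a single match. This only demonstrates the regex behaviour and is not a substitute for a real HTML parser.

```perl
use strict;
use warnings;

# Hypothetical sample page standing in for the downloaded file.
my $fileContents = <<'HTML';
<html>
<head>
<title>Demo</title>
</head>
<body class="x">
<p>Hello</p>
</body>
</html>
HTML

my ($headContents, $bodyContents);

# /s lets . match newlines; the parentheses store the match in $1.
if ( $fileContents =~ m/^(.*<\/head>)/si ) {
    $headContents = $1;
}
if ( $fileContents =~ m/(<\s*body[^>]*>.*)$/si ) {
    $bodyContents = $1;
}

print "head part:\n$headContents\n";
print "body part:\n$bodyContents";
```

Without /s neither pattern matches this input, because the text between the start of the string and </head>, and between <body> and the end of the string, spans several lines.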

score 0 · Accepted Answer

If you want to get the contents of the file,

 @lines = <FILE>;

answered 2010-02-06T15:15:07.517

score 0 · Accepted Answer

Use File::Slurp::Tiny. It is as convenient as File::Slurp, but without the bugs.

answered 2014-08-16T11:18:26.633
