regex - Perl script to search and replace multiple lines in multiple HTML files

Question

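I have a lot of HTML files in a folder. I need to somehow remove a <div id="user-info" ...>...</div> from all of them. As far as I know, I need a Perl script for this, but I don't know Perl well enough to write one. Could someone help me with it?

Here is what the "bad" code looks like: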

<div id="user-info" class="logged-in">
    <a class="icon icon-key-delete" href="https://test.dev/login.php?0,logout=1">Log Out</a>
    <a class="icon icon-user-edit" href="https://test.dev/control.php">Control Center</a>


</div> <!-- end of div id=user-info -->

Thanks in advance!

score 3 · Accepted Answer

Using XML::XSH2:

for { glob '*.html' } {
    open :F html (.) ;
    delete //div[@id="user-info" and @class="logged-in"] ;
    save :b ;
}

score 2 · Accepted Answer

perl -0777 -i.withdiv -pe 's{<div[^>]+?id="user-info"[^>]*>.*?</div>}{}gsmi;' test.html

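-0777 means don't split the input on anything, so the whole file is slurped in at once (instead of line by line, which is the default for -p).

-i.withdiv means edit the files in place, keeping each original as a backup with the extension .withdiv (without -i, -p only prints the result).

-p means loop over the input, run the code given with -e on each record, and print the result. A record is normally one line, but because we are slurping it is the whole file.

-e expects the code to run.

See man perlrun or perldoc perlrun for more information.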

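Here is another solution, which will look slightly more familiar to people who know jQuery, since the syntax is similar. It uses Mojolicious' ojo module to load the HTML content into a Mojo::DOM object, transform it, and then print the transformed version: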

perl -Mojo -MFile::Slurp -E 'for (@ARGV) { say x(scalar(read_file $_))->at("#user-info")->replace("")->root; }' test.html test2.html test*.html

To replace the content of the file directly:

perl -Mojo -MFile::Slurp -E 'for (@ARGV) { write_file( $_, x(scalar(read_file $_))->at("#user-info")->replace("")->root ); }' test.html

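Note that this will not only remove the div, it will also rewrite the rest of the content the way Mojo::DOM serializes it, so the order of tag attributes may change. Specifically, I saw <div id="user-info2" class="logged-in"> rewritten as <div class="logged-in" id="user-info2">.

Mojolicious requires at least perl 5.10, but beyond that it has no non-core requirements.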
