perl - Clean everything after start till to the end point

Question

I want to delete everything from a start point up to an end point.

Example:

    <!--
        <group>
                <name>Octopus</name>
                <inventory>
                        <inventoryName>octopus</inventoryName>
                        <decoder>DFFDD</decoder>
                        <command>cat /etc/hosts</command>
                </inventory>
        </group>
 -->

Here <!-- is the start point and --> is the end point; sometimes the content spans multiple lines before the end point. I want everything between those tags to be deleted.

I started with sed, like this:

sed 's/^<\!--//g'

but I am not sure how to continue so that it catches everything and removes it up to the end tag.

score 3 · Accepted Answer

Code for GNU sed:

sed -r '/<!--/,/-->/{//!d;s/(.*<!--).*/\1/;s/.*(-->.*)/\1/}' file

Session transcript:

    $ cat file
    test line #1
    <AXXX> <!--  <BXXX>
        <group>
            <name>Octopus</name>
            <inventory>
                <inventoryName>octopus</inventoryName>
                <decoder>DFFDD</decoder>
                <command>cat /etc/hosts</command>
            </inventory>
        </group>
    <CXXX> --> <DXXX>
    test line 12
    $ sed -r '/<!--/,/-->/{//!d;s/(.*<!--).*/\1/;s/.*(-->.*)/\1/}' file
    test line #1
    <AXXX> <!--
    --> <DXXX>
    test line 12
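
If the markers always sit on lines of their own, as in the question's example, a plain range delete may be enough. The following is a minimal sketch under that assumption; the helper name strip_comment_lines and the demo file are made up for illustration. It removes the marker lines as well, so anything else sharing a line with a marker is lost, and a comment that opens and closes on a single line starts a range that runs on to the next line containing -->.

```shell
#!/bin/sh
# Hypothetical helper: delete every line from one containing <!-- through
# the next line containing -->, marker lines included.
strip_comment_lines() {
    sed '/<!--/,/-->/d' "$1"
}

# Demo on a throwaway file shaped like the example in the question.
demo=$(mktemp)
printf '%s\n' 'before' '<!--' '    <group>' '        <name>Octopus</name>' '    </group>' ' -->' 'after' > "$demo"
strip_comment_lines "$demo"   # prints "before" and "after" on separate lines
rm -f "$demo"
```

For input like the transcript above, where other tags share a line with the markers, the longer command in this answer is the safer choice.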

score 3 · Accepted Answer

If I understand what you are trying to do, you want to remove comments. Correct?

What about something like this?

<!--
     blah blah blah -->

or

<!-- blah blah blah -->

or

<!-- blah blah blah
-->

Or even this?

 <foo><bar> <!-- <fubar>blah blah</fubar> --> </bar></foo>

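You can't use regular expressions on XML, because XML is too complex. There are many Perl libraries for parsing XML data, and you should use one of them.

Although it is no longer the preferred choice, XML::Simple can do exactly what you want with absolutely no fuss. XML::Simple can rebuild your XML file into a compatible version. The entities may not match exactly, but the result will be compatible with your old structure. And XML::Simple removes comments.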

use strict;
use warnings;
use XML::Simple;

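my $xml_file = shift @ARGV;    # path to the input XML file

# XMLin parses the file into a Perl data structure; comments are not kept.
my $xml_struct_ref = XMLin( $xml_file );
# XMLout serializes that structure back into an XML string.
my $xml_file_output = XMLout( $xml_struct_ref );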

Then you just write $xml_file_output to a new XML file. All comments are gone!

score 3 · Accepted Answer

A non-greedy substitution, using the /s modifier so that . also matches newlines:

$string =~ s|<!-- .*? -->||xsg;

answered 2013-07-19T12:53:02.360

score 1 · Accepted Answer

Perl solution:

#!/usr/bin/env perl

use strict;
use warnings;

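my $filename = $ARGV[0] or die "Usage: $0 FILE\n";

# Slurp the whole file into one string so that a match can span lines.
open my $in, '<', $filename or die "Cannot read $filename: $!";
my $text = do { local $/; <$in> };
close $in;

# Remove every comment, taking the shortest match each time.
$text =~ s/<!--[\s\S]*?-->//g;

# Overwrite the original file with the cleaned text.
open my $out, '>', $filename or die "Cannot write $filename: $!";
print {$out} $text;
close $out;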

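You need [\s\S]*? (or (.|\n)*?) to get the shortest match of any characters, newlines included. A plain . does not work on multi-line strings, because without the /s modifier it matches any character except a newline.

Run the script like this: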

./script.pl /path/to/your.file

score 1 · Accepted Answer

In HTML::Parser you can find a similar snippet:

perl -0777 -MHTML::Parser -nE 'HTML::Parser->new(default_h=>[sub{print shift},"text"],comment_h=>[""])->parse($_)||die $!' < file.html >decommented.html

Tested on the following HTML:

simple
<!-- this is an comment -->
multi
<!--
this is an
multiline comment
-->
stupid
<img src="copen.jpg" alt='image of open tag <!--'>
<img src="cclose.jpg" alt='image of closing tag -->'>
js
<script>
alert("<!-- here -->");
</script>
end

and it prints:

simple

multi

stupid
<img src="copen.jpg" alt='image of open tag <!--'> <img src="cclose.jpg" alt='image of closing tag -->'>
js
<script>
alert("<!-- here -->");
</script>
