
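I want to use sed to remove all comments from a text file. Assume a comment starts with the "%" character and ends at the newline. I want to delete everything from the "%" to the end of the line, including the newline. However, I do not want to delete comments that start with "%%".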

Sample input:

%% comment to do not delete
% comment to delete
% another comment to delte
%% comment to do not delete
Some text % comment to delete
and some more text %% comment to do not delete

Desired output:

%% comment to do not delete
%% comment to do not delete
Some text and some more text %% comment to do not delete

5 Answers


Try this:

$ perl -pe '/^[^%]*%%/ && next; s/%.*\n//g' file.txt

Output

%% comment to do not delete
%% comment to do not delete
Some text and some more text %% comment to do not delete

Note

If you need to change the file in place, add the -i switch (once you have tested the command), like so:

$ perl -i -pe '/^[^%]*%%/ && next; s/%.*\n//g' file.txt

Thanks to the reviewers for their contributions.

Answered 2013-03-17T14:58:06.880

A perfect application of Perl's negative lookbehind (and negative lookahead) assertions:

perl -pe 's/(?<!%)%(?!%).*$//s' << END
%% comment to do not delete
% comment to delete
% another comment to delte
%% comment to do not delete
Some text % comment to delete
and some more text %% comment to do not delete
END

Output

%% comment to do not delete
%% comment to do not delete
Some text and some more text %% comment to do not delete

The s flag ensures that the dot also matches the newline, which gives the "line joining" the question asks for.

This kind of regex matching can cause problems, for example if you have a line like

The date is `date +%Y%m%d` % this is a comment

you will end up with

The date is `date +

If your actual comments always have whitespace around the comment character, you can use this regex:

(^| )%( .*|)$

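which means

  • the beginning of the line, or a space
  • followed by the comment character
  • followed by (a space and zero or more characters) or nothing
  • followed by the end of the line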
Answered 2013-03-17T18:38:24.383

Maybe something like this:

Second update

$ sed -e '/^%[^%]/d' -e 's/ %[^%]*$/@/' -e :a -e '/@/N; s/\n//; ta' input | sed 's/@/ /g'
%% comment to do not delete
%% comment to do not delete
Some text and some more text %% comment to do not delete
Answered 2013-03-17T15:14:22.020

Edit: added a change so that it also works on the last line of the file. Try:

sed -e :a -e '/^[^%]*%%/n; /%/{s/%.*//; N; s/\n//;};ta' file

Test input:

%% comment to do not delete
% comment to delete
% another comment to delte
%
%% comment to do not delete
Some text % comment to delete
Some more text % more comment to delete
and some more text %% comment to do not delete
fdgdfgdgdgd %
gfdgd
some text followed by %% comment to not delete that contains a % somewhere
some text followed by % comment to delete that contains %% somewhere
hello there

Output:

%% comment to do not delete
%% comment to do not delete
Some text Some more text and some more text %% comment to do not delete
fdgdfgdgdgd gfdgd
some text followed by %% comment to not delete that contains a % somewhere
some text followed by hello there
Answered 2013-03-17T16:30:53.840

Use Expression Order in Sed

With sed, the order of the commands can matter. For example:

$ sed -ne '/^% /d; /[^%]%.*/ {s/%.*//; N; s/\n//;}; p' /tmp/corpus
%% comment to do not delete
%% comment to do not delete
Some text and some more text %% comment to do not delete

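In this example, the sed script performs its tasks in the following order:

  1. Suppress the default output.
  2. Delete lines that begin with a single percent sign followed by a space.
  3. Use a substitution to delete everything from a single percent sign to the end of the line, then append the next line to the pattern space without the newline.
  4. Print the pattern space.

This script works with the corpus you provided in the question. It is not guaranteed to work unmodified with any other corpus, and it explicitly does not work if the line you append to the pattern space contains a comment character.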
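Step 3 hinges on how sed pulls in the next line, and the two commands for that are easy to confuse. Here is a minimal sketch of the difference between n and N on a two-line input; the variable names with_n and with_N are illustrative only.

```shell
# n: replace the pattern space with the next input line. Under -n the
# current line is not printed first, so its content is lost.
with_n=$(printf '%s\n' 'first' 'second' | sed -n 'n; p')

# N: append a newline plus the next input line to the pattern space, so the
# embedded newline can then be replaced to join the two lines.
with_N=$(printf '%s\n' 'first' 'second' | sed -n 'N; s/\n/ /; p')

printf '%s\n' "$with_n"    # second
printf '%s\n' "$with_N"    # first second
```

Joining a stripped line with the one after it therefore needs N followed by a substitution on the embedded newline.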

Answered 2013-03-17T17:41:33.537