perl - How can I use tr(1) to remove newline characters ('\n', 0x0A) from non-empty lines?

Question

I have a file named file1 with the following contents:

The answer t
o your question 

A conclusive a
nswer isn’t al
ways possible.

When in doubt, ask pe
ople to cite their so
urces, or to explain

Even if we don’t agre
e with you, or tell y
ou.

I want to convert file1 into file2. The latter should look like this:

The answer to your question

A conclusive answer isn’t always possible.

When in doubt, ask people to cite their sources, or to explain

Even if we don’t agree with you, or tell you.

If I simply run cat file1 | tr -d "\n" > file2, all of the newline characters are deleted. How can I use the tr(1) utility to delete only the newlines that are on non-empty lines?

score 9 · Accepted Answer

perl -00 -lpe 'tr/\n//d'

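-00 is Perl's "paragraph" mode, which reads the input using one or more blank lines as the record delimiter. -l strips the record's line ending on input and adds a line ending back on each print, so it is safe to delete every newline inside the record.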

score 4 · Accepted Answer

tr can't do this, but sed can do it easily:

sed -ne '$!H;/^$/{x;s/\n//g;G;p;d;}' file1 > file2

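This finds the non-empty lines and saves them in the hold space. Then, on an empty line, it removes the newlines from the saved data and prints the result followed by a newline. The saved data is then discarded and the process repeats.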

Edit:

Following @potong's comment, here is a version that does not need an extra blank line at the end of the file.

sed -ne 'H;/^$/{x;s/\n//g;G;p;};${x;s/\n//g;x;g;p;}' file1 > file2
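To make the hold-space logic easier to follow, here is the same command sequence written one command per line with comments. This is only a sketch: the function name join_paragraphs_sed and the tiny test input are made up for illustration.

```shell
# Hypothetical wrapper; same commands as the one-liner above, one per line.
join_paragraphs_sed() {
  sed -n '
    # Append every input line to the hold space.
    H
    /^$/ {
      # Blank line: swap the collected paragraph into the pattern space,
      # leaving the empty line in the hold space.
      x
      # Delete the newlines embedded in the paragraph.
      s/\n//g
      # Append a newline plus the now-empty hold space, then print, so the
      # joined paragraph is followed by a blank line.
      G
      p
    }
    $ {
      # Last line: flush the final paragraph even without a trailing blank line.
      x
      s/\n//g
      x
      g
      p
    }
  ' "$@"
}

printf 'ab\ncd\n\nef\ngh\n' | join_paragraphs_sed
```

The `$` block is what removes the need for a trailing blank line: it prints whatever is still held when the input ends.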

score 2 · Accepted Answer

If you know of a character that never occurs in the input, you can do the following:

# Assume that the input doesn't contain the '|' character at all
tr '\n' '|' < file1 | sed 's/\([^|]\)|\([^|]\)/\1\2/g' | tr '|' '\n' > file2

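This replaces every newline with the placeholder character |; sed then deletes each | that comes directly after and directly before some other character; finally, the remaining | characters are turned back into newlines.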
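The three stages can be seen on a tiny input. The sketch below assumes, as the answer does, that "|" never appears in the data; the function name join_with_placeholder is illustrative.

```shell
# Hypothetical wrapper; assumes the input never contains "|".
join_with_placeholder() {
  tr '\n' '|' |
    sed 's/\([^|]\)|\([^|]\)/\1\2/g' |
    tr '|' '\n'
}

# Stage 1 alone, to show what sed receives: every newline has become "|".
printf 'ab\ncd\n\nef\n' | tr '\n' '|'; echo    # prints: ab|cd||ef|

# Full pipeline: the lone "|" is removed, the "||" pair and the final "|" survive.
printf 'ab\ncd\n\nef\n' | join_with_placeholder
```

One limitation to keep in mind: each match consumes the character after the "|", so a line that is only one character long leaves the next "|" in place (a|b|c becomes ab|c). That does not affect the sample file.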

score 2 · Accepted Answer

This might work for you:

# sed '1{h;d};H;${x;s/\([^\n]\)\n\([^\n]\)/\1\2/g;p};d' file

The answer to your question 

A conclusive answer isn't always possible.

When in doubt, ask people to cite their sources, or to explain

Even if we don't agree with you, or tell you.

score 2 · Accepted Answer

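The newlines in file1 fall into four classes:

newlines followed by another newline
newlines preceded by a newline
the newline at the end of the file
newlines sandwiched between other text

Reading the input with the -000 option and replacing every pair of newlines with a single one (s/\n\n/\n/g) removes the first class. (Perl appears to treat -000 like -00, that is, paragraph mode, rather than reading the whole file at once; -0777 is the conventional whole-file switch. For the sample file the results are the same either way.) This gives us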

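$ perl -000 -pe 's/\n\n/\n/g' file1
The answer t
o your question 
A conclusive a
nswer isn’t al
ways possible.
When in doubt, ask pe
ople to cite their so
urces, or to explain
Even if we don’t agre
e with you, or tell y
ou.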

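This is not what we want, because the newlines of the first class should terminate the lines of file2.

We might try to be cleverer and use a look-behind, (?<=\n), to delete the newlines that are preceded by other newlines (the second class), but the output is indistinguishable from the previous case. That makes sense, because this time we delete the latter rather than the former newline in each adjacent pair.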

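$ perl -000 -pe 's/(?<=\n)\n//g' file1
The answer t
o your question 
A conclusive a
nswer isn’t al
ways possible.
When in doubt, ask pe
ople to cite their so
urces, or to explain
Even if we don’t agre
e with you, or tell y
ou.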

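Even so, this is still not what we want, because the newlines that precede other newlines become the line terminators in file2.

Clearly, we also want to keep the newline at the end of file1.

What we want, then, is a program that deletes only the fourth class: each newline that is not preceded by another newline and is followed by neither another newline nor the logical end of input.

With Perl's look-around assertions the specification is straightforward, although it may look a little intimidating. "Not preceded by a newline" is the negative look-behind (?<!\n). With a negative look-ahead, (?!...), we say that we do not want to see another newline (\n) or (|) the end of input ($).

Putting it all together, we get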

$ perl -000 -pe 's/(?<!\n)\n(?!\n|$)//g' file1
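The answer to your question 

A conclusive answer isn’t always possible.

When in doubt, ask people to cite their sources, or to explain

Even if we don’t agree with you, or tell you.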

Finally, to create file2, redirect standard output.

perl -000 -pe 's/(?<!\n)\n(?!\n|$)//g' file1 >file2

score 0 · Accepted Answer

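You can't get there with tr alone. tr is very handy, but it is strictly a character-by-character filter, with no look-ahead or look-behind.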
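A minimal illustration of that limitation, using a made-up function name and input: tr treats every newline the same way, so the blank line that separates paragraphs disappears along with the line wraps.

```shell
# Hypothetical wrapper; tr has no context, so every newline is deleted.
delete_all_newlines() {
  tr -d '\n'
}

printf 'ab\ncd\n\nef\n' | delete_all_newlines; echo    # prints: abcdef
```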

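You might be able to get the sample output using sed, but it would be really painful (I think!). Edit: sed master @Sorpigal proved me wrong!

Here is a solution using awk: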

/home/shellter:>cat <<-EOS \
| awk 'BEGIN{RS="\n\n"}; { gsub("\n", "", $0) ;printf("%s %s", $0, "\n\n") }'
The answer t
o your question 

A conclusive a
nswer isn’t al
ways possible.

When in doubt, ask pe
ople to cite their so
urces, or to explain

Even if we don’t agre
e with you, or tell y
ou.
EOS


# output
The answer to your question

A conclusive answer isnt always possible.

When in doubt, ask people to cite their sources, or to explain

Even if we dont agree with you, or tell you.

Oddly, it displays as triple-spaced, but it is really double-spaced.
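A multi-character RS such as "\n\n" is not handled the same way by every awk (POSIX leaves it unspecified). A possible variant, assuming a POSIX awk, uses paragraph mode (RS = "") and lets ORS supply the blank line between records. The function name is illustrative.

```shell
# Hypothetical wrapper; RS = "" selects paragraph mode in POSIX awk.
join_paragraphs_awk() {
  awk 'BEGIN { RS = ""; ORS = "\n\n" }   # records are paragraphs; print adds a blank line
       { gsub(/\n/, ""); print }         # delete the newlines inside the paragraph' "$@"
}

printf 'ab\ncd\n\nef\ngh\n' | join_paragraphs_awk
```

This also avoids the extra space that printf("%s %s", ...) puts at the end of each paragraph.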

awk has predefined variables that it populates for each file, and for each line of text it reads, i.e.

RS = RecordSeparator -- normally a newline, so each record is one line of data, but
                     configurable; when set to '\n\n' the separator is a blank line,
                     the typical separation between paragraphs

$0 = complete record of text (as defined by the internal variable RS (RecordSeparator)).
                             In this problem, it is each paragraph of data, viewed
                             as a single record.

$1 = first field in the record (as defined by the internal variable FS (FieldSeparator),
                           which defaults to runs of space and/or tab characters, so
                           a line whose words are split by 2 adjacent spaces and by
                           1 tab has 3 fields)

NF = Number(of)Fields in the current record (again, fields are defined by the value of
                                                FS as described above)

(there are many others besides $0, $n, NF, FS and RS).

You can step through $1, $2, $3 and so on programmatically by using a variable, for example $i, where i is a variable holding a number between 2 and NF. The leading "$" means "give me the value of field i" (i.e. $2, $3, $4 ...).
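A small sketch of stepping through the fields with $i, using the default FS. The function name list_fields and the sample line are made up for illustration; the loop starts at 1 here so that every field is shown.

```shell
# Hypothetical helper: print each field of every input line with its index.
list_fields() {
  awk '{ for (i = 1; i <= NF; i++) print i ": " $i }' "$@"
}

# Two adjacent spaces and one tab separate the words, giving 3 fields.
printf 'one  two\tthree\n' | list_fields
# prints:
# 1: one
# 2: two
# 3: three
```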

I hope this helps.
