
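I'd like to create a new version of a plain-text document in which each line holds exactly one sentence, that is, every line contains a run of text ending with a period ("."). Can you suggest some example scripts? For example, turn this: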

 In the beginning God created the heavens and the earth.
 Now the earth was formless and empty.  Darkness was on the surface
 of the deep.  God's Spirit was hovering over the surface
 of the waters.

into this:

 In the beginning God created the heavens and the earth.
 Now the earth was formless and empty.
 Darkness was on the surface of the deep.
 God's Spirit was hovering over the surface of the waters.

3 answers

awk 'BEGIN {RS = "[.] *"; ORS = ".\n"} {gsub(" *\n *", " "); if ($0 !~ /^ +$/) print}'

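Split the text into records at each period, together with any spaces that follow it (RS). A multi-character RS treated as a regular expression works in GNU awk and mawk, but POSIX leaves a multi-character RS unspecified.

Follow each output record with a period and a newline (ORS).

Replace each newline, and any spaces around it, with a single space (gsub()).

If the record does not consist entirely of spaces, print it.

If you want to accommodate tabs as well as spaces, change the places where a space is shown to [[:blank:]] (followed by the asterisk or plus).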

answered 2012-04-30T01:52:27.843
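For readers without an awk that accepts a regular-expression RS, here is a minimal Python sketch of the same record-splitting idea. It assumes the whole input fits in memory, and the function name one_sentence_per_line_awk_style is hypothetical. It differs from the one-liner in one deliberate way: it trims each record. The awk version leaves a leading space on any sentence that starts at the beginning of an input line, because the newline left in front of it becomes a space.

```python
import re


def one_sentence_per_line_awk_style(text: str) -> str:
    """Hypothetical helper mirroring the steps of the awk one-liner."""
    out = []
    # Like RS = "[.] *": split at each period plus any spaces after it.
    for record in re.split(r"\. *", text):
        # Like gsub(" *\n *", " "): each line break and the spaces around it
        # become a single space; strip() additionally trims both ends.
        record = re.sub(r" *\n *", " ", record).strip()
        # Like the /^ +$/ test: skip records that are empty after trimming.
        if record:
            # Like ORS = ".\n": put the period back and end the line.
            out.append(record + ".\n")
    return "".join(out)
```

To process a file, pass it the file's contents, for example open("infile").read().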

One way, using perl:

perl -pe 's/\n\Z/ /; s/(\.)\s*/$1\n/g' infile

Output:

In the beginning God created the heavens and the earth.
Now the earth was formless and empty.
Darkness was on the surface of the deep.
God's Spirit was hovering over the surface of the waters.
answered 2012-04-30T09:04:21.293
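The same join-then-break idea can be sketched in Python as follows. The function name one_sentence_per_line_perl_style is hypothetical. This sketch joins the whole text before substituting, so \s* can also swallow indentation at the start of the next input line. The line-by-line perl -p version keeps that indentation.

```python
import re


def one_sentence_per_line_perl_style(text: str) -> str:
    """Hypothetical helper mirroring the perl -pe one-liner."""
    # Like s/\n\Z/ / on every line: turn each newline into a space.
    joined = text.replace("\n", " ")
    # Like s/(\.)\s*/$1\n/g: after each period, drop any whitespace
    # and start a new line.
    return re.sub(r"(\.)\s*", r"\1\n", joined)
```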

For a start, try combining tr and sed:

$ cat input
They're selling postcards of the hanging. They're painting the passports brown. The beauty parlor is filled with sailors. The circus is in town.


$ tr '.' '\n' < input | sed 's/$/./; s/^[[:blank:]]*//'
They're selling postcards of the hanging.
They're painting the passports brown.
The beauty parlor is filled with sailors.
The circus is in town.
answered 2012-04-29T21:02:47.400
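Here is a Python sketch of the split-at-every-period approach. The function name one_sentence_per_line_tr_sed_style is hypothetical. It shares the pipeline's main limitation: every period ends a sentence, so abbreviations such as "e.g." and decimals such as 3.14 are split too.

It makes two deliberate changes:

- It folds line breaks inside a sentence into spaces.
- It skips empty pieces. The pipeline prints a lone "." line for the empty piece after the final period when the input ends with a newline.

```python
def one_sentence_per_line_tr_sed_style(text: str) -> str:
    """Hypothetical helper mirroring the tr | sed pipeline."""
    sentences = []
    # Like tr '.' '\n': cut the text at every period.
    for piece in text.split("."):
        # Like s/^[[:blank:]]*//, but also trims the other end and folds
        # line breaks and runs of whitespace into single spaces.
        sentence = " ".join(piece.split())
        # Unlike the pipeline, skip empty pieces (e.g. after the final period).
        if sentence:
            # Like s/$/./: put the period back at the end of each line.
            sentences.append(sentence + ".\n")
    return "".join(sentences)
```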