python - Join some consecutive lines into one by removing line breaks

Question

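I have a text file with random line breaks. Every new row starts with the word "client". How can I remove the extra line breaks inside the second and third rows?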

client | This is first row | 2013-02-01 23:45:59 | last column
clientd | second row with a line break
third line part of row 2 | 2013-01-31 12:44:00 | last column
client xyz | some text here | 2013-12-21 
12:54:12 | last column

Expected result:

client | This is first row | 2013-02-01 23:45:59 | last column
clientd | second row with a line break third line part of row 2 | 2013-01-31 12:44:00 | last column
client xyz | some text here | 2013-12-21 12:54:12 | last column

This sed command works, but I am looking for improvements if possible.

cat test.txt | tr '\n' ' ' | sed 's/client/\nclient/g'
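For comparison, here is a minimal Python sketch of what the tr | sed pipeline above does (the function name flatten_then_split is hypothetical, and SAMPLE reproduces the question's test.txt). It makes the pipeline's quirks visible: the output starts with an empty line, every row keeps a trailing space where a newline used to be, a blank already at the end of a line ("2013-12-21 ") turns into a double space, and a "client" appearing inside a field would also start a new line.

```python
# The question's sample input (test.txt).
SAMPLE = (
    "client | This is first row | 2013-02-01 23:45:59 | last column\n"
    "clientd | second row with a line break\n"
    "third line part of row 2 | 2013-01-31 12:44:00 | last column\n"
    "client xyz | some text here | 2013-12-21 \n"
    "12:54:12 | last column\n"
)


def flatten_then_split(text: str) -> str:
    # tr '\n' ' '  -> every newline becomes a space
    flat = text.replace('\n', ' ')
    # sed 's/client/\nclient/g'  -> a newline in front of every "client"
    return flat.replace('client', '\nclient')


if __name__ == "__main__":
    print(repr(flatten_then_split(SAMPLE)))
```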

Is there any other way to do this?

score 1 · Accepted Answer

Here's another awk one-liner:

awk -vRS='(^|\n)client' 'NR>1{print "client"gensub("\n"," ","g",$0)}' file

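It works by setting the record separator (RS) to a regex that matches "client" at the start of the file or right after a newline. Because RS consumes the word "client", it is printed back in front of each record; gensub() turns the newlines left inside a record into spaces, and NR>1 skips the empty record that precedes the first match. gensub() is specific to GNU awk, so this needs gawk.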
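The same record-splitting idea can be sketched in Python with re.split(). This is an illustration rather than part of the answer, and split_on_client is a hypothetical name. Without re.MULTILINE, Python's ^ anchors only at the start of the string, which is what the (^|\n) alternative relies on. Unlike the awk command, the sketch strips trailing blanks from each physical line before joining, so "2013-12-21 " does not end up followed by two spaces.

```python
import re
from typing import List

# The question's sample input (test.txt).
SAMPLE = (
    "client | This is first row | 2013-02-01 23:45:59 | last column\n"
    "clientd | second row with a line break\n"
    "third line part of row 2 | 2013-01-31 12:44:00 | last column\n"
    "client xyz | some text here | 2013-12-21 \n"
    "12:54:12 | last column\n"
)


def split_on_client(text: str) -> List[str]:
    # Split wherever "client" starts the text or follows a newline; the
    # separator (including the word "client") is consumed, as with awk's RS.
    parts = re.split(r'(?:^|\n)client', text)
    rows = []
    # parts[0] is whatever precedes the first "client" (empty here): skip it, like NR>1.
    for part in parts[1:]:
        # Put "client" back and join the record's physical lines with one space,
        # dropping trailing blanks on each line first.
        rows.append('client' + ' '.join(line.rstrip() for line in part.splitlines()))
    return rows


if __name__ == "__main__":
    print('\n'.join(split_on_client(SAMPLE)))
```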

It is also possible to write a regex that matches a newline followed by anything other than "client", but it isn't pretty:

\n([^c]|c[^l]|cl[^i]|cli[^e]|clie[^n]|clien[^t])

If your data file is small enough to read entirely into memory, you can use the regex above with perl, for example:

perl -0777pe 's/\n([^c]|c[^l]|cl[^i]|cli[^e]|clie[^n]|clien[^t])/ $1/g' file

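(The above is imperfect: the "non-matching" character in each alternative can itself be a newline, in which case that newline is not changed to a space. It can be fixed by changing every instance of [^X] to (?:$|[^X]), should you really want to use it.)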
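Python's re module supports negative lookahead, so the long alternation can be replaced by \n(?!client). A minimal sketch (join_with_lookahead is a hypothetical name): like the perl version it works on the whole text in memory, it leaves the file's final newline alone via \Z, and it also swallows blanks just before a joined newline. Because the lookahead does not consume the character after the newline, consecutive newlines are both converted, so the caveat above does not arise.

```python
import re

# The question's sample input (test.txt).
SAMPLE = (
    "client | This is first row | 2013-02-01 23:45:59 | last column\n"
    "clientd | second row with a line break\n"
    "third line part of row 2 | 2013-01-31 12:44:00 | last column\n"
    "client xyz | some text here | 2013-12-21 \n"
    "12:54:12 | last column\n"
)


def join_with_lookahead(text: str) -> str:
    # Replace a newline (plus any blanks before it) with one space, unless it is
    # followed by "client" or is the very last character of the text (\Z).
    return re.sub(r'[ \t]*\n(?!client|\Z)', ' ', text)


if __name__ == "__main__":
    print(join_with_lookahead(SAMPLE), end='')
```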

score 0 · Accepted Answer

Python

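with open('test.txt') as fin:
    print(fin.readline().rstrip(), end='')  # don't prepend \n to the first line
    for line in fin:
        # join with a space; a line starting a new record gets "\n" before "client"
        print(' ' + line.rstrip().replace('client', '\nclient'), end='')
    print()  # finish the last row with a newline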


client | This is first row | 2013-02-01 23:45:59 | last column 
clientd | second row with a line break third line part of row 2 | 2013-01-31 12:44:00 | last column 
client xyz | some text here | 2013-12-21 12:54:12 | last column

score 0 · Accepted Answer

One way:

awk '/^client/{if (x)print x;x=$0;next}{x=x FS $0;}END{print x}' file

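Each time a client line is encountered, print the previous record and start accumulating the current record in the variable x, until the next client line is found. The END block prints the last record.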
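A minimal Python sketch of the same accumulate-and-flush idea (join_client_records is a hypothetical name). It reads line by line, so, like the awk version, it does not need the whole file in memory. One deliberate difference: it strips trailing blanks from each line before joining, whereas awk's x FS $0 would turn "2013-12-21 " plus the joining space into a double space.

```python
import io
from typing import Iterable, Iterator, List

# The question's sample input (test.txt).
SAMPLE = (
    "client | This is first row | 2013-02-01 23:45:59 | last column\n"
    "clientd | second row with a line break\n"
    "third line part of row 2 | 2013-01-31 12:44:00 | last column\n"
    "client xyz | some text here | 2013-12-21 \n"
    "12:54:12 | last column\n"
)


def join_client_records(lines: Iterable[str]) -> Iterator[str]:
    current: List[str] = []  # pieces of the record being built (awk's x)
    for raw in lines:
        line = raw.rstrip()  # drop the newline and any trailing blanks
        if line.startswith('client') and current:
            # A new record starts: emit the finished one first.
            yield ' '.join(current)
            current = []
        current.append(line)
    if current:
        # Like awk's END block: flush the last record.
        yield ' '.join(current)


if __name__ == "__main__":
    for row in join_client_records(io.StringIO(SAMPLE)):
        print(row)
```

With a real file, the open file object can be passed directly, e.g. join_client_records(open('test.txt')).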

score 0 · Accepted Answer

This might work for you (GNU sed):

sed -r ':a;$!N;/^(client).*\n\1/!{s/\n/ /;ta};P;D' file

This replaces the extra newlines with spaces. If you don't want the spaces, use:

sed -r ':a;$!N;/^(client).*\n\1/!{s/\n//;ta};P;D' file
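To see what the two sed variants above produce, here is a Python sketch that mirrors the N/P/D loop, assuming (as in the question) that the input starts with a client line; sed_style_join and its sep parameter are hypothetical. Like sed, it keeps trailing blanks, so with sep=' ' the row ending in "2013-12-21 " gets two spaces, while sep='' fixes that row but glues "break" and "third" together.

```python
from typing import Iterable, Iterator, Optional

# The question's sample input (test.txt).
SAMPLE = (
    "client | This is first row | 2013-02-01 23:45:59 | last column\n"
    "clientd | second row with a line break\n"
    "third line part of row 2 | 2013-01-31 12:44:00 | last column\n"
    "client xyz | some text here | 2013-12-21 \n"
    "12:54:12 | last column\n"
)


def sed_style_join(lines: Iterable[str], sep: str = ' ') -> Iterator[str]:
    # Assumes the first line starts with "client", as in the question.
    pattern: Optional[str] = None  # sed's pattern space, minus the newline
    for raw in lines:
        line = raw.rstrip('\n')  # like sed, keep any other trailing blanks
        if pattern is None:
            pattern = line
        elif line.startswith('client'):
            # /^(client).*\n\1/ matched: P prints the first line, D drops it
            yield pattern
            pattern = line
        else:
            # No match: s/\n/ / (sep=' ') or s/\n// (sep=''), then loop via ta
            pattern += sep + line
    if pattern is not None:
        yield pattern


if __name__ == "__main__":
    for row in sed_style_join(SAMPLE.splitlines(keepends=True), sep=''):
        print(row)
```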
