linux - Parsing data in a file

Question

I have a text file that contains data of the following kind:

Example:
10212012115655_113L_-247R_247LRdiff_0;
10212012115657_114L_-246R_246LRdiff_0;
10212012115659_115L_-245R_245LRdiff_0;
10212012113951_319L_-41R_41LRdiff_2;
10212012115701_116L_-244R_244LRdiff_0;
10212012115703_117L_-243R_243LRdiff_0;
10212012115705_118L_-242R_242LRdiff_0;
10212012113947_317L_-43R_43LRdiff_0;
10212012114707_178L_-182R_182LRdiff_3;
10212012115027_278L_-82R_82LRdiff_1;

I want to copy all the lines that

1) have _1, _2 or _3 at the end into another file, along with
2) stripping out the semicolon at the end of each line.

So in the end the data in the new file would be

Example:  
10212012113951_319L_-41R_41LRdiff_2
10212012114707_178L_-182R_182LRdiff_3
10212012115027_278L_-82R_82LRdiff_1

How can I do this? I am using Linux, Ubuntu 10.04 64-bit.

Thanks

score 2 · Accepted Answer

Here is one way using sed:

sed -n 's/\(.*_[123]\);$/\1/p' file.txt > newfile.txt

Here is one way using grep:

grep -oP '.*_(1|2|3)(?=;$)' file.txt > newfile.txt

Contents of newfile.txt:

10212012113951_319L_-41R_41LRdiff_2
10212012114707_178L_-182R_182LRdiff_3
10212012115027_278L_-82R_82LRdiff_1
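
To see why the sed command selects the right lines, here is a minimal sketch that runs it on a few lines from the question. The temporary file and the variable names (input, sed_out) are assumptions made for illustration only.

```shell
# Sketch only: "input" and "sed_out" are illustrative names, not from the answer.
input=$(mktemp)
cat > "$input" <<'EOF'
10212012115655_113L_-247R_247LRdiff_0;
10212012113951_319L_-41R_41LRdiff_2;
10212012114707_178L_-182R_182LRdiff_3;
10212012115027_278L_-82R_82LRdiff_1;
EOF

# -n turns off sed's default printing. The s command captures everything up to
# and including _1, _2 or _3, requires a ";" at the end of the line, and
# replaces the whole line with the captured part. The p flag prints only the
# lines where that substitution succeeded.
sed_out=$(sed -n 's/\(.*_[123]\);$/\1/p' "$input")
printf '%s\n' "$sed_out"
rm -f "$input"
```

Lines ending in _0; never match the pattern, so they are neither modified nor printed.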

score 1 · Accepted Answer

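If the format is always the same and each line has only one semicolon, at its end, you can use grep to find the lines and then sed to remove the ;: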

grep -P "_(1|2|3);$" your_file | sed 's/\(.*\);$/\1/' > your_new_file

The -P in the command tells grep to interpret the pattern as a Perl regular expression. Alternatively, you can use egrep (if available).
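
The egrep alternative could look roughly like the sketch below. It uses grep -E, which is the equivalent spelling of egrep, since the alternation (1|2|3) only needs extended regular expressions rather than Perl ones. The temporary file and the names input and egrep_out are assumptions for illustration.

```shell
# Sketch only: "input" and "egrep_out" are illustrative names.
input=$(mktemp)
cat > "$input" <<'EOF'
10212012115655_113L_-247R_247LRdiff_0;
10212012113951_319L_-41R_41LRdiff_2;
10212012114707_178L_-182R_182LRdiff_3;
10212012115027_278L_-82R_82LRdiff_1;
EOF

# grep -E keeps the lines ending in _1; _2; or _3; and sed then strips the
# trailing semicolon from each of them.
egrep_out=$(grep -E '_(1|2|3);$' "$input" | sed 's/;$//')
printf '%s\n' "$egrep_out"
rm -f "$input"
```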

score 1 · Accepted Answer

If you are interested, here is an awk solution:

awk '/_[321];$/{gsub(/;/,"");print}' your_file

Tested as follows:

> awk '/_[321];$/{gsub(/;/,"");print}' temp
10212012113951_319L_-41R_41LRdiff_2
10212012114707_178L_-182R_182LRdiff_3
10212012115027_278L_-82R_82LRdiff_1
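
Note that gsub(/;/,"") removes every semicolon on a matching line, not just the last one. For the data in the question that makes no difference, but if a line could contain other semicolons, a variant with sub(/;$/,"") removes only the trailing one. The sketch below compares the two. The second input line is made up for illustration and does not come from the question, and the variable names are assumptions as well.

```shell
# Sketch only: "note;extra_1;" is a hypothetical line, not data from the question.
input=$(mktemp)
cat > "$input" <<'EOF'
10212012113951_319L_-41R_41LRdiff_2;
note;extra_1;
10212012115655_113L_-247R_247LRdiff_0;
EOF

# Original form: gsub deletes all semicolons on the matching lines.
gsub_out=$(awk '/_[123];$/ { gsub(/;/, ""); print }' "$input")

# Variant: sub with an anchored pattern deletes only the final semicolon.
sub_out=$(awk '/_[123];$/ { sub(/;$/, ""); print }' "$input")

printf 'gsub:\n%s\n' "$gsub_out"
printf 'sub:\n%s\n' "$sub_out"
rm -f "$input"
```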

score 0 · Accepted Answer

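tr ';' '\n' < file.txt > tmpfile
grep '_[123]$' tmpfile > newfile

This should work. First, tr translates every ; into \n and saves the result to a temporary file. Then grep matches only the lines that end in _[123] and saves the matches to newfile. The output has to go to a different file than the one grep reads, because redirecting to a file replaces all of its previous data before grep gets to read it. Finally, the $ anchors the match to the end of the line.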
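
The same tr and grep idea can also be sketched as a single pipeline, so that no intermediate file is needed. This variant deletes the semicolons with tr -d instead of translating them to newlines. The temporary files and the names input, output and tr_out are assumptions for illustration.

```shell
# Sketch only: "input", "output" and "tr_out" are illustrative names.
input=$(mktemp)
output=$(mktemp)
cat > "$input" <<'EOF'
10212012115655_113L_-247R_247LRdiff_0;
10212012113951_319L_-41R_41LRdiff_2;
10212012114707_178L_-182R_182LRdiff_3;
10212012115027_278L_-82R_82LRdiff_1;
EOF

# tr -d deletes every ";" from the stream; grep then keeps only the lines
# that end in _1, _2 or _3 and writes them to a separate output file.
tr -d ';' < "$input" | grep '_[123]$' > "$output"

tr_out=$(cat "$output")
printf '%s\n' "$tr_out"
rm -f "$input" "$output"
```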

Look up some examples of using tr and grep, in case you are not familiar with them.
