shell - Error when printing the text between two patterns on each line

Question

I have a text file with the following content:

Query= gi_4849 ref_YP_00.1_ flagellar assembly protein H[Bacillus]--
Query= gi_4851 ref_YP_00.1_ MS-ring protein[Bacillus]--
Query= gi_4852 ref_YP_00.1_ flagellar hook-basal body proteinFliE [Bacillus]--
Query= gi_4851 ref_YP_00.1_ [membrane protein][Bacillus]--
.
.
.

Desired output:

flagellar assembly protein H
MS-ring protein
flagellar hook-basal body proteinFliE
[membrane protein]
.
.
.

I tried the following commands:

sed '/.1_/,/[Bacillus/p' filename > new
sed '/".1_"/,/"[Bacillus"/p' filename > new
awk '/.1_/,/[Bacillus/' filename > new
awk '/".1_"/,/"[Bacillus"/' filename > new

But the awk commands do not work, and sed gives an error:

sed: -e expression #1, char 19: unterminated address regex

score 1 · Accepted Answer

If you only want to print the matching part of each line, you can do this with GNU grep:

$ grep -Po '_\s\K.*(?=[[])' file
flagellar assembly protein H
MS-ring protein
flagellar hook-basal body proteinFliE 
[membrane protein]

Or, more explicitly:

$ grep -Po '(?<=ref_YP_00.1_ ).*(?=\[Bacillus]--)' file
flagellar assembly protein H
MS-ring protein
flagellar hook-basal body proteinFliE 
[membrane protein]

If you want to account for optional trailing whitespace:

$ grep -Po '_\s\K.*\S(?=\s?[[])' file 
flagellar assembly protein H
MS-ring protein
flagellar hook-basal body proteinFliE
[membrane protein]

# OR

$ grep -Po '(?<=ref_YP_00.1_ ).*\S(?=\s?\[Bacillus]--)' file 
flagellar assembly protein H
MS-ring protein
flagellar hook-basal body proteinFliE
[membrane protein]
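For context on why the attempts in the question fail: a sed or awk range of the form /a/,/b/ selects whole lines from a line matching the first pattern to a line matching the second, not text inside a single line, and an unescaped "[" opens a bracket expression, which is why sed reports an unterminated address regex. Where grep -P is not available, the same extraction can be sketched with awk. This is only an illustration: the temporary file and the variable name awk_out are made up for the example, and the input is the sample from the question.

```shell
# Sample input copied from the question; the temp file is only for this demo.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Query= gi_4849 ref_YP_00.1_ flagellar assembly protein H[Bacillus]--
Query= gi_4851 ref_YP_00.1_ MS-ring protein[Bacillus]--
Query= gi_4852 ref_YP_00.1_ flagellar hook-basal body proteinFliE [Bacillus]--
Query= gi_4851 ref_YP_00.1_ [membrane protein][Bacillus]--
EOF

# First sub(): drop everything up to and including "1_ ".
# Second sub(): drop "[Bacillus" and everything after it, together with
# any spaces directly in front of the bracket.
awk_out=$(awk '{ sub(/^.*1_ /, ""); sub(/ *\[Bacillus.*$/, ""); print }' "$sample")
printf '%s\n' "$awk_out"

rm -f "$sample"
```

Note that this awk sketch prints every input line, so lines that contain neither pattern would pass through unchanged.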

score 1 · Accepted Answer

With sed, this works:

$ sed -r 's/.*1_ (.*)\[Bacillus.*/\1/g' file
flagellar assembly protein H
MS-ring protein
flagellar hook-basal body proteinFliE 
[membrane protein]

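It takes each line, captures the text between "1_ " and "[Bacillus" as match group #1, and replaces the whole line with that group, so only the captured text is printed.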
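A possible variation on the same substitution, shown as a sketch rather than a definitive fix: with -n and the p flag, sed prints only the lines where the substitution succeeded, and the capture group is tightened so that spaces before "[Bacillus" are not included. The "unrelated header line" in the input, the temporary file, and the variable name sed_out are assumptions made up for the demonstration.

```shell
# Two sample lines from the question plus one hypothetical non-matching line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Query= gi_4849 ref_YP_00.1_ flagellar assembly protein H[Bacillus]--
Some unrelated header line
Query= gi_4852 ref_YP_00.1_ flagellar hook-basal body proteinFliE [Bacillus]--
EOF

# -n suppresses the default output; the trailing p prints a line only when
# the substitution matched. The group must end in a non-space character,
# so optional spaces before "[Bacillus" are left out of the result.
sed_out=$(sed -nE 's/.*1_ (.*[^ ]) *\[Bacillus.*/\1/p' "$sample")
printf '%s\n' "$sed_out"

rm -f "$sample"
```

The -E option selects extended regular expressions, as -r does in the command above, and is accepted by both GNU and BSD sed.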

score 0 · Accepted Answer

perl -lne 'print $1 if /1_ (.*?)\[Bacillus/' your_file

Answered 2013-09-26T12:29:39.440
