unix - How to join words in a text file

Question

I have a file in the following format:

B: that


I: White


I: House


B: the
I: emergency


I: rooms


B: trauma
I: centers

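What I need to do is read the file line by line from the top. If a line starts with B:, remove the B:. If it starts with I:, remove the I: and join the word onto the previous line (which was processed by the same rules).

Expected output: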

that White House
the emergency rooms
trauma centers

What I tried:

while read -r line          # read strips leading and trailing whitespace
do
    if printf '%s\n' "$line" | grep -q '^B:'    # line starts with "B:"
    then
        newstring=${line#B: }    # remove the leading "B: "
    fi

    if printf '%s\n' "$line" | grep -q '^I:'    # line starts with "I:"
    then
        newstring=${line#I: }    # remove the leading "I: "
    fi
done < file.txt

What I don't know is how to put the result back on the line (in the file), and how to join a line onto the previously processed one.

score 0 · Accepted Answer

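awk -F': *' '
    # B: starts a new group: print the previous group (if any), then start over
    /^ *B:/ { if (line != "") print line; line = $2 }
    # I: continues the current group
    /^ *I:/ { line = line " " $2 }
    # print the last group
    END     { if (line != "") print line }
' Your_file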
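
For comparison with the loop in the question, the same accumulate-and-flush idea can be sketched in plain shell. This is only a sketch: the function name join_words is made up for illustration, and it assumes each line is a B: or I: tag followed by a single word, with a B: line coming first.

```shell
#!/bin/sh
# join_words: hypothetical helper name, not part of the original post.
# Reads "B: word" / "I: word" lines on stdin, prints one joined line per B: group.
join_words() {
    out=""
    # read trims leading blanks and splits the line into tag and word;
    # the extra test keeps a final line that has no trailing newline
    while read -r tag word || [ -n "$tag" ]; do
        case $tag in
            B:) # a B: line starts a new group: flush the previous one first
                [ -n "$out" ] && printf '%s\n' "$out"
                out=$word ;;
            I:) # an I: line continues the current group
                out="$out $word" ;;
        esac    # blank or unrecognized lines are skipped
    done
    # flush the last group
    [ -n "$out" ] && printf '%s\n' "$out"
    return 0
}
```

Called as join_words < file.txt, this should print the three expected lines for the sample input. The awk version is likely to be faster on large files; the loop is mainly useful for seeing where the question's attempt was missing the "remember the previous line" step.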

score 0 · Accepted Answer

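Use awk to print the second field of the I: and B: records. The variable first controls where newlines are output.

/B:/ searches for the B: pattern, which marks the start of a record. If the record is not the first one, a newline is printed, and then the data in $2 is printed.

If the pattern found is I:, the data in $2 (the second field, which follows I:) is printed.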

awk 'BEGIN{first=1}
     /B:/ { if (first) first=0; else  print "";  printf("%s ", $2); }
     /I:/ { printf("%s ", $2) }
     END {print ""}' filename

score 0 · Accepted Answer

This might work for you (GNU sed):

sed -r ':a;$!N;s/\n$//;s/\n\s*I://;ta;s/B://g;s/^\s*//;P;D' file

Or:

sed -e ':a' -e '$!N' -e 's/\n$//' -e 's/\n\s*I://' -e 'ta' -e 's/B://g' -e 's/^\s*//' -e 'P' -e 'D' file
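
The one-liner is dense, so here is the same command sequence spread over several lines with comments, wrapped in a shell function (join_with_sed is a made-up name, not from the answer). It still assumes GNU sed, since it relies on \s and -r; the comments are kept on their own lines to stay on the safe side of sed's parsing rules.

```shell
#!/bin/sh
# join_with_sed: hypothetical wrapper name; the sed commands are the ones from the answer.
join_with_sed() {
    sed -r '
        # loop label
        :a
        # append the next input line to the pattern space, unless this is already the last line
        $!N
        # if the appended line was empty, drop the newline again
        s/\n$//
        # if the appended line is an I: line, remove the newline and the tag, joining it on
        s/\n\s*I://
        # if either substitution succeeded, go back and pull in another line
        ta
        # otherwise the appended line is the next B: line: strip the B: tags
        s/B://g
        # strip leading whitespace from the finished group
        s/^\s*//
        # print the finished group (everything up to the first newline)
        P
        # delete it and restart with the remainder, without reading a new line
        D
    ' "$@"
}
```

It can be called as join_with_sed file, or with the data on standard input.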

score 0 · Accepted Answer

awk '/^B/ {printf "\n%s",$2} /^I/ {printf " %s",$2}' file

that White House
the emergency rooms
trauma centers

Somewhat shorter:

awk '/./ {printf /^B/?"\n%s":" %s",$2}' file
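
One caveat, as far as I can tell: both commands above print the newline before each B: word, so the real output starts with an empty line and does not end with a newline. A small variant that prints the separator only between groups might look like this (join_groups and sep are names I made up, and like the originals it assumes the tags start in column one):

```shell
#!/bin/sh
# join_groups: hypothetical helper name, a variant of the awk one-liner above.
join_groups() {
    awk '
        # sep is empty for the first B: line and a newline for every later one
        /^B/ { printf "%s%s", sep, $2; sep = "\n" }
        # I: words are appended to the current output line
        /^I/ { printf " %s", $2 }
        # terminate the last line, but only if anything was printed
        END  { if (sep != "") print "" }
    ' "$@"
}
```

Usage is the same as before: join_groups file.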

score 0 · Accepted Answer

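There is an interesting solution that uses awk's automatic record splitting on an RS pattern. Note that it is somewhat sensitive to variations in the input format: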

<infile awk 1 RS='(^|\n)B: ' | awk 1 RS='\n+I: ' ORS=' ' | grep -v '^ *$'

Output:

that White House
the emergency rooms
trauma centers

This works with at least GNU awk and Mike's awk (mawk).
