
I need help with my sed script. I have an XML file from which I need to remove everything except the text enclosed in these tags:

<TEXT>......</TEXT>
<HEADLINE>......</HEADLINE>

How do I write the sed code? I know how to remove everything except the text enclosed in ONE tag:

s/.*<TEXT>\(.*\)<\/TEXT>.*/\1/

But how do I write the sed code for multiple tags?


3 Answers


You can pass multiple commands to sed:

$ echo '<TEXT>Hello</TEXT>
<HEADLINE>there</HEADLINE>' | sed -n 's/.*<TEXT>\(.*\)<\/TEXT>.*/\1/gp; s/.*<HEADLINE>\(.*\)<\/HEADLINE>.*/\1/gp' 
Hello
there

But you really should be careful when applying regular expressions to XML-like files.
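To make that warning concrete, here is a small sketch that wraps the single-tag substitution from the question in a hypothetical helper named extract_text and feeds it two inputs it handles badly. With two elements on one line, the leading greedy .* swallows the first element, so only the last one survives. An attribute on the tag means the literal <TEXT> never matches at all. The sample inputs are made up for illustration.

```shell
#!/bin/sh
# extract_text is a hypothetical helper: the substitution from the question,
# run with -n and the p flag so only lines that matched are printed.
extract_text() {
  sed -n 's/.*<TEXT>\(.*\)<\/TEXT>.*/\1/p'
}

# Pitfall 1: two elements on one line. The leading greedy .* consumes the
# first element, so only the last one is printed.
printf '%s\n' '<TEXT>a</TEXT><TEXT>b</TEXT>' | extract_text    # prints: b

# Pitfall 2: an attribute on the opening tag means the literal <TEXT> never
# matches, so nothing is printed for this line.
printf '%s\n' '<TEXT lang="en">hi</TEXT>' | extract_text
```

A real XML parser avoids both problems; the regex approach is only safe when you control the input layout.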

answered 2013-01-24T20:49:51.350

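Assuming you have valid XML with each element on a single line (note that \| alternation in a basic regex is a GNU sed extension):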

sed '/.*<\(TEXT\|HEADLINE\)>\(.*\)<\/\(TEXT\|HEADLINE\)>.*/!d;s//\2/' yourfile.xml
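One possible variation, sketched under the assumption that GNU sed is available: with -E (extended regexes) the closing tag can be tied to the opening one with a back-reference, so a mismatched pair such as <TEXT>...</HEADLINE> is rejected instead of accepted. Back-references inside an ERE are a GNU extension rather than POSIX, and the helper name extract_tags is hypothetical.

```shell
#!/bin/sh
# Assumes GNU sed: -E selects extended regexes, and a back-reference (\1)
# inside an ERE pattern is a GNU extension rather than POSIX.
# extract_tags is a hypothetical helper name used only for illustration.
extract_tags() {
  # (TEXT|HEADLINE) is group 1 and the element content is group 2;
  # <\/\1> requires the closing tag to repeat whichever name opened it.
  sed -E -n 's/.*<(TEXT|HEADLINE)>(.*)<\/\1>.*/\2/p' "$@"
}

# Prints "Hello" and "there"; the OTHER line is not printed.
printf '%s\n' '<TEXT>Hello</TEXT>' '<HEADLINE>there</HEADLINE>' '<OTHER>x</OTHER>' | extract_tags

# Mismatched tags are rejected here, whereas the two independent
# \(TEXT\|HEADLINE\) groups in the one-liner would accept them.
printf '%s\n' '<TEXT>oops</HEADLINE>' | extract_tags
```

Like the original one-liner, this still assumes one element per line.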

If you want to use a sed script file instead, put the following line in it:

/.*<\(TEXT\|HEADLINE\)>\(.*\)<\/\(TEXT\|HEADLINE\)>.*/!d;s//\2/

Then run:

sed -f yourscript.sed < yourfile.xml
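A self-contained sketch of the script-file route, assuming GNU sed (for \|) and mktemp. The file names tags.sed and sample.xml and the sample input are made up for illustration; the sed line itself is the one from this answer.

```shell
#!/bin/sh
# tags.sed and sample.xml are hypothetical file names created in a temporary
# directory; the script line is the one from the answer (GNU sed, for \|).
workdir=$(mktemp -d)

# The quoted 'EOF' keeps the backslashes in the sed script literal.
cat > "$workdir/tags.sed" <<'EOF'
/.*<\(TEXT\|HEADLINE\)>\(.*\)<\/\(TEXT\|HEADLINE\)>.*/!d;s//\2/
EOF

# Made-up sample input: lines without a TEXT/HEADLINE element are deleted.
cat > "$workdir/sample.xml" <<'EOF'
<doc>
<TEXT>Hello</TEXT>
<HEADLINE>there</HEADLINE>
</doc>
EOF

result=$(sed -f "$workdir/tags.sed" < "$workdir/sample.xml")
printf '%s\n' "$result"    # prints "Hello", then "there"
rm -rf "$workdir"
```

The empty regex in s//\2/ reuses the address regex, so the script only has to spell the pattern out once.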
answered 2013-01-24T20:50:02.850

This might work for you (GNU sed):

 sed -r '/<(text|headline)>/I!d;s//&\n/;s/^[^\n]*\n//;:a;/<\//!{$!{N;ba}};s/\n/ /g;s/<\//\n&/;P;D' file

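This deletes all text except the text between the TEXT and HEADLINE tags (matched case-insensitively), and replaces the newlines in multi-line values with spaces.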
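To follow what the one-liner does, here is a sketch of the same GNU sed commands laid out one per line with comments, wrapped in a hypothetical helper named extract_multiline. It relies on GNU-only features (-r, the I flag, \n in patterns and replacements, reuse of the empty regex), and the sample input below is made up for illustration.

```shell
#!/bin/sh
# GNU sed only. Same commands as the one-liner, one per line.
# extract_multiline is a hypothetical helper name.
extract_multiline() {
  sed -r '
    # Drop any line that does not contain an opening TEXT/HEADLINE tag.
    /<(text|headline)>/I!d
    # Reuse that regex (empty //) to put a newline after the opening tag,
    # then delete everything up to and including that newline.
    s//&\n/
    s/^[^\n]*\n//
    # Keep appending input lines until a closing tag "</" is present.
    :a
    /<\//!{
      $!{
        N
        ba
      }
    }
    # Join the collected lines with spaces.
    s/\n/ /g
    # Split before the closing tag, print the value, then restart the cycle
    # on the remainder, which may hold another element on the same line.
    s/<\//\n&/
    P
    D
  ' "$@"
}

# Prints "Hello", then "first line second line".
printf '%s\n' '<doc>' '<headline>Hello</headline>' '<TEXT>first line' 'second line</TEXT>' '</doc>' | extract_multiline
```

The P;D pair is what lets two elements on one line both be printed: D discards only the printed part and re-runs the script on the rest without reading new input.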

answered 2013-01-25T07:07:10.063