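I have an .xml file in which I must search for the <reviseddate> tag. It can occur several times in the file. When it does, I must replace each <reviseddate> tag with a numbered one: <reviseddate1>, <reviseddate2>, and so on. I need a shell script for this.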

A sample of the text:

Manuscript received <receiveddate>June 7, 2005</receiveddate>; revised             
<reviseddate> February 4, 2006 </reviseddate>, <reviseddate> August 14, 2006 </reviseddate>,
and <reviseddate> October 7, 2006 </reviseddate>. This work was supported by the 
<supported><agency-name>California Department of Transportation through the California  
Center for Innovative Transportation and the California Partners for Advanced Highway 
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper 
reflect the views of the authors and do not necessarily indicate acceptance by the 
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para>

The output should look like this:

Manuscript received <receiveddate>June 7, 2005</receiveddate>; revised
<reviseddate1> February 4, 2006 </reviseddate1>, <reviseddate2> August 14, 2006 </reviseddate2>,        
and <reviseddate3> October 7, 2006 </reviseddate3>. This work was supported by the 
<supported><agency-name>California Department of Transportation through the California  
Center for Innovative Transportation and the California Partners for Advanced Highway 
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper 
reflect the views of the authors and do not necessarily indicate acceptance by the 
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para>

I tried:

for i in $c do 
   sed -e "s/<reviseddate>/<reviseddate$i>/g" $path/$input_file > $path/input_new.xml
   cp $path/input_new.xml $path/$input_file 
   rm -f input_new.xml 
done

1 Answer


I would use a Perl script like this for the job:

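#!/usr/bin/env perl
use strict;
use warnings;

my $i = 1;    # number for the next <reviseddate> element
while (<>)
{
    # Replace one unnumbered element per pass so each gets its own number.
    while (m%<reviseddate>([^<]*)</reviseddate>%)
    {
        s%<reviseddate>([^<]*)</reviseddate>%<reviseddate$i>$1</reviseddate$i>%;
        $i++;
    }
    print;
}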

For each line, every unnumbered <reviseddate> element has its opening and closing tags replaced with the appropriately numbered tags.

Sample output:

Manuscript received <receiveddate>June 7, 2005</receiveddate>; revised             
<reviseddate1> February 4, 2006 </reviseddate1>, <reviseddate2> August 14, 2006 </reviseddate2>,
and <reviseddate3> October 7, 2006 </reviseddate3>. This work was supported by the 
<supported><agency-name>California Department of Transportation through the California  
Center for Innovative Transportation and the California Partners for Advanced Highway 
and Transit Program</agency-name><grant-grp/></supported>. The contents of this paper 
reflect the views of the authors and do not necessarily indicate acceptance by the 
sponsors. The Associate Editor for this paper was M. M. Sokoloski.</affnote-para>

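You can adapt it to handle alternatives, such as a start tag on one line and the end tag on the next. There is no need to fuss about that until you need it. Working with regular expressions is an art: you have to balance your immediate needs against resilience to every possible scenario.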
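If the start and end tags can land on different lines, one possible adaptation is to number the opening and closing tags independently rather than matching a whole element at once. The following is only a sketch under stated assumptions: it uses awk instead of Perl, the helper name number_tags is hypothetical, and it assumes <reviseddate> elements are never nested.

```shell
#!/bin/sh
# number_tags: hypothetical helper; reads XML text on stdin, writes to stdout.
# Assumption: reviseddate elements are never nested.
number_tags() {
    awk '
    {
        out = ""
        rest = $0
        # Walk through every opening or closing reviseddate tag on the line.
        while (match(rest, /<\/?reviseddate>/)) {
            tag = substr(rest, RSTART, RLENGTH)
            if (tag == "<reviseddate>") {
                n++                           # a new element starts: next number
                new = "<reviseddate" n ">"
            } else {
                new = "</reviseddate" n ">"   # close with the current number
            }
            out = out substr(rest, 1, RSTART - 1) new
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}
```

It would be run as: number_tags < input.xml > output.xml. Because the counter persists across lines, a closing tag on a later line picks up the number of the most recent opening tag.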


Since Perl is evidently not 'shell' (though sed is), you can instead arrange to process the file as many times as it takes to find and change every entry.

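tmp=$(mktemp ./revise.XXXXXXXXXXXX)
# Remove the temporary file on exit or on HUP, INT, QUIT, PIPE, TERM.
trap "rm -f $tmp; exit 1" 0 1 2 3 13 15

i=1
# Loop while the file still contains an unnumbered tag (-q: no output).
while grep -q '<reviseddate>' filename
do
    # Number only the first remaining unnumbered element (no g flag).
    sed "1,/<reviseddate>/ s%<reviseddate>\([^<]*\)</reviseddate>%<reviseddate$i>\1</reviseddate$i>%" filename > "$tmp"
    mv "$tmp" filename
    i=$((i + 1))
done

rm -f "$tmp" # Should be a no-op
trap 0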
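This updates the file iteratively. The 1,/<reviseddate>/ address range ensures that only the first remaining <reviseddate> element is updated; it is crucial that the s%%% command has no g flag. One caveat: sed starts looking for the end of the range on the line after the first address, so if line 1 itself contains the tag, the range runs on to the next line that contains it and both lines can receive the same number. GNU sed's 0,/<reviseddate>/ form avoids that limitation. The trap code ensures that no temporary file is left behind.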

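This works on your sample data, giving the same output. For small files, this is fine. If you are managing multi-gigabyte files, Perl is the better choice because it processes the file only once.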
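One thing to watch with the iterative approach: if grep keeps finding an unnumbered <reviseddate> that the sed pattern cannot match (for example, because its closing tag is on the next line), the loop never terminates. A minimal sketch of the same loop with a guard follows. The function name number_revised and the error message are hypothetical, and like the loop above it creates its temporary file in the current directory.

```shell
#!/bin/sh
# number_revised FILE: hypothetical wrapper around the grep/sed loop.
# Returns non-zero if an unnumbered tag remains that sed cannot rewrite.
number_revised() {
    file=$1
    tmp=$(mktemp ./revise.XXXXXXXXXXXX) || return 1
    i=1
    while grep -q '<reviseddate>' "$file"
    do
        sed "1,/<reviseddate>/ s%<reviseddate>\([^<]*\)</reviseddate>%<reviseddate$i>\1</reviseddate$i>%" "$file" > "$tmp"
        # If sed changed nothing, the remaining tag can never be matched:
        # stop instead of looping forever.
        if cmp -s "$file" "$tmp"; then
            rm -f "$tmp"
            echo "number_revised: unmatched <reviseddate> left in $file" >&2
            return 1
        fi
        mv "$tmp" "$file"
        i=$((i + 1))
    done
    rm -f "$tmp"
}
```

It would be called as: number_revised filename. The cmp -s check costs one extra read of the file per pass, which matters little at the file sizes where this approach is reasonable anyway.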

answered 2013-03-08T08:33:46.363