
I have a file:

@Book{gjn2011ske, 
  author =   {Grzegorz J. Nalepa},
  title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =     2011,
  address =  {Krak\'ow}
}

@article{gjn2010jucs,
  Author =   {Grzegorz J. Nalepa},
  Journal =  {Journal of Universal Computer Science},
  Number =   7,
  Pages =    {1006-1023},
  Title =    {Collective Knowledge Engineering with Semantic Wikis},
  Volume =   16,
  Year =     2010
}

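I want to improve the regular expression so that it deletes only the first line of each record. Note: the record separator RS="}\n" must not be changed.

I tried: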

awk 'BEGIN{ RS="}\n" } {gsub(/@.*,/,"") ; print }' file

I want to print this result:

  author =   {Grzegorz J. Nalepa},
  title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =     2011,
  address =  {Krak\'ow}

  Author =   {Grzegorz J. Nalepa},
  Journal =  {Journal of Universal Computer Science},
  Number =   7,
  Pages =    {1006-1023},
  Title =    {Collective Knowledge Engineering with Semantic Wikis},
  Volume =   16,
  Year =     2010

Thanks for your help.

Edit:

The solution I came up with:

awk 'BEGIN{ RS="}\n" }{sub(",","@"); sub(/@.*@/,""); print }' file 

4 Answers


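What you want is hard to achieve with the specified RS (because the line address = {Krak\'ow} also ends in "}" plus a newline, so it terminates a record of its own). I would rather go for: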

awk '$0 !~ "^@" && $0 !~ "^} *$" { print }' FILE 

See it in action here.
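
To make the point about the extra end of record concrete, here is a minimal sketch that shows how RS="}\n" cuts up the input. The sample file is a shortened, hypothetical copy of the question's data, and the multi-character RS is assumed to be treated as a regular expression (as gawk and mawk do; POSIX awk does not guarantee it).

```shell
#!/bin/sh
# Shortened, hypothetical copy of the file from the question.
f=$(mktemp)
cat > "$f" <<'EOF'
@Book{gjn2011ske,
  author =   {Grzegorz J. Nalepa},
  address =  {Krak\'ow}
}

@article{gjn2010jucs,
  Author =   {Grzegorz J. Nalepa},
  Year =     2010
}
EOF

# Show every record between brackets so the split points are visible.
awk 'BEGIN { RS = "}\n" } { printf "record %d: [%s]\n", NR, $0 }' "$f"

# Number of records awk sees for the two entries.
count=$(awk 'BEGIN { RS = "}\n" } END { print NR }' "$f")

# Last line of the first record: its closing brace was consumed by RS.
last=$(awk 'BEGIN { RS = "}\n" } NR == 1 { n = split($0, line, "\n"); print line[n] }' "$f")

printf 'records: %s\nlast line of record 1: %s\n' "$count" "$last"
rm -f "$f"
```

The first entry is split in two: one record that stops right after Krak\'ow (without its brace) and one empty record for the entry's own closing brace. That is why any solution that keeps this RS has to put the brace back by hand.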

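EDIT: I don't see why this has to be a regex solution; could you explain?

Anyway, here is another solution (working, see here) that does use regular expressions, though not in the way you expected: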

awk 'BEGIN{ RS="}\n" }
{
  split($0,a,"\n")
  for (e=1;e<=length(a);e++) {
      if (a[e] ~ "{" && a[e] !~ "}") {
          sub("$","}",a[e])
      }
      if (a[e] ~ "=") { print a[e] }
  }
  printf("\n")
}' INPUTFILE

And one more, with a simpler regex, but it falls short: the closing } of the address line is removed by your RS, and it prints a final }...

awk 'BEGIN{ RS="}\n" }
{
  sub(/@[^,]+,/, "")
  print $0
}' INPUTFILE
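
The brace that RS removes can be put back while still leaving RS untouched. The following is only a sketch under two assumptions that the question does not state: every entry closes with a } on a line of its own, and only the last field of an entry may end in } without a trailing comma. The sample file is a shortened, hypothetical copy of the question's data.

```shell
#!/bin/sh
# Shortened, hypothetical copy of the file from the question.
f=$(mktemp)
cat > "$f" <<'EOF'
@Book{gjn2011ske,
  author =   {Grzegorz J. Nalepa},
  address =  {Krak\'ow}
}

@article{gjn2010jucs,
  Author =   {Grzegorz J. Nalepa},
  Year =     2010
}
EOF

out=$(awk 'BEGIN { RS = "}\n" }
$0 == "" { next }                     # empty record left by an entry-closing brace
{
    sub(/@[^,]+,[ \t]*\n/, "")        # drop only the "@type{key," line
    if ($0 ~ /\n$/) {
        sub(/\n$/, "")                # RS matched the entry-closing brace: trim the newline
    } else {
        $0 = $0 "}"                   # RS ate a field-closing brace: restore it
    }
    print
}' "$f")

printf '%s\n' "$out"
rm -f "$f"
```

The test on the trailing newline is what tells the two cases apart: a record that ends in a newline was terminated by the brace on its own line, whereas a record that ends mid-line lost the brace of its last field.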
answered 2012-09-30T11:09:53.453

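One way that does not use a regular expression. Set the field separator to a newline, so that each key of the record becomes a field. Then loop over the fields and print those that do not begin with @: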

awk '
    BEGIN { 
        RS="}\n"; 
        FS=OFS="\n"; 
    } 
    { 
        for (i=1; i<=NF; i++) { 
            if ( substr($i, 1, 1) != "@" ) { 
                printf "%s%s", $i, (i == NF) ? RS : OFS; 
            } 
        } 
    }
' file

Output:

author =   {Grzegorz J. Nalepa},
title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
publisher =    {Wydawnictwa AGH},
year =     2011,
address =  {Krak\'ow}

Author =   {Grzegorz J. Nalepa},
Journal =  {Journal of Universal Computer Science},
Number =   7,
Pages =    {1006-1023},
Title =    {Collective Knowledge Engineering with Semantic Wikis},
Volume =   16,
Year =     2010
answered 2012-09-30T11:58:28.733

GNU sed can do this:

sed '/^@/,/^}$/ { //d }' file.txt

Result:

  author =   {Grzegorz J. Nalepa},
  title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =     2011,
  address =  {Krak\'ow}

  Author =   {Grzegorz J. Nalepa},
  Journal =  {Journal of Universal Computer Science},
  Number =   7,
  Pages =    {1006-1023},
  Title =    {Collective Knowledge Engineering with Semantic Wikis},
  Volume =   16,
  Year =     2010

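Note that you can use the -i flag to make the changes in place (i.e. overwrite the file contents), and the -s flag to process several files as separate streams. For example: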

sed -s -i '/^@/,/^}$/ { //d }' *.txt
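
Since -i overwrites the files, it may be worth keeping backups. The sketch below assumes GNU sed, whose -i accepts an optional suffix (here .bak) for the backup copy; the directory, file names and sample entry are made up for the demonstration.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed; directory and file names are hypothetical.
dir=$(mktemp -d)
for name in a b; do
    cat > "$dir/$name.txt" <<'EOF'
@Book{key,
  year =     2011
}
EOF
done

# -i.bak edits each file in place and keeps the original as NAME.txt.bak.
sed -s -i.bak '/^@/,/^}$/ { //d }' "$dir"/*.txt

result=$(cat "$dir/a.txt")              # edited file: only the field line is left
backup=$(head -n 1 "$dir/a.txt.bak")    # backup: still starts with the @ line

printf 'edited: %s\nbackup: %s\n' "$result" "$backup"
rm -rf "$dir"
```

Inside the range, the empty regex // reuses whichever regex was matched last, so it deletes the @ line that opens the range and the } line that closes it, and nothing in between.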
answered 2012-09-30T12:32:08.723
awk '{if($0!~/@/&&$0!~/^}/)print}' temp

Tested below:

> awk '{if($0!~/@/&&$0!~/^}/)print}' temp
  author =       {Grzegorz J. Nalepa},
  title =        {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =         2011,
  address =      {Krak\'ow}

  Author =       {Grzegorz J. Nalepa},
  Journal =      {Journal of Universal Computer Science},
  Number =       7,
  Pages =        {1006-1023},
  Title =        {Collective Knowledge Engineering with Semantic Wikis},
  Volume =       16,
  Year =         2010
>
answered 2012-10-01T06:46:10.150