awk - How to improve a regular expression?

Question

I have a file:

@Book{gjn2011ske, 
  author =   {Grzegorz J. Nalepa},
  title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =     2011,
  address =  {Krak\'ow}
}

@article{gjn2010jucs,
  Author =   {Grzegorz J. Nalepa},
  Journal =  {Journal of Universal Computer Science},
  Number =   7,
  Pages =    {1006-1023},
  Title =    {Collective Knowledge Engineering with Semantic Wikis},
  Volume =   16,
  Year =     2010
}

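I want to improve the regular expression so that it deletes only the first line of each record. Note: the record separator RS="}\n" must not be changed.

I tried: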

awk 'BEGIN{ RS="}\n" } {gsub(/@.*,/,"") ; print }' file

I want the output to be:

  author =   {Grzegorz J. Nalepa},
  title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =     2011,
  address =  {Krak\'ow}

  Author =   {Grzegorz J. Nalepa},
  Journal =  {Journal of Universal Computer Science},
  Number =   7,
  Pages =    {1006-1023},
  Title =    {Collective Knowledge Engineering with Semantic Wikis},
  Volume =   16,
  Year =     2010

Thanks for your help.

Edit:

The solution I came up with:

awk 'BEGIN{ RS="}\n" }{sub(",","@"); sub(/@.*@/,""); print }' file

score 2 · Accepted Answer

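What you want is hard to do with the specified RS setting (because address = {Krak\'ow} ends in }\n and therefore produces an extra record end). I would rather go with: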

awk '$0 !~ "^@" && $0 !~ "^} *$" { print }' FILE

See it in action here.
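The extra record end mentioned above can be made visible by printing each record in brackets. This is only a sketch: it assumes an awk that treats a multi-character RS as a regular expression (gawk and mawk do; POSIX leaves this unspecified), and the scratch file and shortened entry are illustrative.

```shell
# Scratch copy of a shortened first entry (the temporary path is illustrative).
sample=$(mktemp)
printf '%s\n' \
  '@Book{gjn2011ske,' \
  '  author =   {Grzegorz J. Nalepa},' \
  "  address =  {Krak\\'ow}" \
  '}' > "$sample"

# Show each record in brackets, using the RS from the question.
records=$(awk 'BEGIN { RS="}\n" } { printf "record %d: [%s]\n", NR, $0 }' "$sample")
printf '%s\n' "$records"
rm -f "$sample"
```

With such an awk, the first record should end in {Krak\'ow with its closing brace consumed by RS, and a second, empty record should appear for the } that sits on its own line.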

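Edit: I don't see why it has to be a regex solution; could you explain?

Anyway, here is another solution (it works, see here) that uses regular expressions, though probably not in the way you expected: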

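awk 'BEGIN{ RS="}\n" }
{
  # split the record into lines; split() returns the number of pieces
  n = split($0,a,"\n")
  for (e=1;e<=n;e++) {
      # RS consumed the closing brace of the last field: put it back
      if (a[e] ~ "{" && a[e] !~ "}") {
          sub("$","}",a[e])
      }
      # print only "key = value" lines, which drops the @... header line
      if (a[e] ~ "=") { print a[e] }
  }
  printf("\n")
}' INPUTFILE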

And one more, with a simpler regular expression, but it fails: the } at the end of the last (address) line is removed by your RS, and it prints a final }...

awk 'BEGIN{ RS="}\n" }
{
  # delete from "@" up to and including the first comma only
  sub(/@[^,]+,/,"")
  print $0
}' INPUTFILE
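The difference between the pattern from the question and a negated character class can be checked on a single record. In awk, . also matches a newline, so the greedy @.*, runs from the @ to the last comma in the record, while a class that excludes commas stops at the first one. A minimal sketch, using a shortened, illustrative record:

```shell
# A shortened record, as awk sees it once RS="}\n" has removed the closing brace.
record='@Book{gjn2011ske,
  author = {Grzegorz J. Nalepa},
  year = 2011,
  address = {Krakow'

# Greedy: removes everything from "@" up to the LAST comma in the record.
greedy=$(printf '%s' "$record" | awk 'BEGIN { RS="}\n" } { sub(/@.*,/, ""); print }')

# Negated class: removes only "@Book{gjn2011ske,".
narrow=$(printf '%s' "$record" | awk 'BEGIN { RS="}\n" } { sub(/@[^,]+,/, ""); print }')

printf 'greedy:\n%s\n\nnarrow:\n%s\n' "$greedy" "$narrow"
```

The greedy version should leave only the address line, whereas the narrow version keeps every key line and drops just the header.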

score 2 · Accepted Answer

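One way that does not use a regular expression. Set the field separator to a newline, so that each key of the record becomes a field. Then loop over the fields and print those that do not start with @: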

awk '
    BEGIN { 
        RS="}\n"; 
        FS=OFS="\n"; 
    } 
    { 
        for (i=1; i<=NF; i++) { 
            if ( substr($i, 1, 1) != "@" ) { 
                printf "%s%s", $i, (i == NF) ? RS : OFS; 
            } 
        } 
    }
' file

Output:

author =   {Grzegorz J. Nalepa},
title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
publisher =    {Wydawnictwa AGH},
year =     2011,
address =  {Krak\'ow}

Author =   {Grzegorz J. Nalepa},
Journal =  {Journal of Universal Computer Science},
Number =   7,
Pages =    {1006-1023},
Title =    {Collective Knowledge Engineering with Semantic Wikis},
Volume =   16,
Year =     2010
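How a record breaks into fields under FS="\n" can be inspected by numbering the fields. A small sketch on a shortened, illustrative entry, again assuming an awk that accepts a multi-character RS (gawk, mawk):

```shell
# Number the fields of one record; the entry is shortened and illustrative.
fields=$(printf '@Book{key,\n  author = {A},\n  year = 2011\n}\n' |
  awk 'BEGIN { RS="}\n"; FS="\n" } { for (i = 1; i <= NF; i++) printf "%d:%s\n", i, $i }')
printf '%s\n' "$fields"
```

When the closing } sits on its own line, the record ends with a newline, so the last field is empty. That is worth keeping in mind for the (i == NF) branch of the loop above.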

score 2 · Accepted Answer

With GNU sed I would do it like this:

sed '/^@/,/^}$/ { //d }' file.txt

Result:

  author =   {Grzegorz J. Nalepa},
  title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =     2011,
  address =  {Krak\'ow}

  Author =   {Grzegorz J. Nalepa},
  Journal =  {Journal of Universal Computer Science},
  Number =   7,
  Pages =    {1006-1023},
  Title =    {Collective Knowledge Engineering with Semantic Wikis},
  Volume =   16,
  Year =     2010

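Note that you can use the -i flag to make the changes in place (i.e. overwrite the file contents), and the -s flag to treat multiple files separately rather than as one continuous stream. For example: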

sed -s -i '/^@/,/^}$/ { //d }' *.txt
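Before running an in-place edit on real files, it can be tried on scratch copies. The sketch below assumes GNU sed; the temporary directory, file names, and shortened entry are illustrative only.

```shell
# Scratch directory with two illustrative copies of a short entry.
dir=$(mktemp -d)
for name in a b; do
  printf '@Book{key,\n  year = 2011\n}\n' > "$dir/$name.txt"
done

# Edit both files in place: inside the @... to } range, delete the two boundary lines.
sed -s -i '/^@/,/^}$/ { //d }' "$dir"/*.txt

after=$(cat "$dir/a.txt" "$dir/b.txt")
printf '%s\n' "$after"
rm -rf "$dir"
```

Each file should be left with only its key line, since the empty pattern // reuses the most recently matched range boundary.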

score 1 · Accepted Answer

awk '{if($0!~/@/&&$0!~/^}/)print}' temp

Tested as follows:

> awk '{if($0!~/@/&&$0!~/^}/)print}' temp
  author =       {Grzegorz J. Nalepa},
  title =        {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =         2011,
  address =      {Krak\'ow}

  Author =       {Grzegorz J. Nalepa},
  Journal =      {Journal of Universal Computer Science},
  Number =       7,
  Pages =        {1006-1023},
  Title =        {Collective Knowledge Engineering with Semantic Wikis},
  Volume =       16,
  Year =         2010
>
