
My question is inspired by an interesting question somebody asked at http://tex.stackexchange.com and by my attempt to provide an AWK solution. Note that AWK here means NAWK since, as we know, gawk != awk. I am reproducing part of that answer here.

Original question:

I have a rather large document with lots of math notation. I've used |foo| throughout to indicate the absolute value of foo. I'd like to replace every instance of |foo| with \abs{foo}, so that I can control the notation via an abs macro I define.

My answer:

This post is inspired by cmhughes' proposed solutions. His post is one of the most interesting posts on TeX editing I have ever read. I just spent two hours trying to produce a nawk solution. In the process I learned that AWK lacks non-greedy regular expressions, which is to be expected since it is a cousin of sed. Even worse, its regular expressions do not capture groups: there is no \1-style back-reference in the replacement text of sub() or gsub(). A simple AWK script

#!/usr/bin/awk -f

NR>0{
gsub(/\|([^|]*)\|/,"\\abs{\1}")
print
}

Applied to the file

$|abs|$ so on and so fourth
$$|a|+|b|\geq|a+b|$$
who is affraid of wolf $|abs|$

will unfortunately produce

$\abs{}$ so on and so fourth
$$\abs{}+\abs{}\geq\abs{}$$
who is affraid of wolf $\abs{}$
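Why the braces come out empty: in an awk string literal, \1 is an octal escape (POSIX allows one to three octal digits). The replacement text therefore contains the single control character 0x01, not a reference to a captured group. The following quick check should behave the same in any POSIX awk:

```shell
#!/bin/sh
# "\1" in an awk string literal is an octal escape, i.e. a single character.
awk 'BEGIN { s = "\1"; print length(s) }'   # prints 1
# Dump that character as hex to see which byte it is (0x01, a control char).
awk 'BEGIN { printf "%s", "\1" }' | od -An -tx1
```

Because 0x01 is invisible on most terminals, the output above looks like \abs{}.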

An obvious fix for the above solution is to use gawk instead, as in

gawk '{print gensub(/\|([^|]*)\|/, "\\\\abs{\\1}", "g", $0)}'
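Both the failing gsub() and the gensub() fix use [^|]* rather than .* between the bars. POSIX EREs have no non-greedy quantifier and match the leftmost-longest string, so .* would run from the first bar on the line to the last one. The demonstration below uses portable awk. first_match is a made-up helper name, and [|] is used instead of \| because awk processes backslash escapes in -v assignments:

```shell
#!/bin/sh
# first_match (hypothetical helper): print the part of each line matched by the ERE in $1.
first_match() {
  awk -v re="$1" '{ if (match($0, re)) print substr($0, RSTART, RLENGTH) }'
}

printf '%s\n' '|a|+|b|' | first_match '[|].*[|]'     # greedy: prints |a|+|b|
printf '%s\n' '|a|+|b|' | first_match '[|][^|]*[|]'  # negated class: prints |a|
```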

However, I wonder if there is a way to use an external regex library, for example TRE, from AWK. Even more generally, how does one interface AWK with C code? (A pointer to documentation would be fine.)


2 Answers


In the case of nawk, the answer is: not without modifying the source code.

Two of the problems are:

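  • Regular expressions are part of the language itself (the ~ operator and /regex/ literals), as are the functions the language defines (match(), etc.).
  • nawk uses its own regular expression code (in the file b.c). Unlike a program that calls a regex library, it does not help to link in a different library with alternative implementations of regcomp() and regexec().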

One way around this is to use the gawk extensions: the third argument to match(), or gensub(). (You noticed this too, but I try to avoid them.)

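gawk also supports loadable extensions. These would be one way to interface with the PCRE library and provide new "built-in" functions, though not to replace ~ or any of the internals. This API is the new 4.1 extension mechanism; earlier versions had a completely different implementation.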

Finally, one way to do the required replacement in nawk is:

match($0,/\|[^|]*\|/) {
    do {
        sub(/\|[^|]*\|/,"\\abs{" substr($0,RSTART+1,RLENGTH-2) "}",$0)
    } while (match($0,/\|[^|]*\|/))
}
{ print }
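One caveat with the sub() loop above: sub() interprets its replacement string, so an & between the bars (for example |x&y|) is replaced by the matched text. Because that text contains the bars again, the do/while loop then never terminates. Below is a minimal sketch that avoids sub() and rebuilds the line with substr() only. It assumes a POSIX awk, and the function name abs_portable is made up for illustration:

```shell
#!/bin/sh
# abs_portable (hypothetical name): rewrite every |foo| on stdin as \abs{foo}
# using only POSIX awk features: match(), RSTART, RLENGTH and substr().
abs_portable() {
  awk '{
    rest = $0
    out = ""
    # [^|]* keeps each match to a single pair of bars.
    while (match(rest, /\|[^|]*\|/)) {
      # Copy the text before the opening bar, then the text between the bars.
      # Plain concatenation never reinterprets "&" or backslashes.
      out = out substr(rest, 1, RSTART - 1) "\\abs{" substr(rest, RSTART + 1, RLENGTH - 2) "}"
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }'
}

printf '%s\n' '$$|a|+|b|\geq|a+b|$$' '|x&y|' | abs_portable
```

The design choice is that string concatenation copies text verbatim, whereas the replacement argument of sub() and gsub() gives & (and, depending on the awk, backslashes) a special meaning.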
answered 2013-08-14 16:05:45

Here is my nawk-based solution using the split() function:

awk '{
   n = split($0, arr, "|");   # split() returns the number of pieces
   for (i=1; i<=n; i++) {
      if (i%2)
         printf("%s", arr[i]);
      else
         printf("\\abs{%s}", arr[i]);
   }
   printf("%s", ORS)
}' file

Output:

$\abs{abs}$ so on and so fourth
$$\abs{a}+\abs{b}\geq\abs{a+b}$$
who is affraid of wolf $\abs{abs}$

Live demo: http://ideone.com/lMf2hL
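The split() version differs from the match()-based loop on lines with an odd number of bars. It wraps the text after the last bar in \abs{} even though no closing bar follows (a|b becomes a\abs{b}), whereas the match() loop leaves a lone bar untouched. The variant below assumes that an unpaired bar should be kept as-is; abs_split is a hypothetical name:

```shell
#!/bin/sh
# abs_split (hypothetical name): same split() idea, but a trailing piece
# with no closing bar is written back with its bar instead of being wrapped.
abs_split() {
  awk '{
    n = split($0, arr, "|")   # n = number of pieces on this line
    out = ""
    for (i = 1; i <= n; i++) {
      if (i % 2) {
        out = out arr[i]                  # odd pieces lie outside any bars
      } else if (i < n) {
        out = out "\\abs{" arr[i] "}"     # even piece followed by a closing bar
      } else {
        out = out "|" arr[i]              # unmatched final bar: keep it
      }
    }
    print out
  }'
}

printf '%s\n' '$$|a|+|b|\geq|a+b|$$' 'x = a|b' | abs_split
```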

answered 2013-08-14 17:09:41