regex - Using an external regex library from AWK

Question

My question is inspired by an interesting question somebody asked at http://tex.stackexchange.com and by my attempt to provide an AWK solution. Note that AWK here means nawk, since gawk and awk are not the same thing. I am reproducing part of that answer here.

Original question:

I have a rather large document with lots of math notation. I've used |foo| throughout to indicate the absolute value of foo. I'd like to replace every instance of |foo| with \abs{foo}, so that I can control the notation via an abs macro I define.

My answer:

This post is inspired by cmhughes's proposed solutions. His post is one of the most interesting posts on TeX editing that I have ever read. I just spent two hours trying to produce a nawk solution. In the process I learned that AWK not only lacks non-greedy regular expressions, which is to be expected since it is sed's cousin, but, even worse, AWK regular expressions do not capture groups. A simple AWK script

#!/usr/bin/awk -f

NR>0{
gsub(/\|([^|]*)\|/,"\\abs{\1}")
print
}

Applied to the file

$|abs|$ so on and so fourth
$$|a|+|b|\geq|a+b|$$
who is affraid of wolf $|abs|$

will unfortunately produce

$\abs{}$ so on and so fourth
$$\abs{}+\abs{}\geq\abs{}$$
who is affraid of wolf $\abs{}$
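
The empty braces are most likely explained by how AWK reads string literals: inside a string, "\1" is an octal escape for the byte with value 1, not a reference to the first group, so gsub() inserts an invisible control character. A minimal check, assuming a POSIX-style awk is on the PATH (the helper name show_escape is made up for this illustration):

```shell
# Hypothetical helper: report what the AWK string literal "\1" really holds.
show_escape() {
    awk 'BEGIN {
        s = "\1"                       # lexed as an octal escape, not a backreference
        print length(s)                # number of characters in the string
        print (s == sprintf("%c", 1))  # 1 if s is the single byte with value 1
    }'
}

show_escape
```

If both printed lines are 1, the replacement text really is \abs{ followed by a control character and }, which a terminal shows as \abs{}.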

An obvious fix for the solution above is to use gawk instead, as in

gawk '{print gensub(/\|([^|]*)\|/, "\\abs{\\1}", "g", $0)}'

However, I wonder whether there is a way to use an external regex library, for example TRE, from AWK. More generally, how does one interface AWK with C code? (A pointer to documentation would be fine.)

score 1 · Accepted Answer

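In the case of nawk, the answer is: not without modifying the source code.

Two of the problems are:

Regular expressions are part of the language (~ and //), as are the language-defined functions (match() etc.).
nawk uses its own regex code (in the file b.c), so, unlike a program that uses one of the regex libraries, swapping in a different library with replacement implementations of regcomp() and regexec() will not help.

One way to solve this is to use the gawk extensions: match() with a third argument, or gensub(). (You noted this too, but I was trying to avoid it.)

gawk also supports loadable extensions, which would be one way to interface with the PCRE library to provide new "built-in" functions (though not to replace ~ or any internal functionality). This API is the new "4.1" way of writing extensions; earlier versions had a quite different implementation.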
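
The match() extension mentioned above can be sketched as follows. This assumes gawk is installed; the helper name first_group is hypothetical, and the array m is filled by gawk with the whole match in m[0] and the first parenthesized group in m[1].

```shell
# Hypothetical helper: print the text captured between the first pair of bars.
# Relies on gawk's three-argument match(); plain nawk does not accept it.
first_group() {
    gawk '{
        if (match($0, /\|([^|]*)\|/, m))
            print m[1]    # contents of the first parenthesized group
    }'
}

if command -v gawk >/dev/null 2>&1; then
    printf '%s\n' '$|abs|$ so on' | first_group
else
    echo "gawk not found; skipping" >&2
fi
```

This gives real capture groups, but only at the cost of depending on gawk.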

Finally, one way to implement the required substitution in nawk is:

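# Only lines that contain at least one |...| pair enter the loop.
match($0,/\|[^|]*\|/) {
    do {
        # RSTART+1 and RLENGTH-2 skip the two bars, leaving only the inner text.
        sub(/\|[^|]*\|/,"\\abs{" substr($0,RSTART+1,RLENGTH-2) "}",$0)
    } while (match($0,/\|[^|]*\|/))
}
{ print }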
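
One caveat with the loop above: sub() treats & in the replacement as "the matched text", so an ampersand between the bars would be expanded rather than copied. A variant that sidesteps this by rebuilding the line with plain string concatenation might look like the sketch below. It assumes only POSIX awk features; the helper name abs_posix and the variables out, rest and inner are made up for this illustration.

```shell
# Hypothetical helper: rewrite |foo| as \abs{foo} without sub() or gsub().
abs_posix() {
    awk '{
        out = ""     # part of the line already rewritten
        rest = $0    # part of the line still to be scanned
        while (match(rest, /\|[^|]*\|/)) {
            inner = substr(rest, RSTART + 1, RLENGTH - 2)   # text between the bars
            out = out substr(rest, 1, RSTART - 1) "\\abs{" inner "}"
            rest = substr(rest, RSTART + RLENGTH)           # continue after the match
        }
        print out rest
    }'
}

printf '%s\n' '$$|a|+|b|\geq|a+b|$$' | abs_posix
```

Each character is scanned once, left to right, and the inner text is never interpreted as a replacement pattern.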

score 1 · Accepted Answer

Here is my nawk-based solution using the split function:

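awk '{
   # split() returns the number of pieces, so length(arr),
   # which not every awk accepts for arrays, is not needed.
   n = split($0, arr, "|")
   for (i = 1; i <= n; i++) {
      if (i % 2)
         printf("%s", arr[i])          # odd pieces lie outside the bars
      else
         printf("\\abs{%s}", arr[i])   # even pieces lie between a pair of bars
   }
   printf("%s", ORS)
}' file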

Output:

$\abs{abs}$ so on and so fourth
$$\abs{a}+\abs{b}\geq\abs{a+b}$$
who is affraid of wolf $\abs{abs}$

Live demo: http://ideone.com/lMf2hL