awk - Reindex a two-digit number string based on occurrences of a common string

Question

I have a urlwatch .yaml file in this format:

name: 01_urlwatch update released
url: "https://github.com/thp/urlwatch/releases"
filter:
  - xpath:
      path: '(//div[contains(@class,"release-timeline-tags")]//h4)[1]/a'
  - html2text: re
---
name: 02_urlwatch webpage
url: "https://thp.io/2008/urlwatch/"
filter: 
  - html2text: re
  - grep: (?i)current\sversion  #\s Matches a whitespace character
  - strip # Strip leading and trailing whitespace 
---
name: 04_RansomWhere? Objective-See
url: "https://objective-see.com/products/ransomwhere.html"
filter:
  - html2text: re
  - grep: (?i)current\sversion #\s Matches a whitespace character
  - strip #Strip leading and trailing whitespace
---
name: 05_BlockBLock Objective-See
url: "https://objective-see.com/products/blockblock.html"
filter:
  - html2text: re
  - grep: (?i)current\sversion #(?i) \s 
  - strip #Strip leading and trailing whitespace
---

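I need to reindex the two-digit number based on the occurrence of the string name: . In this example, the first and second occurrences of name: are followed by the correct index number, but the third and fourth are not.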

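In the example above, the third and fourth occurrences of name: should be reindexed so that 03_ and 04_ precede the text string. That is: a two-digit index number followed by an underscore.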

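Also, some instances of this string appear as #name: and should not count toward the reindexing. (They are commented out, so urlwatch does not run those entries.)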

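I tried sed but could not generate an index number based on the occurrences of the string. I do not have GNU sed, but I can install it if that is the only way.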
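
To make the requirement concrete, here is a minimal sketch of a checker that reports uncommented name: lines whose prefix is out of sequence. The function name check_names is hypothetical, and the sketch assumes every entry starts with exactly "name: " followed by two digits and an underscore, as in the sample above.

```shell
# Hypothetical helper (not part of urlwatch): report name: lines
# whose two-digit prefix does not match their position in the file.
check_names() {
  awk '
    /^name: / {                          # "#name:" lines do not match the ^ anchor
      want = sprintf("%02d_", ++n)       # expected prefix for this occurrence
      found = substr($0, 7, 3)           # the 3 characters right after "name: "
      if (found != want) {
        print NR ": expected " want ", found " found
      }
    }
  ' "$@"
}
```

Called as check_names urls.yaml (the file name is only an example), it prints one line per mismatch and nothing when the numbering is already sequential.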

score 3 · Accepted Answer

I think this will do:

awk '/^name: / { sub(/[0-9]{2}/, ++i); sub(/ [1-9][^0-9]/,"\x0&"); sub(/\x0 /," 0") }; 1' your_input

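On every line that begins with name: we replace the two digits ([0-9]{2}) with the incremented counter i. If the result is a single digit, a second substitution marks it, and a third substitution adds a leading 0 and removes the mark.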

It may be a bit fragile, but given your explanation, it looks fine.

score 3 · Accepted Answer

This might work for you (GNU sed):

sed -E '/^name:/{x;s/.*/expr & + 1/e;s/^.$/0&/;x;G;s/[0-9]+(.*)\n(.*)/\2\1/}' file

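On lines that start with name:, increment a counter in the hold space (padding a single digit with a leading zero), append the hold space to the pattern space, match the first run of digits, and use the captured groups to replace it with the counter.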

score 2 · Accepted Answer

awk '/^name/{sub(/[0-9]{2}/,sprintf("%02d", ++c))}1' file

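For any line starting with "name", we replace the first two-digit number with our counter, which is incremented at each occurrence. awk's sprintf function prints it with a leading zero when needed.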
