regex - Erasing a field in a bibtex entry with sed

Question

I have a text file that contains multiple bibtex entries like this one:

@article{Lindgren1989Resonant,
    abstract = {Using a simple model potential, a truncated image barrier, for the
Al(111) surface, one obtains a resonant bound surface state at an energy
that agrees surprisingly well with recent observations by inverse
photoemission.},
    author = {Lindgren and Walld\'{e}n, L.},
    citeulike-article-id = {9286612},
    citeulike-linkout-0 = {http://dx.doi.org/10.1103/PhysRevB.40.11546},
    citeulike-linkout-1 = {http://adsabs.harvard.edu/cgi-bin/nph-bib\_query?bibcode=1989PhRvB..4011546L},
    doi = {10.1103/PhysRevB.40.11546},
    journal = {Phys. Rev. B},
    keywords = {image-potential, surface-states},
    month = dec,
    pages = {11546--11548},
    posted-at = {2011-05-12 11:42:49},
    priority = {0},
    title = {Resonant bound states for simple metal surfaces},
    url = {http://dx.doi.org/10.1103/PhysRevB.40.11546},
    volume = {40},
    year = {1989}
}

I want to erase the abstract field, which can span one or more lines (as in the case above). I tried using sed in the following way:

sed "/^\s*${field}.*=/,/},?$/{
    d
}" file

where file is the text file containing the bibtex code above. However, the output of this command is just

@article{Lindgren1989Resonant,

Apparently sed is matching the final }, but how can I make it match the closing brace of the abstract value instead?

score 2 · Accepted Answer

This might work for you:

sed '1{h;d};H;${x;s/\s*abstract\s*=\s*{[^}]*}\+,//g;p};d' file

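This slurps the whole file into the hold space and then deletes the abstract field.

Explanation:

On the first line, replace the hold space (HS) with the current line; append every subsequent line to the HS. On reaching the last line, swap to the HS, substitute every occurrence of the abstract field with nothing, and print the file. Note that all lines that would normally be printed are deleted.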
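
The steps above can be laid out as a commented script. This is only a minimal sketch, assuming GNU sed (the one-liner relies on `\s` and `\+`); the function name `strip_abstract` and the shortened sample entry are made up for illustration.

```shell
# Hypothetical helper: the same commands as the one-liner, one per line.
strip_abstract() {
    sed '
        # Line 1: copy it into the hold space, then start the next cycle.
        1{h;d}
        # Every later line: append it to the hold space.
        H
        # Last line: swap in the hold space, delete the field, print.
        ${
            x
            s/\s*abstract\s*=\s*{[^}]*}\+,//g
            p
        }
        # Drop the current line so it is not printed automatically.
        d
    ' "$1"
}

# Shortened sample entry, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
@article{Lindgren1989Resonant,
    abstract = {Using a simple model potential,
one obtains a resonant bound surface state.},
    journal = {Phys. Rev. B},
    year = {1989}
}
EOF

strip_abstract "$sample"
```

Because `[^}]*` stops at the first closing brace, an abstract that itself contains braces is not handled correctly by this pattern, and the pattern also expects a comma after the field.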

score 1 · Accepted Answer

Does this awk one-liner work for you?

 awk '/abstract *= *{/{a=1} (a && /} *,$/){a=0;next;}!a' yourInput
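
The same flag-based idea, spelled out with comments. This is a sketch rather than a drop-in replacement: the function name `strip_abstract_awk`, the variable name `skipping`, and the shortened sample entry are assumptions for illustration, and the braces are written as `[{]` and `[}]` only to keep them unambiguous across awk implementations.

```shell
# Hypothetical helper: prints the file without the abstract field.
strip_abstract_awk() {
    awk '
        # First line of the field: start skipping.
        /abstract *= *[{]/ { skipping = 1 }
        # While skipping, a line ending in "}," closes the field:
        # drop that line too and resume printing from the next one.
        skipping && /[}] *,$/ { skipping = 0; next }
        # Print only the lines that are outside the field.
        !skipping
    ' "$1"
}

# Shortened sample entry, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
@article{Lindgren1989Resonant,
    abstract = {Using a simple model potential,
one obtains a resonant bound surface state.},
    journal = {Phys. Rev. B},
    year = {1989}
}
EOF

strip_abstract_awk "$sample"
```

The closing test looks for a line ending in `},`, so an abstract that is the last field of an entry (no trailing comma) would keep the flag set until a later line happens to end that way.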

score 1 · Accepted Answer

Addresses in sed match in a strange way:

addr2 can match before addr1, which is what you are running into with your expression! Use multiple blocks.
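
One possible reading of "use multiple blocks", sketched under the assumption of GNU sed: once a regex addr1 has matched, sed only starts looking for addr2 on the following line, so a field that opens and closes on the same line needs a block of its own before the range block. The helper name `strip_field` and its parameters are hypothetical. Note also that in sed's default basic regular expressions an unescaped `?` is a literal question mark, so the sketch writes `\?`; this would likely explain why the range in the question never closed.

```shell
# Hypothetical helper: $1 is the field name, $2 is the file.
strip_field() {
    # Block 1: the whole field sits on one line, so delete just that line.
    # Block 2: otherwise delete from the first line of the field through
    #          the next line that ends in "}" or "},".
    sed -e "/^\s*$1\s*=.*},\?\$/d" \
        -e "/^\s*$1\s*=/,/},\?\$/d" "$2"
}

# Shortened sample entry, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
@article{Lindgren1989Resonant,
    abstract = {Using a simple model potential,
one obtains a resonant bound surface state.},
    journal = {Phys. Rev. B},
    priority = {0},
    year = {1989}
}
EOF

strip_field abstract "$sample"   # field spanning several lines
strip_field priority "$sample"   # field on a single line
```

Without the first block, deleting a one-line field such as `priority` would also swallow the line after it, because the range would stay open until the next line ending in a brace.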
