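I'm a very inexperienced Python coder, so it's quite possible I'm going about this particular problem in completely the wrong way, but I'd appreciate any advice/help.
I have a Python script that goes through a Markdown file line by line and rewrites [[wikilinks]] as standard Markdown [wikilink](wikilink) style links. I'm using two regular expressions in one function, as shown below: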
import logging
import re


def modify_links(file_obj):
    """
    Function will parse file contents (opened in utf-8 mode) and modify standalone [[wikilinks]] and in-line
    [[wikilinks]](wikilinks) into traditional Markdown link syntax.
    :param file_obj: Path to file
    :return: List object containing modified text. Newlines will be returned as '\n' strings.
    """
    file = file_obj
    linelist = []
    linelist_final = []  # defined up front so the return below still works if the file can't be opened
    logging.debug("Going to open file %s for processing now.", file)
    try:
        with open(file, encoding="utf8") as infile:
            for line in infile:
                linelist.append(re.sub(r"(\[\[)((?<=\[\[).*(?=\]\]))(\]\])(?!\()", r"[\2](\2.md)", line))
                # Finds references that are in style [[foo]] only by excluding links in style [[foo]](bar).
                # Capture group $2 returns just foo
        linelist_final = [re.sub(r"(\[\[)((?<=\[\[)\d+(?=\]\]))(\]\])(\()((?!=\().*(?=\)))(\))",
                                 r"[\2](\2 \5.md)", line) for line in linelist]
        # Finds only references in style [[foo]](bar). Capture group $2 returns foo and capture group $5
        # returns bar
    except EnvironmentError:
        logging.exception("Unable to open file %s for reading", file)
    logging.debug("Finished processing file %s", file)
    return linelist_final
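To make the two substitutions concrete, here is a minimal sketch that applies the same two patterns from modify_links to in-memory strings instead of a file. The helper name rewrite_line and the constants STANDALONE and INLINE are hypothetical, introduced only for illustration.

```python
import re

# Same patterns as in modify_links (hypothetical constant names).
# Standalone [[foo]] that is NOT followed by "(" -> [foo](foo.md)
STANDALONE = re.compile(r"(\[\[)((?<=\[\[).*(?=\]\]))(\]\])(?!\()")
# Inline [[digits]](bar) -> [digits](digits bar.md)
INLINE = re.compile(r"(\[\[)((?<=\[\[)\d+(?=\]\]))(\]\])(\()((?!=\().*(?=\)))(\))")


def rewrite_line(line):
    """Hypothetical helper: apply both substitutions, in the same order, to one line."""
    line = STANDALONE.sub(r"[\2](\2.md)", line)
    return INLINE.sub(r"[\2](\2 \5.md)", line)


print(rewrite_line("[[201802150808 Product Prioritization]]"))
# -> [201802150808 Product Prioritization](201802150808 Product Prioritization.md)
print(rewrite_line("[[201802150808]](Product discovery)"))
# -> [201802150808](201802150808 Product discovery.md)
```

Note that the first pattern leaves [[foo]](bar) untouched because of the (?!\() lookahead, so the second pass is the only one that rewrites inline links.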
This works for most Markdown files. However, occasionally I get a Markdown file that contains [[wikilinks]] inside a fenced code block, for example:
# Reference
Here is a reference to “the Reactome Project” using smart quotes.
Here is an image: 
[[201802150808]](Product discovery)
```
[[201802150808 Product Prioritization]]
def foo():
    print("bar")
```
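In the case above, I need to skip processing the [[201802150808 Product Prioritization]] inside the fenced code block. I have a regular expression that correctly identifies fenced code blocks: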
(?<=```)(.*?)(?=```)
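However, because the existing function works line by line, I can't figure out how to skip an entire section inside the for loop. How should I go about this?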
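One common way to keep the line-by-line loop is to track whether you are inside a fence with a boolean flag: flip it every time you see a fence line, and copy lines through unchanged while the flag is set. The sketch below assumes a fence is any line whose text, after leading whitespace, starts with three backticks (it does not handle ~~~ fences or fences nested inside longer backtick runs). The function name modify_links_skip_fences and its lines parameter (any iterable of lines, so it can be tested without a file) are hypothetical.

```python
import re

# Same two patterns as in modify_links (hypothetical constant names).
STANDALONE = re.compile(r"(\[\[)((?<=\[\[).*(?=\]\]))(\]\])(?!\()")
INLINE = re.compile(r"(\[\[)((?<=\[\[)\d+(?=\]\]))(\]\])(\()((?!=\().*(?=\)))(\))")


def modify_links_skip_fences(lines):
    """Rewrite wikilinks in each line, except lines inside ``` fenced code blocks."""
    result = []
    in_fence = False
    for line in lines:
        if line.lstrip().startswith("```"):
            # Opening or closing fence: flip the state and keep the fence line itself.
            in_fence = not in_fence
            result.append(line)
            continue
        if in_fence:
            # Inside a code block: copy the line untouched.
            result.append(line)
            continue
        line = STANDALONE.sub(r"[\2](\2.md)", line)
        line = INLINE.sub(r"[\2](\2 \5.md)", line)
        result.append(line)
    return result


sample = [
    "[[201802150808]](Product discovery)\n",
    "```\n",
    "[[201802150808 Product Prioritization]]\n",
    "def foo():\n",
    '    print("bar")\n',
    "```\n",
    "[[foo]]\n",
]
for out in modify_links_skip_fences(sample):
    print(out, end="")
```

Because iterating an open file yields its lines, inside modify_links you could pass the file object directly, e.g. modify_links_skip_fences(infile) within the existing with block. An alternative is to read the whole file and use your fence regex (which needs re.DOTALL to match across newlines) to split the text, but the flag keeps the existing loop structure.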