python - Removing Wiki markup from a string in Python

Question

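I have a string containing information downloaded from a Wikia page.

To parse its content, how would I strip all the Wiki formatting from the page and leave only the raw text?

Here is an example of what may appear: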

#REDIRECT[[Blah]]

{{
I have some stuff in here
}}
[[I also have some stuff in here|and here]]
[[http://blehthisisfake.com Link to a fake website]]

<span class="plainlinks">This is quite useless. Why was [[this page]] even created?</span>

<nowiki>There are more HTML tags, they should probably all be stripped...</nowiki>

There is random text in here. bleh bleh bleh

I'm not sure what single [brackets] do, but they should be stripped too...

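Expected output:

There is random text in here. bleh bleh bleh

I'm not sure what single do, but they should be stripped too...

Is there a module that can do this?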

score 3 · Accepted Answer

A Google search for "python wiki parser" turns up this code, which strips and replaces the tags (see the source code in the link for details).
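
The linked code is not reproduced here, so what follows is only a minimal regex-based sketch of the same idea, not the code the answer points to. It assumes the markup is not nested (no templates inside templates, no links inside links) and that, as in the question's expected output, links, templates, and HTML elements should be dropped together with their contents. The names `strip_wiki_markup` and `SAMPLE` are made up for illustration.

```python
import html
import re

# Sample input modelled on the question (illustrative constant).
SAMPLE = """#REDIRECT[[Blah]]

{{
I have some stuff in here
}}
[[I also have some stuff in here|and here]]
[[http://blehthisisfake.com Link to a fake website]]

<span class="plainlinks">This is quite useless. Why was [[this page]] even created?</span>

<nowiki>There are more HTML tags, they should probably all be stripped...</nowiki>

There is random text in here. bleh bleh bleh

I'm not sure what single [brackets] do, but they should be stripped too...
"""


def strip_wiki_markup(text: str) -> str:
    """Remove simple, non-nested wiki markup and return the remaining text."""
    # Decode entities such as &lt;span&gt; so tags look the same either way.
    text = html.unescape(text)
    # Drop whole #REDIRECT lines.
    text = re.sub(r"^#REDIRECT.*$", "", text, flags=re.MULTILINE | re.IGNORECASE)
    # Drop {{ ... }} templates, which may span several lines.
    text = re.sub(r"\{\{.*?\}\}", "", text, flags=re.DOTALL)
    # Drop paired HTML elements together with their contents.
    text = re.sub(r"<(\w+)[^>]*>.*?</\1>", "", text, flags=re.DOTALL)
    # Drop any stray tags that had no matching close.
    text = re.sub(r"<[^>]+>", "", text)
    # Drop [[ ... ]] links, then single [ ... ] brackets.
    text = re.sub(r"\[\[.*?\]\]", "", text, flags=re.DOTALL)
    text = re.sub(r"\[[^\]]*\]", "", text)
    # Tidy up: collapse runs of spaces and of blank lines.
    text = re.sub(r"[ \t]+", " ", text)
    lines = [line.strip() for line in text.splitlines()]
    text = re.sub(r"\n{2,}", "\n\n", "\n".join(lines))
    return text.strip()


if __name__ == "__main__":
    print(strip_wiki_markup(SAMPLE))
```

Regexes like these break down on nested templates and other real-world wikitext; for that, a dedicated parser such as mwparserfromhell is likely to be more robust, though it may keep link text that this sketch discards.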
