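OK, I'm trying to remove some quotes (the <ref> citations) from an XML file I downloaded from Wikipedia. So far the text looks like this (ignore the line breaks; they're only there to make it easier to read):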
'''Anarchism''' is a political philosophy that advocates stateless societies based on
non-hierarchical free associations.<ref name="iaf-ifa.org"/><ref>"That is why
Anarchy, when it works to destroy authority in all its aspects, when it demands
the abrogation of laws and the abolition of the mechanism that serves to
impose them, when it refuses all hierarchical organization and preaches free agreement - at the same time strives to maintain and enlarge the precious kernel of social customs without which
no human or animal society can exist." Peter Kropotkin. http://www.theanarchistlibrary.org/HTML/Petr_Kropotkin__Anarchism__its_philosophy_and_ideal.html
Anarchism: its philosophy and ideal</ref><ref>"anarchists are opposed to irrational (e.g., illegitimate)
authority, in other words, hierarchy - hierarchy being the institutionalisation of authority
within a society." http://www.theanarchistlibrary.org/HTML/The_Anarchist_FAQ_Editorial_Collective__An_Anarchist_FAQ__03_17_.html#toc2 "B.1
Why are anarchists against authority and hierarchy?" in An
Anarchist FAQ</ref><ref>"ANARCHISM, a social philosophy that rejects
authoritarian government and maintains that voluntary institutions are best
suited to express man's natural social tendencies." George Woodcock. "Anarchism" at The Encyclopedia of Philosophy</ref><ref>"In a society developed on these lines, the voluntary
associations which already now begin to cover all the fields of human activity
would take a still greater extension so as to substitute themselves for the
state in all its functions." http://www.theanarchistlibrary.org/HTML/Petr_Kropotkin___Anarchism__from_the_Encyclopaedia_Britannica.html
Peter Kropotkin. "Anarchism" from the Encyclopædia Britannica</ref> Anarchism holds the state
to be undesirable, unnecessary, or harmful
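All I want from this text is:
Anarchism is a political philosophy that advocates stateless societies based on non-hierarchical free associations. Anarchism holds the state to be undesirable, unnecessary, or harmful.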
It seems to me that if I delete all the text between "<ref" and "/ref>", I should catch all the unwanted text and remove it. This is my code so far:
Dim temptext As String = newsrt.ToString
Dim expression As New Regex("(?<=\<ref)[^/ref>]+(?=/ref>)")
Dim resul As String = expression.Replace(temptext, "")
But it doesn't seem to work: none of the text between "<ref" and "/ref>" gets captured and replaced with "".
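For reference, the pattern above fails because [^/ref>]+ is a character class: it matches a run of characters other than "/", "r", "e", "f" and ">", not "anything except the string /ref>", so matching stops at the first such letter (already the "e" in name). Even if it matched, the lookbehind and lookahead would leave "<ref" and "/ref>" in the text. Below is a minimal sketch of the delete-everything-between idea, assuming refs are never nested: it matches each whole element (self-closing or paired) with a lazy .*? and RegexOptions.Singleline so a match can cross line breaks. The module and function names are hypothetical, and stripping the ''' bold markup is an extra assumed step to reach the plain text wanted above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical helper module; the names RefStripper and StripRefs are illustrative only.
Module RefStripper
    ' Matches a whole <ref> element in one of two forms:
    '   <ref name="..."/>        (self-closing, no body)
    '   <ref ...> ... </ref>     (paired; body matched lazily up to the first </ref>)
    ' \b keeps "<ref" from matching tags such as <references/>.
    ' Assumption: refs are not nested inside each other.
    Private ReadOnly RefPattern As New Regex(
        "<ref\b[^>]*/>|<ref\b[^>]*>.*?</ref>",
        RegexOptions.Singleline Or RegexOptions.IgnoreCase)

    Public Function StripRefs(text As String) As String
        ' Delete every matched element, including the <ref and </ref> delimiters themselves.
        Dim withoutRefs As String = RefPattern.Replace(text, "")
        ' Extra step (assumption): drop the ''' bold markup to get plain text.
        Return withoutRefs.Replace("'''", "")
    End Function

    Sub Main()
        Dim sample As String =
            "'''Anarchism''' is a political philosophy.<ref name=""iaf-ifa.org""/>" &
            "<ref>""That is why" & vbLf & "Anarchy..."" Peter Kropotkin.</ref> Anarchism holds the state to be harmful"
        ' Prints: Anarchism is a political philosophy. Anarchism holds the state to be harmful
        Console.WriteLine(StripRefs(sample))
    End Sub
End Module
```

The order of the alternation matters: the self-closing form is tried first. If the paired form came first, <ref name="..."/> would be treated as an opening tag and the lazy .*? would swallow ordinary text up to the next </ref>.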
Any help or suggestions would be great! Thanks.