c# - Regex help to find text that is not linked in XML

Question

I need some regex help in C# to find chapter references that are not linked.

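In the example below, chapter 7 is linked, but Chapter 6 and Chapters II are not. I want to find the unlinked ones (some other cases are listed in the code).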

XML sample:

...
<p class="text_noindent"><a id="page_47"/>Much of this will
be explained further in the <a xref="ch007">chapter 7</a>context of the charity fashion
show described in Chapter 6. Chapters II</p>
...

The code I use to find them is:

Regex.Matches(chk.Replace("(", "").Replace(")", ""), "[^<>/\"]\\s*(figure|table|fig.|tab.|chapters|chapter|chap.|cap.|part|figures|tables|chapters|figs.|tabs.)\\s[0-9]+[^a-zA-Z0-9]", RegexOptions.IgnoreCase);
Regex.Matches(chk.Replace("(", "").Replace(")", ""), "[^<>/\"]\\s*(figure|table|fig.|tab.|chapters|chapter|chap.|cap.|part|figures|tables|chapters|figs.|tabs.)\\s(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})[^a-zA-Z0-9]", RegexOptions.IgnoreCase);

But it also matches things like "scape 1", "stab stable" and so on. Can anyone suggest a better solution?
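To see where matches like "scape 1" come from, here is a small sketch using a reduced version of the question's pattern (the sample string and the shortened keyword list are illustrative assumptions, not the asker's data). In a regex an unescaped "." matches any character, so "cap." matches the "cape" inside "landscape", and without word boundaries "part" matches inside "depart". Escaping the dot and anchoring on \b removes both false positives.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample text containing two false positives and one real reference.
string text = "a landscape 1, depart 3, see Chap. 4";

// Like the question's pattern: no word boundaries and an unescaped '.',
// which matches any character, so "cap." also matches "cape" in "landscape".
var loose = new Regex(@"(chap.|cap.|part)\s[0-9]+", RegexOptions.IgnoreCase);

// Same alternatives, but the dot is escaped and the match must
// start and end on a word boundary.
var strict = new Regex(@"\b(chap\.|cap\.|part)\s[0-9]+\b", RegexOptions.IgnoreCase);

string[] looseHits = loose.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
string[] strictHits = strict.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(" | ", looseHits));  // cape 1 | part 3 | Chap. 4
Console.WriteLine(string.Join(" | ", strictHits)); // Chap. 4
```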

score 0 · Accepted Answer

When using regular expressions, you should use @, like this:

String _s = @"\s*";

That is just an example; I'll leave it to you to change your own code accordingly.
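A short sketch of what the @ prefix does: it only changes how the C# compiler reads the string literal, so the regex engine receives the same pattern either way, but the verbatim form avoids doubling every backslash. The one catch is that a double quote inside a verbatim literal is written as "" rather than \", which matters for the question's [^<>/\"] character class. The variable names below are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// In a regular literal, "\\s" puts one backslash into the string;
// a verbatim literal (@"...") keeps the backslash as written.
string regular = "\\s*[0-9]+";
string verbatim = @"\s*[0-9]+";

Console.WriteLine(regular == verbatim);             // True
Console.WriteLine(Regex.IsMatch("  42", verbatim)); // True

// Inside a verbatim literal a double quote is written as "" rather than \".
string quoteClass = @"[^<>/""]";
Console.WriteLine(quoteClass);                      // [^<>/"]
```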

score 0 · Accepted Answer

A better way to select whole words is to surround the pattern with \b, like this:

\b(chap|chapter|etc)\s+[0-9]+\b

This also excludes punctuation and the like, so you don't need to exclude [^<>"].
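Applied to the XML sample, one possible sketch combines \b with escaped abbreviations and a negative lookahead, (?![^<]*</a>), that rejects a reference followed by </a> before any other tag, i.e. one that already sits inside a link. Assumptions: the keyword list is trimmed from the question's, Roman numerals are uppercase I, V, X, L, C only, and link text contains no nested tags. For messier markup, running the regex only over text nodes outside <a> elements would likely be more robust.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The XML sample from the question.
string xml =
    "<p class=\"text_noindent\"><a id=\"page_47\"/>Much of this will\n" +
    "be explained further in the <a xref=\"ch007\">chapter 7</a>context of the charity fashion\n" +
    "show described in Chapter 6. Chapters II</p>";

// Keyword: whole word, case-insensitive; abbreviations have their dot escaped.
// Number: Arabic digits or uppercase Roman numerals (case-sensitive).
// The negative lookahead skips matches followed by </a> before any other tag.
var unlinked = new Regex(
    @"\b(?i:chapters?|chap\.|cap\.|parts?|figures?|figs?\.|tables?|tabs?\.)\s+(?:[0-9]+|[IVXLC]+)\b(?![^<]*</a>)");

string[] hits = unlinked.Matches(xml).Cast<Match>().Select(m => m.Value).ToArray();
foreach (string hit in hits)
    Console.WriteLine(hit);
// Chapter 6
// Chapters II
```

The linked "chapter 7" is skipped because "</a>" follows it directly, and words such as "landscape" or "depart" no longer match because the keyword must start on a word boundary.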

score 0 · Accepted Answer

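Most likely you were trying to match whitespace with \s*, but Visual Studio showed you the error "Unrecognized escape sequence", so you escaped it as \\s*, which means something completely different. Try using [ ]*? or just a space.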
