regex - Syntax error in a regular expression to match link URLs

Question

I have the following method in some Nemerle code:

private static getLinks(text : string) : array[string] {
    def linkrx = Regex(@"<a\shref=['|\"](.*?)['|\"].*?>");
    def m = linkrx.Matches(text);
    mutable txmatches : array[string];
    for (mutable i = 0; i < m.Count; ++i) {
        txmatches[i] = m[i].Value;
    }
    txmatches
}

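The problem is that the compiler, for some reason, tries to parse the brackets inside the regular expression string, and the program fails to compile. If I remove the @ (which I was told to put there), I get an invalid escape character error on "\s".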

Here is the compiler output:

NCrawler.n:23:21:23:22: error: when parsing this `(' brace group
NCrawler.n:23:38:23:39: error: unexpected closing bracket `]'
NCrawler.n:22:57:22:58: error: when parsing this `{' brace group
NCrawler.n:23:38:23:39: error: unexpected closing bracket `]'
NCrawler.n:8:1:8:2: error: when parsing this `{' brace group
NCrawler.n:23:38:23:39: error: unexpected closing bracket `]'
NCrawler.n:23:38:23:39: error: unexpected closing bracket `]'

(Line 23 is the line containing the regex in the code above.)

What should I do?

score 3 · Accepted Answer

I don't know Nemerle, but it looks like using @ disables all escapes, including the one for ".

Try one of these:

def linkrx = Regex("<a\\shref=['\"](.*?)['\"].*?>");

def linkrx = Regex(@"<a\shref=['""](.*?)['""].*?>");

def linkrx = Regex(@"<a\shref=['\x22](.*?)['\x22].*?>");
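The third variant works because \x22 is handled by the regex engine rather than by the string literal, so the literal needs no quote escaping at all. Below is a minimal sketch of that idea in Python, used here only as a stand-in since I can't test Nemerle. The names link_rx and html are made up for illustration, and it assumes the .NET engine treats \x22 the same way Python's re module does.

```python
import re

# \x22 is a regex-level escape for the double-quote character, so the
# (raw) string literal itself contains no quote that needs escaping.
link_rx = re.compile(r"<a\shref=['\x22](.*?)['\x22].*?>")

# Hypothetical sample input, one link with double quotes and one with single quotes.
html = "<a href=\"one.html\">1</a> <a href='two.html' class='x'>2</a>"

# findall returns the contents of the single capture group for each match.
hrefs = link_rx.findall(html)
print(hrefs)  # ['one.html', 'two.html']
```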

score 2 · Accepted Answer

I'm not a Nemerle programmer, but I do know that you should always use an XML parser, not regular expressions, to process XML-based data.

I would guess someone has already written a DOM or XPath library for Nemerle, so you could reach the links with //a[@href] via XPath, or with something like a.href.value via DOM.
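As a rough illustration of the //a[@href] idea, here is a sketch using Python's standard xml.etree.ElementTree, again only as a stand-in for whatever parser is available from Nemerle. The function get_links and the sample markup are hypothetical, and the approach assumes the input is well-formed XML; real-world HTML often is not and would need an HTML-aware parser instead.

```python
import xml.etree.ElementTree as ET

def get_links(markup):
    """Return the href value of every <a> element that has an href attribute."""
    root = ET.fromstring(markup)
    # ".//a[@href]" selects descendant <a> elements carrying an href attribute;
    # it does not include the root element itself.
    return [a.get("href") for a in root.iterfind(".//a[@href]")]

# Hypothetical sample: attribute order and quote style do not matter to the parser.
sample = """<div>
  <a class="foo" href="something">bar</a>
  <a href='other'>baz</a>
  <a name="anchor">no link</a>
</div>"""

links = get_links(sample)
print(links)  # ['something', 'other']
```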

For example, the current regex will not match

<a class="foo" href="something">bar</a>

I haven't tested this, but it should be closer to what you want:

/<a\s.+?href=['|\"]([^'\">]+)['|\"].+?>/i
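Since the pattern above is untested, here is a quick check of both patterns in Python's re module (a stand-in, assuming .NET behaves the same for these constructs). The variable names are made up for illustration. It suggests the original pattern does miss the class example, and that the suggested pattern, because `.+?` needs at least one character before href, seems to miss a plain `<a href=...>` tag instead.

```python
import re

# The pattern from the question and the untested suggestion from this answer.
original = re.compile(r"<a\shref=['\"](.*?)['\"].*?>")
suggested = re.compile(r"<a\s.+?href=['|\"]([^'\">]+)['|\"].+?>", re.IGNORECASE)

with_class = '<a class="foo" href="something">bar</a>'
plain = '<a href="something">bar</a>'

# The original requires href to follow "<a " immediately, so it misses this tag.
original_on_class = original.search(with_class)

# The suggestion finds the href here...
suggested_on_class = suggested.search(with_class)

# ...but ".+?" must consume at least one character before "href",
# so a tag where href comes first is not matched.
suggested_on_plain = suggested.search(plain)

print(original_on_class)            # None
print(suggested_on_class.group(1))  # something
print(suggested_on_plain)           # None
```

Changing `.+?` to `.*?` in both places would be one way to let the plain tag match as well, though a parser remains the more robust choice.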

score 1 · Accepted Answer

The problem is the quotes, not the brackets. In Nemerle, as in C#, you escape a quote inside an @ string with another quote rather than with a backslash.

@"<a\shref=['""](.*?)['""].*?>"

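Edit: also note that you don't need the pipe inside the square brackets; the contents are treated as a set of characters (or character ranges), so the OR is implied.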
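To make the point about the pipe concrete, here is a small Python sketch (a stand-in for .NET, which as far as I know treats character classes the same way). Inside square brackets the | is just another literal character, so the question's pattern would also accept | as a "quote". The names with_pipe, without_pipe and odd are illustrative only.

```python
import re

# Inside [...] the pipe is a literal member of the set, not an OR operator.
with_pipe = re.compile(r"<a\shref=['|\"](.*?)['|\"].*?>")
without_pipe = re.compile(r"<a\shref=['\"](.*?)['\"].*?>")

# Hypothetical malformed input that uses | where a quote should be.
odd = "<a href=|x|>"

pipe_match = with_pipe.search(odd)        # matches, because | is in the set
no_pipe_match = without_pipe.search(odd)  # no match, only ' and " are allowed

print(pipe_match.group(1))  # x
print(no_pipe_match)        # None
```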
