regex - Extracting a substring based on a regular expression match

Question

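A quick regex question (I hope).

I need to identify a substring within an arbitrary string based on a regular expression.

For example, take the following strings: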

"Blogs, Joe (S0003-000292).html"
"bla bla bla S0003-000292 & so on"
"RE: S0003-000292"

I need to extract the "S0003-000292" part (or flag an exception if it isn't found).

As for what I've tried, well, I wrote a rough pattern to identify S0000-000000:

^\(S[0-9]{4}-[0-9]{6}\)$

I have tried testing it as follows:

Dim regex As New Regex("Blogs, Joe (S0003-000292) Lorem Ipsum!")
Dim match As Match = regex.Match("^S[0-9]{4}-[0-9]{6}$")

If match.Success Then
    Console.WriteLine("Found: " & match.Value)
Else
    Console.WriteLine("Not Found")
End If

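However, this always results in Not Found.

So there are really two questions: what is wrong with my pattern, and how do I use the corrected pattern to extract the substring?

(Using .NET 2)

Edit: stema pointed me in the right direction (i.e. removing ^ and $), but that alone did not solve it. My main problem was that I had passed the input string to the Regex constructor instead of the pattern. Swapping the two made it work fine (I blame a lack of caffeine):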

Dim regex As New Regex("S[0-9]{4}-[0-9]{6}")
Dim match As Match = regex.Match("Joe, Blogs (S0003-000292).html")

If match.Success Then
    Console.WriteLine("Found: " & match.Value)
Else
    Console.WriteLine("Not Found")
End If
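
Since the original requirement also asks to flag an exception when nothing is found, the working version could be wrapped in a small helper. This is only a sketch: the names `CodeExtractor` and `ExtractCode` and the choice of `FormatException` are assumptions made for illustration, not part of the original code. It sticks to syntax that should be available in .NET 2.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CodeExtractor
    ' Pattern from the question: "S", four digits, a hyphen, six digits.
    Private ReadOnly CodePattern As New Regex("S[0-9]{4}-[0-9]{6}")

    ' Hypothetical helper: returns the first code found in the source string,
    ' or throws FormatException when there is none.
    Public Function ExtractCode(ByVal source As String) As String
        Dim m As Match = CodePattern.Match(source)
        If Not m.Success Then
            Throw New FormatException("No code found in: " & source)
        End If
        Return m.Value
    End Function

    Sub Main()
        ' The three sample strings from the question.
        Console.WriteLine(ExtractCode("Blogs, Joe (S0003-000292).html"))
        Console.WriteLine(ExtractCode("bla bla bla S0003-000292 & so on"))
        Console.WriteLine(ExtractCode("RE: S0003-000292"))
    End Sub
End Module
```

Each of the three sample strings should yield S0003-000292; a string without a code, such as "no code here", would raise the exception instead.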

score 7 · Accepted Answer

You have anchors in your pattern that prevent it from matching:

^\(S[0-9]{4}-[0-9]{6}\)$
^                      ^

^ matches the start of the string

$ matches the end of the string

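Because there is other text before and after the part you want to match, your pattern will not match. Just remove those anchors and it should be fine.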
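
To see the effect of the anchors, here is a minimal sketch comparing the anchored pattern with an unanchored one. The module name `AnchorDemo` is made up for the example. Note that the escaped parentheses `\(` and `\)` in the original pattern are also literal requirements, so the unanchored pattern below drops them in order to match inputs such as "RE: S0003-000292" as well.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AnchorDemo
    Sub Main()
        Dim source As String = "Blogs, Joe (S0003-000292).html"
        Dim anchored As String = "^\(S[0-9]{4}-[0-9]{6}\)$"

        ' Anchored: the whole string must be exactly "(S0003-000292)".
        Console.WriteLine(Regex.IsMatch(source, anchored))           ' False
        Console.WriteLine(Regex.IsMatch("(S0003-000292)", anchored)) ' True

        ' Unanchored: the pattern may match anywhere in the string.
        Console.WriteLine(Regex.Match(source, "S[0-9]{4}-[0-9]{6}").Value) ' S0003-000292
    End Sub
End Module
```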

Or use word boundaries instead:

\bS[0-9]{4}-[0-9]{6}\b

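\b matches at a position between a word character (a letter, digit, or underscore) and a non-word character, or at the start or end of the string. The pattern will therefore match only when it is not directly adjacent to another word character.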
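
The difference between the plain and the word-boundary pattern shows up when the code is embedded in a longer token. The following is a small sketch; the module name `BoundaryDemo` and the "noisy" input string are invented for the example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BoundaryDemo
    Sub Main()
        Dim plain As New Regex("S[0-9]{4}-[0-9]{6}")
        Dim bounded As New Regex("\bS[0-9]{4}-[0-9]{6}\b")

        ' A longer token that merely contains something code-shaped (made-up input).
        Dim noisy As String = "ref XS0003-0002925 end"
        Console.WriteLine(plain.IsMatch(noisy))   ' True: matches inside the longer token
        Console.WriteLine(bounded.IsMatch(noisy)) ' False: "X" and "5" are word characters

        ' Parentheses are non-word characters, so the boundaries are satisfied here.
        Dim sample As String = "Blogs, Joe (S0003-000292).html"
        Console.WriteLine(bounded.Match(sample).Value) ' S0003-000292
    End Sub
End Module
```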

score 0 · Accepted Answer

Here is some code that may help you. Note: I wrote it in C#.

Regex reg  = new Regex("(.)*S[0-9]{4}-[0-9]{6}(.)*");
string str = "Blogs, Joe (S0003-000292) Lorem Ipsum!";
Console.WriteLine(reg.IsMatch(str));
Console.ReadLine();

score 0 · Accepted Answer

Dim reg As New Regex("(.)*S[0-9]{4}-[0-9]{6}(.)*")
Dim str As String = "Blogs, Joe (S0003-000292) Lorem Ipsum!"
MessageBox.Show(reg.IsMatch(str).ToString())

I am not sure about the syntax, but this should be a correct VB.NET conversion of my C# code.
