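I have a .NET application that uses the .NET Regex class to match EPL label text strings. Normally I would use the following: ^[A-Z0-9,]+"(.+)"$ and it would match every line (it captures the text between the EPL codes). Recently, however, the EPL has changed, and every EPL line now ends with a line break, \x0D\x0A.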

So I changed the pattern to [((\r\n)|(\x0D\x0A))A-Z0-9,]+"(.+)" and now it only picks up "Keep out of the reach of children" and does not recognise the rest.

How can I match the text between the EPL codes?

This is the raw EPL I want to match:

N 0D0A A230,1,0,2,1,1,N,"Keep out of the reach of children"0D0A A133,26,0,4,1,1,N,"FUROSEMIDE TABLETS 40 MG"0D0A A133,51,0,4,1,1,N,"ONE IN THE MORNING"0D0A A133,76,0,4,1,1,N,""0D0A A133,101,0,4,1,1,N," "0D0A A133,126,0,4,1,1,N,""0D0A A133,151,0,4,1,1,N,""0D0A A133,176,0,4,1,1,N,"19/04/13 28 TABLET(S)"0D0A A133,201,0,4,1,1,N,"ELIZABETH M SMITH"0D0A LO133,232,550,40D0A A133,242,0,2,1,1,N,"Any Medical Centre,Blue Road"0D0A A133,260,0,2,1,1,N,"DN54 5TZ,Tel:01424 503901"0D0A P1

2 Answers

I think you're looking for the RegexOptions.Multiline option. For example:

Regex myEx = new Regex("^[A-Z0-9,]+\".+?\"$", RegexOptions.Multiline);

Actually, the regular expression should be:

"^[A-Z0-9,]+\".*\"\r?$"

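With Multiline, $ matches just before a newline, \n. But the file has Windows line endings (\r\n), so after the closing quote the regex finds \r rather than \n, and $ fails. My modified regex skips over the \r if it is there.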
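The behaviour described above can be checked with a small console program. This is only a sketch: the class name CrLfDemo, the helper CountMatches, and the two-line sample text are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CrLfDemo
{
    // Hypothetical helper: counts matches of a pattern using RegexOptions.Multiline.
    public static int CountMatches(string pattern, string text)
    {
        return Regex.Matches(text, pattern, RegexOptions.Multiline).Count;
    }

    public static void Main()
    {
        // Two made-up label lines with Windows line endings (\r\n).
        string text = "A1,2,N,\"FIRST\"\r\nA3,4,N,\"SECOND\"\r\n";

        // Without \r?, the \r sits between the closing quote and $, so no line matches.
        Console.WriteLine(CountMatches("^[A-Z0-9,]+\".*\"$", text));      // prints 0

        // With \r?, the optional carriage return is consumed and both lines match.
        Console.WriteLine(CountMatches("^[A-Z0-9,]+\".*\"\r?$", text));   // prints 2
    }
}
```

Note that . matches \r in .NET (it only excludes \n), which is why .* alone does not rescue the first pattern: the closing quote still has to be followed directly by the end of the line.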

If you want to keep those characters out of the result, create a capture group:

"^([A-Z0-9,]+\".*\")\r?$"

Or you can filter them out by calling Trim on each result:

MatchCollection matches = myEx.Matches(text);
foreach (Match m in matches)
{
    string s = m.Value.Trim();  // removes trailing \r
}
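If the goal is only the text between the quotes, as in the question, one possible variation is to put the capture group around the quoted text alone and read Groups[1]. The sketch below assumes real \r\n line endings in the input; the class name EplLabelText, the method Extract, and the shortened sample are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EplLabelText
{
    // Hypothetical helper: returns the quoted text of every EPL line that ends in a quoted field.
    public static List<string> Extract(string epl)
    {
        // Group 1 holds the text between the quotes; \r? absorbs the carriage return before $.
        var regex = new Regex("^[A-Z0-9,]+\"(.*)\"\r?$", RegexOptions.Multiline);
        var result = new List<string>();
        foreach (Match m in regex.Matches(epl))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    public static void Main()
    {
        // Shortened sample based on the EPL in the question, with real CR LF line endings.
        string epl = "N\r\n" +
            "A230,1,0,2,1,1,N,\"Keep out of the reach of children\"\r\n" +
            "A133,76,0,4,1,1,N,\"\"\r\n" +
            "LO133,232,550,4\r\n" +
            "A133,201,0,4,1,1,N,\"ELIZABETH M SMITH\"\r\n" +
            "P1\r\n";

        foreach (string s in Extract(epl))
        {
            Console.WriteLine("[" + s + "]");
        }
    }
}
```

Because the group uses .* rather than the .+ from the question, empty labels ("") come back as empty strings instead of being skipped. Lines without a quoted field (N, LO..., P1) are not matched at all.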
answered 2013-07-22T14:13:47.633

Thanks Jim, I tried your suggestion and it worked...

I used the following...

' The 0D0A markers stand for the raw CR LF bytes; they are converted to vbCrLf here so the snippet runs as pasted.
Dim sText As String = "N 0D0A A230,1,0,2,1,1,N,""Keep out of the reach of children""0D0A A133,26,0,4,1,1,N,"" FUROSEMIDE TABLETS 40 MG""0D0A A133,51,0,4,1,1,N,"" ONE IN THE MORNING""0D0A A133,76,0,4,1,1,N,""""0D0A A133,101,0,4,1,1,N,""""0D0A A133,126,0,4,1,1,N,""""0D0A A133,151,0,4,1,1,N,""""0D0A A133,176,0,4,1,1,N,""19/04/13 28 TABLET(S)""0D0A A133,201,0,4,1,1,N,""ELIZABETH M SMITH""0D0A LO133,232,550,40D0A A133,242,0,2,1,1,N,""Any Medical Centre,Blue Road""0D0A A133,260,0,2,1,1,N,""CN54 1TZ,Tel:01424 503901""0D0A P1".Replace("0D0A ", vbCrLf)
Dim sRet As String = String.Empty
Dim sTemp As String = String.Empty
Dim m As Match
Dim grp As System.Text.RegularExpressions.Group

Dim sPattern As String = "^([A-Z0-9,])+\"".*\""\r?$"
Dim sPatternRegex As New Regex(sPattern, RegexOptions.Multiline)
Dim matches As MatchCollection = sPatternRegex.Matches(sText)

For Each m In matches
   ' removes trailing \r
   'Dim s As String = m.Value.Trim()
    sTemp += m.Value.Trim() + vbCrLf
Next

' The previous code detects where the line feeds are, replaces the old one with a standard vbCrLF, then the following code parses it like normal

sPattern = "^[A-Z0-9,]+\""(.+)\""$" 

' Standard WinPrint EPL Label: The parsed version would appear as: ^[A-Z0-9,]+\"(.+)\"$

For Each s As String In sTemp.Split(vbCrLf)
   m = Regex.Match(s.Trim, sPattern)
   grp = m.Groups(1)
   sRet += grp.Value + vbCrLf
Next

Return sRet.Trim
answered 2013-07-30T10:25:47.353