regex - Regex code to extract HTML between 2 comments in VB.NET is not working

Question

I am trying to extract the part of an HTML document that lies between 2 comments.

Here is the test code:

Sub Main()

    Dim base_dir As String = "D:\"
    Dim test_file As String = base_dir & "72.htm"

    Dim start_comment As String = "<!-- start of content -->"
    Dim end_comment As String = "<!-- end of content -->"

    Dim regex_pattern As String = start_comment & ".*" & end_comment
    Dim input_text As String = start_comment & "some more html text" & end_comment 

    Dim match As Match = Regex.Match(input_text, regex_pattern)


    If match.Success Then
        Console.WriteLine("found {0}", match.Value)
    Else
        Console.WriteLine("not found")
    End If

    Console.ReadLine()

End Sub

The above works.

The following code fails when I try to load the actual data from disk.

Sub Main()

    Dim base_dir As String = "D:\"
    Dim test_file As String = base_dir & "72.htm"

    Dim start_comment As String = "<!-- start of content -->"
    Dim end_comment As String = "<!-- end of content -->"

    Dim regex_pattern As String = start_comment & ".*" & end_comment
    Dim input_text As String = System.IO.File.ReadAllText(test_file).Replace(vbCrLf, "") 

    Dim match As Match = Regex.Match(input_text, regex_pattern)


    If match.Success Then
        Console.WriteLine("found {0}", match.Value)
    Else
        Console.WriteLine("not found")
    End If

    Console.ReadLine()

End Sub

The HTML file contains the start and end comments with a lot of HTML in between. Some of the content in the HTML file is in Arabic.

Thanks and regards.

score 2 · Accepted Answer

Try passing RegexOptions.Singleline to Regex.Match(...), like this:

Dim match As Match = Regex.Match(input_text, regex_pattern, RegexOptions.Singleline)

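This makes the dot (.) match newline characters as well. By default, . matches every character except \n, and Replace(vbCrLf, "") only removes CR+LF pairs, so any bare line feeds left in the file would probably stop .* from reaching the end comment.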
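To illustrate the difference, here is a minimal sketch that runs the same pattern with and without RegexOptions.Singleline. The sample input is hypothetical (an in-memory string with one bare line feed, not the asker's 72.htm), and the Regex.Escape calls, the lazy .*? and the capture group are additions of mine rather than part of the answer above.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module SinglelineDemo
    Sub Main()
        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"

        ' Hypothetical sample input: a bare line feed sits between the comments,
        ' which Replace(vbCrLf, "") would not have removed.
        Dim input_text As String = start_comment & "<p>line one</p>" & vbLf &
                                   "<p>line two</p>" & end_comment

        ' Regex.Escape keeps the delimiters literal; the lazy .*? stops at the
        ' first end comment, and the group captures only the text in between.
        Dim regex_pattern As String = Regex.Escape(start_comment) & "(.*?)" &
                                      Regex.Escape(end_comment)

        Dim without_option As Match = Regex.Match(input_text, regex_pattern)
        Dim with_option As Match = Regex.Match(input_text, regex_pattern, RegexOptions.Singleline)

        ' Without Singleline the dot cannot cross the line feed, so there is no match.
        Console.WriteLine("default: {0}", without_option.Success)
        Console.WriteLine("Singleline: {0}", with_option.Success)

        If with_option.Success Then
            ' Groups(1) holds the HTML between the two comments.
            Console.WriteLine(with_option.Groups(1).Value)
        End If
    End Sub
End Module
```

With Singleline set, the Replace(vbCrLf, "") step should no longer be needed, so the extracted HTML can keep its original line breaks.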

score 0 · Accepted Answer

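I don't know VB.NET, but does . match newlines, or do you have to set an option for that? Consider using [\s\S] instead of . so that newlines are included.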
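For comparison, a minimal sketch of the [\s\S] variant, which needs no regex option because the character class matches whitespace and non-whitespace alike. The sample input is again hypothetical, and the lazy quantifier and capture group are my own additions.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module CharClassDemo
    Sub Main()
        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"

        ' Hypothetical sample input mixing CR+LF and bare LF line endings.
        Dim input_text As String = start_comment & "<div>" & vbCrLf &
                                   "some text" & vbLf & "</div>" & end_comment

        ' [\s\S] matches any character, including line breaks, with default options.
        Dim regex_pattern As String = start_comment & "([\s\S]*?)" & end_comment

        Dim match As Match = Regex.Match(input_text, regex_pattern)

        If match.Success Then
            ' Groups(1) holds the text between the two comments, line breaks included.
            Console.WriteLine("found {0}", match.Groups(1).Value)
        Else
            Console.WriteLine("not found")
        End If
    End Sub
End Module
```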
