I'm trying to extract a section of HTML that sits between two comments.

Here is the test code:

Imports System.Text.RegularExpressions

Module Module1

    Sub Main()

        Dim base_dir As String = "D:\"
        Dim test_file As String = base_dir & "72.htm"

        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"

        ' Match everything from the start comment through the end comment.
        Dim regex_pattern As String = start_comment & ".*" & end_comment
        ' In-memory test input with no line breaks.
        Dim input_text As String = start_comment & "some more html text" & end_comment

        Dim match As Match = Regex.Match(input_text, regex_pattern)

        If match.Success Then
            Console.WriteLine("found {0}", match.Value)
        Else
            Console.WriteLine("not found")
        End If

        Console.ReadLine()

    End Sub

End Module

The above works.

When I try to load the actual data from disk, the following code fails.

Imports System.Text.RegularExpressions

Module Module1

    Sub Main()

        Dim base_dir As String = "D:\"
        Dim test_file As String = base_dir & "72.htm"

        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"

        ' Match everything from the start comment through the end comment.
        Dim regex_pattern As String = start_comment & ".*" & end_comment
        ' Read the whole file and strip CR+LF line breaks.
        Dim input_text As String = System.IO.File.ReadAllText(test_file).Replace(vbCrLf, "")

        Dim match As Match = Regex.Match(input_text, regex_pattern)

        If match.Success Then
            Console.WriteLine("found {0}", match.Value)
        Else
            Console.WriteLine("not found")
        End If

        Console.ReadLine()

    End Sub

End Module

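The HTML file contains the start and end comments with a large amount of HTML in between. Some of the content in the HTML file is in Arabic.

Thanks and regards.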

2 Answers

Try passing RegexOptions.Singleline to Regex.Match(...), like this:

Dim match As Match = Regex.Match(input_text, regex_pattern, RegexOptions.Singleline)

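This makes the dot (.) match newline characters as well. By default, . matches every character except a line feed (\n).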
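
To illustrate the difference, here is a minimal sketch. It assumes the file on disk contains bare line feeds (for example, LF-only line endings), which the question does not confirm; in that case Replace(vbCrLf, "") would leave them in place. The sample input string and the module name SinglelineDemo are made up for the demonstration.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module SinglelineDemo

    Sub Main()
        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"
        Dim regex_pattern As String = start_comment & ".*" & end_comment

        ' Hypothetical input: a bare line feed (vbLf) sits between the comments,
        ' as it might in a file saved with LF-only line endings.
        Dim input_text As String = start_comment & "<p>line one</p>" & vbLf &
                                   "<p>line two</p>" & end_comment

        ' Replace(vbCrLf, "") removes only CR+LF pairs, so the bare LF survives.
        Dim cleaned As String = input_text.Replace(vbCrLf, "")

        ' Default options: "." does not match the line feed, so there is no match.
        Console.WriteLine(Regex.IsMatch(cleaned, regex_pattern))

        ' Singleline: "." matches any character, including the line feed.
        Console.WriteLine(Regex.IsMatch(cleaned, regex_pattern, RegexOptions.Singleline))
    End Sub

End Module
```

With this sample input the first call prints False and the second prints True. If the real file turns out to have no leftover line feeds, the cause lies elsewhere and this option alone may not help.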

Answered 2012-04-07T00:51:30.780

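I don't know VB.NET, but does . match newlines, or do you have to set an option for that? Consider using [\s\S] instead of . so that newlines are included.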
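
A rough sketch of that idea in VB.NET follows. The character class [\s\S] matches any character regardless of regex options. The capture group and the lazy quantifier (*?) are additions of this sketch, not part of the suggestion above: they return only the text between the comments and stop at the first end comment. The sample input and the module name AnyCharClassDemo are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module AnyCharClassDemo

    Sub Main()
        Dim start_comment As String = "<!-- start of content -->"
        Dim end_comment As String = "<!-- end of content -->"

        ' [\s\S] is "whitespace or non-whitespace", i.e. any character at all,
        ' so no RegexOptions value is needed for it to cross line breaks.
        ' The parentheses capture only the text between the two comments.
        Dim regex_pattern As String = start_comment & "([\s\S]*?)" & end_comment

        ' Hypothetical input containing a line feed between the comments.
        Dim input_text As String = start_comment & "<p>one</p>" & vbLf &
                                   "<p>two</p>" & end_comment

        Dim match As Match = Regex.Match(input_text, regex_pattern)

        If match.Success Then
            ' Groups(1) holds the inner HTML without the surrounding comments.
            Console.WriteLine("found {0}", match.Groups(1).Value)
        Else
            Console.WriteLine("not found")
        End If
    End Sub

End Module
```

Use match.Value instead of match.Groups(1).Value if the comments themselves should be part of the result.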

Answered 2012-04-07T00:34:57.777