Normally you'd use the Internet Explorer COM object for this:
root = "C:\base\dir"

Set fso = CreateObject("Scripting.FileSystemObject")
Set ie = CreateObject("InternetExplorer.Application")

For Each f In fso.GetFolder(root).Files
  ie.Navigate "file:///" & f.Path
  ' Wait until the page has finished loading.
  While ie.Busy : WScript.Sleep 100 : Wend

  text = ie.document.getElementById("MySection").innerText

  WScript.Echo Replace(text, vbNewLine, "")
Next
However, the <section> tag isn't supported before IE 9, and even in IE 9 the COM object doesn't seem to handle it correctly, because getElementById("MySection") returns only the opening tag:
>>> wsh.echo ie.document.getelementbyid("MySection").outerhtml
<SECTION id=MySection>
You could use a regular expression instead, though:
root = "C:\base\dir"

Set fso = CreateObject("Scripting.FileSystemObject")

' Captures everything between the opening and closing <section> tags.
Set re1 = New RegExp
re1.Pattern = "<section id=""MySection"">([\s\S]*?)</section>"
re1.Global = False
re1.IgnoreCase = True

' Matches runs of <br> tags and whitespace so they collapse to one space.
Set re2 = New RegExp
re2.Pattern = "(<br>|\s)+"
re2.Global = True
re2.IgnoreCase = True

For Each f In fso.GetFolder(root).Files
  html = fso.OpenTextFile(f.Path).ReadAll

  Set m = re1.Execute(html)
  If m.Count > 0 Then
    ' m(0) is the first match; SubMatches(0) is the captured section body.
    text = Trim(re2.Replace(m(0).SubMatches(0), " "))
    WScript.Echo text
  End If
Next