
Let me explain what I'm after with some example code. My function GetDox looks close, but it's still incomplete. Here is the test code:

'test begin...
'<dox>
'  <member type="Public Sub" name="Increment" return="void">
'    <param type="Integer" name="nBase" out="true" />
'    <param type="Integer" name="nStep" out="false" />
'    <purpose>
'      purpose here...
'    </purpose>
'  </member>
'  <member ... />
'</dox>
'other comments here...
Public Sub Increment(nBase, nStep) 'some example content
    nBase = nBase + nStep
End Sub
'<Unwonted_Item />

Dim source  'reading the same file just for simplification
With CreateObject("Scripting.FileSystemObject")
    With .OpenTextFile(WScript.ScriptFullName, 1, False)
        source = .ReadAll
    End With
End With
result = GetDox(source)
WScript.Echo result  'display our result

Function GetDox(sCode)  'unfinished function
    Dim regEx, Match, Matches, mVal, sEnd
    sEnd = "</dox>" & vbNewLine
    Set regEx = New RegExp
    regEx.Pattern = "('<dox>\n|'\s*<.*)" 'my ugly pattern
    regEx.IgnoreCase = True
    regEx.Global = True
    Set Matches = regEx.Execute(sCode)
    For Each Match In Matches
        mVal = Match.Value
        mVal = Replace(mVal, vbCr, vbNewLine)
        mVal = Right(mVal, Len(mVal) - 1)
        GetDox = GetDox & mVal
        If mVal = sEnd Then Exit For
    Next
End Function

This is what I get:

<dox>
  <member type="Public Sub" name="Increment" return="void">
    <param type="Integer" name="nBase" out="true" />
    <param type="Integer" name="nStep" out="false" />
    <purpose>
    </purpose>
  </member>
  <member ... />
</dox>

This is what I need:

<dox>
  <member type="Public Sub" name="Increment" return="void">
    <param type="Integer" name="nBase" out="true" />
    <param type="Integer" name="nStep" out="false" />
    <purpose>
      purpose here...
    </purpose>
  </member>
  <member ... />
</dox>

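The "purpose here..." line is missing, and I know my whole RegExp.Pattern syntax is weak. I just want to select everything that starts with <dox> and ends with </dox>, including everything in between, but I'm stuck on the pattern syntax.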

PS: With such great help (thanks, everyone), here is my working function now:

Function GetDox(sCode)
    GetDox = vbNullString
    With New RegExp
        .Pattern    = "<dox>[\s\S]*?</dox>"
        .IgnoreCase = True
        .Global     = False
        With .Execute(sCode)
            If .Count = 0 Then Exit Function
            GetDox  = .Item(0).Value
        End With
        .Pattern    = "^'"
        .Global     = True
        .Multiline  = True
        GetDox = .Replace(GetDox, "")
    End With
End Function

2 Answers


I would first remove the leading single quotes:

regEx.Pattern   = "^'"
regEx.Global    = True
regEx.Multiline = True  ' without this, ^ only matches at the very start of sCode
sCode = regEx.Replace(sCode, "")

Then extract the XML text:

regEx.Pattern = "<dox>[\s\S]*?</dox>"
regEx.Global  = False
regEx.IgnoreCase = True
Set m = regEx.Execute(sCode)
If m.Count > 0 Then GetDox = m(0).Value

After that, you should load the XML into a DOM tree for further processing:

Set xml = CreateObject("Msxml2.DOMDocument.6.0")
xml.async = False
xml.loadXML result
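For a rough idea of what that "further processing" could look like, the script below loads a sample <dox> block into MSXML 6.0 and reads the member, param and purpose elements with XPath. The variable sDox is an assumption: it stands in for whatever GetDox returns. The question's <member ... /> placeholder is left out deliberately, because "..." is not a valid attribute and loadXML would reject the whole document.

```vbscript
' Sketch: walk an extracted <dox> block with MSXML 6.0.
' sDox is a hypothetical stand-in for the string returned by GetDox.
Dim sDox, xml, member, param, purpose
sDox = "<dox>" & vbCrLf & _
       "  <member type=""Public Sub"" name=""Increment"" return=""void"">" & vbCrLf & _
       "    <param type=""Integer"" name=""nBase"" out=""true"" />" & vbCrLf & _
       "    <param type=""Integer"" name=""nStep"" out=""false"" />" & vbCrLf & _
       "    <purpose>purpose here...</purpose>" & vbCrLf & _
       "  </member>" & vbCrLf & _
       "</dox>"

Set xml = CreateObject("Msxml2.DOMDocument.6.0")
xml.async = False
If Not xml.loadXML(sDox) Then
    ' loadXML returns False on malformed input; parseError explains why
    WScript.Echo "Parse error: " & xml.parseError.reason
    WScript.Quit 1
End If

' In MSXML 6.0, selectNodes/selectSingleNode use XPath by default
For Each member In xml.selectNodes("/dox/member")
    WScript.Echo member.getAttribute("type") & " " & member.getAttribute("name")
    For Each param In member.selectNodes("param")
        WScript.Echo "  " & param.getAttribute("name") & " As " & _
                     param.getAttribute("type") & " (out=" & param.getAttribute("out") & ")"
    Next
    Set purpose = member.selectSingleNode("purpose")
    If Not purpose Is Nothing Then WScript.Echo "  purpose: " & purpose.text
Next
```

In the question's test script, you would pass GetDox(source) to loadXML instead of sDox. The parseError check matters there, because any non-XML text left in the extracted block makes loadXML fail silently otherwise.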

If your XML is in a separate file, you should load it directly from the file and extract nodes with an XPath expression, as @FrankSchmitt suggested in his comment.

Set xml = CreateObject("Msxml2.DOMDocument.6.0")
xml.async = False
xml.load "C:\path\to\your.xml"

Set nodes = xml.selectNodes("//dox")

XML is not line-oriented and shouldn't be parsed as if it were. If you handle it that way, things are likely to break in interesting ways.

answered 2013-03-17T10:53:52.633

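Your pattern '\s*<.* only matches comment lines whose first non-blank character after the quote is <, which is why the "purpose here..." line is skipped. To fix your code, you can use this regex instead: ('<dox>\n|'\s*[\S \t]*)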

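Another approach is to first grab everything you need with <dox>[\s\S]+?<\/dox> and then apply a replacement to it (with the multiline flag on, so ^ matches at every line start):
Search: ^' and replace with nothing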

Or, to also strip the leading whitespace:
Search: ^'\s* and replace with nothing
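The two-step approach (grab the block, then strip the comment markers and indentation) can be sketched as a VBScript function. GetDoxFlat is a hypothetical name. The function also deviates from the answer in one deliberate way: it uses ^'[ \t]* rather than ^'\s*. In VBScript, \s also matches CR and LF, so on a comment line with nothing after the quote, ^'\s* would also eat the line break. [ \t]* keeps each match on its own line.

```vbscript
' Sketch: extract the first <dox>...</dox> block, then remove the leading
' comment quote plus any spaces/tabs at the start of each line.
Function GetDoxFlat(sCode)
    GetDoxFlat = vbNullString
    With New RegExp
        ' Lazy [\s\S]+? stops at the first </dox>, across line breaks
        .Pattern    = "<dox>[\s\S]+?</dox>"
        .IgnoreCase = True
        .Global     = False
        With .Execute(sCode)
            If .Count = 0 Then Exit Function
            GetDoxFlat = .Item(0).Value
        End With
        ' Multiline lets ^ match after every line break, not just at the start
        .Pattern   = "^'[ \t]*"
        .Global    = True
        .Multiline = True
        GetDoxFlat = .Replace(GetDoxFlat, "")
    End With
End Function
```

In the question's test script, WScript.Echo GetDoxFlat(source) would print the block with the quotes and the indentation removed. That is harmless if the result goes straight into an XML parser, but use ^' alone if you want to keep the original layout.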

answered 2013-03-17T04:31:37.337