1

那可能吗?可能不是?然后我怎样才能找到匹配的所有确切出现和相应的页码?

编辑:

我的正则表达式工作正常。我需要的是每场比赛都得到它出现的所有页面。

例子:

regex = \b\d{3}\b

123 appears on page 1,4,20
243 appear on page 3,5,7
523 appears on page 9

我如何获得该信息(匹配发生的所有页面?)

这是为了自动创建某种索引。

编辑2:

我有一个基本的工作版本,片段:

Set Matches = regExp.Execute(ActiveDocument.range.Text)

For Each Match In Matches    
    Set range = ActiveDocument.range(Match.FirstIndex, Match.FirstIndex + Len(Match.Value))    
    page = range.Information(wdActiveEndAdjustedPageNumber)

问题是 Match.FirstIndex 并不总是指向 ActiveDocument.range 中匹配的第一个字符。Word 表把这搞砸了,因为 ActiveDocument.range.Text 包含不在文本中的字符,它们代表表中的某些内容。

4

2 回答 2

3

我认为这可能更适合超级用户。

问题的答案是“是”。

Selection.Information(wdActiveEndAdjustedPageNumber)

VBA 中的上述属性将为您提供选择的页码。

此外,VBA 可以做一些正则表达式的工作

于 2013-01-09T15:35:45.847 回答
3

结果证明这相当复杂,我不能说我的解决方案是否适用于任何文档。主要问题如问题所示,无法使用 RegexMatch.FirstIndex 来确定实际匹配是否在 MS Word 文档中。这是因为正则表达式匹配是在 range.Text 属性(字符串)上完成的,并且该字符串仅包含与范围对象不同数量的字符,因此索引不匹配。

所以我的解决方案是针对每个匹配项,我在整个文档中为该匹配项进行查找。find 方法给出了一个 Range 对象,从中可以确定正确的页面。

在我的特殊情况下,匹配可能是相同的东西,也可能是不同的值。示例:343在我的情况下将与Prefix-343. 第二个问题是必须对匹配项进行排序,例如123324无论哪个匹配项首先出现在文档中。

如果您需要排序功能,您还需要以下“模块”:

排序字典功能:

http://www.cpearson.com/excel/CollectionsAndDictionaries.htm

模块“modQSortInPlace”:

http://www.cpearson.com/Zips/modQSortInPlace.zip

如果不需要排序,则不需要它们,但需要SortDictionary Dict, True从我的代码中删除相应的函数调用。

现在到我的代码。您可以删除某些部分,尤其是格式化部分。这是针对我的情况的。另外,如果您的匹配是“独特的”,例如。不是前缀,因此您也可以简化代码。您将需要参考“Microsoft 脚本库”。

Option Explicit

Sub ExtractRNumbers()

    Dim Dict As Scripting.Dictionary
    Set Dict = CreateObject("Scripting.dictionary")

    Dim regExp, Match, Matches
    Dim rNumber As String
    Dim range As range

    Set regExp = CreateObject("VBScript.RegExp")
    regExp.Pattern = "\b(R-)?\d{2}-\d{4,5}(-\d)?\b"
    regExp.IgnoreCase = False
    regExp.Global = True

    ' determine main section, only extract R-Numbers from main section
    ' and not the Table of contents as example
    ' main section = section with most characters

    Dim section As section
    Dim maxSectionSize As Long
    Dim sectionSize As Long
    Dim sectionIndex As Integer
    Dim currentIndex As Integer
    maxSectionSize = 0
    currentIndex = 1
    For Each section In ActiveDocument.Sections
        sectionSize = Len(section.range.text)
        If sectionSize > maxSectionSize Then
            maxSectionSize = sectionSize
            sectionIndex = currentIndex
        End If
        currentIndex = currentIndex + 1
    Next


    Set Matches = regExp.Execute(ActiveDocument.Sections(sectionIndex).range.text)


    For Each Match In Matches

        ' If the Document contains Tables, ActiveDocument.range.Text will contain
        ' BEL charachters (chr(7)) that probably define the table structure. The issue
        ' is that then Match.FirstIndex does not point to the actual first charachter
        ' of a Match in the Document.
        ' Also there are other things (unknwon) that lead to the same issue, eg.
        ' Match.FirstIndex can not be used to find the actual "matching word" within the
        ' document. Because of that below commented apporach does not work on a generic document

        '   Set range = ActiveDocument.range(Match.FirstIndex, Match.FirstIndex + Len(Match.Value))
        '   page = range.Information(wdActiveEndAdjustedPageNumber)

        ' Maybe there is a simpler solution but this works more or less
        ' the exception beign tables again. see http://support.microsoft.com/kb/274003

        ' After a match is found the whole document is searched using the find method.
        ' For each find result the page number is put into an array (if it is not in the array yet)
        ' Then the match is formatted properly.
        ' After formatting, it is checked if the match was previously already found
        '
        '   If not, we add a new entry to the dictionary (key = formatted match, value = array of page numbers)
        '
        '   If match was already found before (but potentially in a different format! eg R-87-1000 vs 87-1000 as example),
        '   all additional pages are added to the already found pages.

        Set range = ActiveDocument.Sections(sectionIndex).range
        With range.Find
            .text = Match.Value
            .MatchWholeWord = True
            .MatchCase = True
            .Wrap = wdFindStop
        End With

        Dim page As Variant
        Dim pages() As Integer
        Dim index As Integer
        index = 0
        ReDim pages(0)

        Do While range.Find.Execute() = True
            page = range.Information(wdActiveEndAdjustedPageNumber)
            If Not IsInArray(page, pages) Then
                ReDim Preserve pages(index)
                pages(index) = page
                index = index + 1
            End If
        Loop

        ' FORMAT TO PROPER R-NUMBER: This is specific to my case
        rNumber = Match.Value
        If Not rNumber Like "R-*" Then
         rNumber = "R-" & rNumber
        End If
        ' remove possible batch number as r-number
        If Len(rNumber) > 11 Then
            rNumber = Left(rNumber, Len(rNumber) - 2)
        End If
        ' END FORMAT

        If Not Dict.Exists(rNumber) Then
            Dict.Add rNumber, pages
        Else
            Dim existingPages() As Integer
            existingPages = Dict(rNumber)
            For Each page In pages
                If Not IsInArray(page, existingPages) Then
                    ' add additonal pages. this means that the previous match
                    ' was formatted different, eg R-87-1000 vs 87-1000 as example
                    ReDim Preserve existingPages(UBound(existingPages) + 1)
                    existingPages(UBound(existingPages)) = page
                    Dict(rNumber) = existingPages
                End If
            Next
        End If

    Next
    'sort dictionary by key (R-Number)
    SortDictionary Dict, True
    Dim fso
    Set fso = CreateObject("Scripting.FileSystemObject")

    Dim stream
    ' Create a TextStream.
    Set stream = fso.CreateTextFile(ActiveDocument.Path & "\" & ActiveDocument.Name & "-rNumbers.txt", True)

    Dim key As Variant
    Dim output As String
    Dim i As Integer
    For Each key In Dict.Keys()
        output = key & vbTab
        pages = Dict(key)
        For i = LBound(pages) To UBound(pages)
            output = output & pages(i) & ", "
        Next
        output = Left(output, Len(output) - 2)
        stream.WriteLine output        
    Next
    Set Dict = Nothing
    stream.Close
End Sub

Private Function IsInArray(page As Variant, pages As Variant) As Boolean
    Dim i As Integer
    IsInArray = False
    For i = LBound(pages) To UBound(pages)
        If pages(i) = page Then
            IsInArray = True
            Exit For
        End If
    Next
End Function
于 2013-01-10T13:40:09.017 回答