
I would like to know why the following regular expression:

\b\w{7}\b\s[1]\s[\S\s]+?(?=WHAT WHERE WHAT WHERE WHAT\,\sWHERE\sWHAT.)

and:

\b\w{7}\b\s[1]\s[\S\s]+?(?=WHAT WHERE WHAT WHERE WHAT\,\sWHERE\sWHAT.|HOW WHO HOW WHO HOW\,\sWHO\sHOW\.)

both seem to work fine on the following test string:

THIS THAT THIS THAT THIS,
THAT
THIS.

CHAPTER 1

Text text text 2 text text text 3 text text text 4 text text text.

CHAPTER 2

Text text text 2 text text text 3 text text text 4 text text text.

CHAPTER 3

Text text text 2 text text text 3 text text text 4 text text text.

WHAT WHERE WHAT WHERE WHAT,
WHERE
WHAT.

CHAPTER 1

Text text text 2 text text text 3 text text text 4 text text text.

CHAPTER 2

Text text text 2 text text text 3 text text text 4 text text text.

CHAPTER 3

Text text text 2 text text text 3 text text text 4 text text text.

HOW WHO HOW WHO HOW,
WHO
HOW.

CHAPTER 1

Text text text 2 text text text 3 text text text 4 text text text.

CHAPTER 2

Text text text 2 text text text 3 text text text 4 text text text.

CHAPTER 3

Text text text 2 text text text 3 text text text 4 text text text.

IF OR IF OR IF.

CHAPTER 1

Text text text 2 text text text 3 text text text 4 text text text.

CHAPTER 2

Text text text 2 text text text 3 text text text 4 text text text.

CHAPTER 3

Text text text 2 text text text 3 text text text 4 text text text.

TO FOR TO FOR
TO FOR TO FOR.

CHAPTER 1

Text text text 2 text text text 3 text text text 4 text text text.

CHAPTER 2

Text text text 2 text text text 3 text text text 4 text text text.

CHAPTER 3

Text text text 2 text text text 3 text text text 4 text text text.

IN UNDER IN
UNDER IN UNDER.

CHAPTER 1

Text text text 2 text text text 3 text text text 4 text text text.

CHAPTER 2

Text text text 2 text text text 3 text text text 4 text text text.

CHAPTER 3

Text text text 2 text text text 3 text text text 4 text text text.

LEFT RIGHT LEFT
RIGHT LEFT.

CHAPTER 1

Text text text 2 text text text 3 text text text 4 text text text.

CHAPTER 2

Text text text 2 text text text 3 text text text 4 text text text.

CHAPTER 3

Text text text 2 text text text 3 text text text 4 text text text.

UP DOWN UP DOWN UP
DOWN.

CHAPTER 1

Text text text 2 text text text 3 text text text 4 text text text.

CHAPTER 2

Text text text 2 text text text 3 text text text 4 text text text.

CHAPTER 3

Text text text 2 text text text 3 text text text 4 text text text.

THE END.

However, when I use the same kind of expression on a file larger than 5 MB, it fails.

The VBScript I am using is as follows:

Option Explicit

Dim strPath : strPath = "myFile.txt"

If Instr(1, WScript.FullName, "CScript", vbTextCompare) = 0 Then
    With CreateObject("WScript.Shell")
        .Run "cmd.exe /k cscript //nologo """ & WScript.ScriptFullName & """", 1, False
        WScript.Quit
    End With
Else
    With CreateObject("Scripting.FileSystemObject")
        If .FileExists(strPath) Then 
            Call Main(strPath)
        Else
            WScript.Echo "Input file doesn't exist"
        End If
    End With
End If

Private Sub Main(filePath)
    Dim TempDictionary, Books, Book, b
    Set TempDictionary = CreateObject("Scripting.Dictionary")
    Set Books = RegEx(GetFileContent(filePath),"\b\w{7}\b\s[1]\s[\S\s]+?THE SECOND BOOK OF MOSES")
    If Books.Count > 0 Then 
        For Each Book In Books 
            WScript.Echo Replace(Left(Book.Value,70),vbCrLf," ")
        Next 
    Else 
        WScript.Echo "Document didn't contain any valid books" 
        WScript.Quit 
    End If 
End Sub

Private Function GetFileContent(filePath)
    Dim objFS, objFile, objTS
    Set objFS = CreateObject("Scripting.FileSystemObject")
    Set objFile = objFS.GetFile(filePath)
    Set objTS = objFile.OpenAsTextStream(1, 0) ' 1 = ForReading, 0 = ASCII
    GetFileContent = objTS.Read(objFile.Size)
    objTS.Close ' release the file handle before dropping the reference
    Set objTS = Nothing
End Function

Private Function RegEx(str,pattern)
    Dim objRE
    Set objRE = New RegExp
    objRE.Pattern = pattern
    objRE.Global = True ' return every match, not just the first
    Set RegEx = objRE.Execute(str)
    WScript.Echo objRE.Test(str) ' debug output: True if the pattern matches at all
End Function

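The editor I am using to test the expressions is here: http://www.regexr.com/

Q: What are you trying to do?

A: I want to split any text file into multiple blocks of text using a regular expression that captures everything between two strings. The first delimiter is a fixed term, i.e. "CHAPTER 1", but the second delimiter is not fixed. It varies, but it is known, so the candidates can be put into an array and iterated over. The problem I am having is that the lookahead (?=) seems to either escape or get stuck in a loop. I have been playing with the "|" operator, as you can see in the second regex at the top of this post. The test file I am using seems to parse just fine. No problem. But with the larger file I am working with... I don't know. Something just goes wrong.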
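To make the array idea concrete, here is a minimal sketch of how the lookahead alternation could be assembled from a list of known terminators. The helper names EscapeLiteral, CollapseWhitespace and BuildPattern are hypothetical and not part of my script above, and the two terminators are simply the ones from the test string. It only illustrates how the pattern is built; I am not claiming it fixes the failure on the large file.

```vbscript
Option Explicit

' Hypothetical helper: backslash-escape regex metacharacters so that a
' terminator such as "WHAT." is matched literally.
Private Function EscapeLiteral(text)
    Dim objRE
    Set objRE = New RegExp
    objRE.Global = True
    objRE.Pattern = "([\\\^\$\.\|\?\*\+\(\)\[\]\{\}])"
    EscapeLiteral = objRE.Replace(text, "\$1")
End Function

' Hypothetical helper: turn each run of whitespace into \s+ so that a
' terminator still matches when it is wrapped across several lines.
Private Function CollapseWhitespace(text)
    Dim objRE
    Set objRE = New RegExp
    objRE.Global = True
    objRE.Pattern = "\s+"
    CollapseWhitespace = objRE.Replace(text, "\s+")
End Function

' Hypothetical helper: join all terminators with "|" inside one lookahead,
' reusing the start-of-block prefix from the expressions in this post.
Private Function BuildPattern(terminators)
    Dim i, parts()
    ReDim parts(UBound(terminators))
    For i = 0 To UBound(terminators)
        parts(i) = CollapseWhitespace(EscapeLiteral(terminators(i)))
    Next
    BuildPattern = "\b\w{7}\b\s[1]\s[\S\s]+?(?=" & Join(parts, "|") & ")"
End Function

' Usage: the terminators are known in advance, so they live in an array.
Dim terminators
terminators = Array( _
    "WHAT WHERE WHAT WHERE WHAT, WHERE WHAT.", _
    "HOW WHO HOW WHO HOW, WHO HOW.")
WScript.Echo BuildPattern(terminators)
```

Escaping each terminator first means the trailing full stop is always treated as a literal, which is not the case for the unescaped "WHAT." in my hand-written expressions.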
