这里的一些解决方案需要参考 MS Word 对象库。玩弄我的牌,我找到了一个不依赖它的解决方案。它在 VBA 中去除 RTF 标记和其他绒毛,如字体表和样式表。它可能对你有帮助。我在您的数据中运行它,除了空格之外,我得到的输出与您预期的相同。
这是代码。
首先,检查字符串是否为字母数字。给它一个长度为一个字符的字符串。该函数用于在这里和那里进行定界。
Public Function Alphanumeric(Character As String) As Boolean
If InStr("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-", Character) Then
Alphanumeric = True
Else
Alphanumeric = False
End If
End Function
接下来是删除整个组。我用它来删除字体表和其他垃圾。
Public Function RemoveGroup(RTFString As String, GroupName As String) As String
Dim I As Integer
Dim J As Integer
Dim Count As Integer
I = InStr(RTFString, "{\" & GroupName)
' If the group was not found in the RTF string, then just return that string unchanged.
If I = 0 Then
RemoveGroup = RTFString
Exit Function
End If
' Otherwise, we will need to scan along, from the start of the group, until we find the end of the group.
' The group is delimited by { and }. Groups may be nested, so we need to count up if we encounter { and
' down if we encounter }. When that count reaches zero, then the end of the group has been found.
J = I
Do
If Mid(RTFString, J, 1) = "{" Then Count = Count + 1
If Mid(RTFString, J, 1) = "}" Then Count = Count - 1
J = J + 1
Loop While Count > 0
RemoveGroup = Replace(RTFString, Mid(RTFString, I, J - I), "")
End Function
好的,这个函数会删除所有标签。
Public Function RemoveTags(RTFString As String) As String
Dim L As Long
Dim R As Long
L = 1
' Search to the end of the string.
While L < Len(RTFString)
' Append anything that's not a tag to the return value.
While Mid(RTFString, L, 1) <> "\" And L < Len(RTFString)
RemoveTags = RemoveTags & Mid(RTFString, L, 1)
L = L + 1
Wend
'Search to the end of the tag.
R = L + 1
While Alphanumeric(Mid(RTFString, R, 1)) And R < Len(RTFString)
R = R + 1
Wend
L = R
Wend
End Function
我们可以用明显的方式删除花括号:
Public Function RemoveBraces(RTFString As String) As String
RemoveBraces = Replace(RTFString, "{", "")
RemoveBraces = Replace(RemoveBraces, "}", "")
End Function
将上述函数复制粘贴到模块中后,您可以创建一个函数,使用它们删除您不需要或不想要的任何东西。以下在我的情况下非常有效。
Public Function RemoveTheFluff(RTFString As String) As String
RemoveTheFluff = Replace(RTFString, vbCrLf, "")
RemoveTheFluff = RemoveGroup(RemoveTheFluff, "fonttbl")
RemoveTheFluff = RemoveGroup(RemoveTheFluff, "colortbl")
RemoveTheFluff = RemoveGroup(RemoveTheFluff, "stylesheet")
RemoveTheFluff = RemoveTags(RemoveBraces(RemoveTheFluff))
End Function
我希望这有帮助。我不会在文字处理器或任何东西中使用它,但如果你正在这样做,它可能会用于抓取数据。