excel - Remove text between specific tags in Microsoft Excel

Question

I have some text like this:

Lorem ipsum dolor <code>sit amet, consectetuer adipiscing elit,</code> sed diam nonummy nibh euismod tincidunt ut <code>laoreet dolore magna</code> aliquam erat volutpat.

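I'm trying to remove everything between each pair of "code" tags. I wrote a function that works well when each cell contains only one pair of tags, but it doesn't handle multiple instances. This is the desired output: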

Lorem ipsum dolor <code></code> sed diam nonummy nibh euismod tincidunt ut <code></code> aliquam erat volutpat.

How would you suggest I do this?

score 1 · Accepted Answer

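This VBA function can be used to strip opening and closing HTML tags together with whatever they enclose. It uses a regular expression, which should be fine for this limited use (but be careful about using regular expressions to parse HTML).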

Function stripEnclosed(strIn As String) As String
    Dim re As VBScript_RegExp_55.RegExp, AllMatches As VBScript_RegExp_55.MatchCollection, M As VBScript_RegExp_55.Match
    Dim closeIndex As Long
    Dim tmpstr As String
    tmpstr = strIn
    Set re = New VBScript_RegExp_55.RegExp
    re.Global = True
    ' Match opening tags only (no "/" inside the angle brackets)
    re.Pattern = "<[^/>]+>"
    Set AllMatches = re.Execute(tmpstr)
    For Each M In AllMatches
        ' Locate the closing tag that corresponds to this opening tag
        closeIndex = InStr(tmpstr, Replace(M.Value, "<", "</"))
        ' Cut everything from the opening tag through the end of the closing tag
        If closeIndex <> 0 Then tmpstr = Left(tmpstr, InStr(tmpstr, M.Value) - 1) & Mid(tmpstr, closeIndex + Len(M.Value) + 1)
    Next M
    stripEnclosed = tmpstr
End Function

Note: you must add a reference to "Microsoft VBScript Regular Expressions 5.5" to your VBA project.

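If you only want to remove one particular tag (for example <CODE> and </CODE>), replace the line re.Pattern = "<[^/>]+>" with the following. Matching is case-sensitive by default, so write the tag in the same case as it appears in your data: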

re.Pattern = "<CODE>"
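
The function above removes the tags along with their content, whereas the desired output in the question keeps an empty <code></code> pair. A minimal sketch of that variant follows, assuming the tags carry no attributes and that the tag name contains no regular-expression metacharacters. ClearTagContent and its tagName parameter are hypothetical names, not part of the original answer. It uses late binding, so no project reference is needed.

```vba
' Sketch only: ClearTagContent and tagName are hypothetical names.
' Assumes plain tags without attributes, e.g. <code>...</code>.
Function ClearTagContent(strIn As String, Optional tagName As String = "code") As String
    Dim re As Object
    ' Late binding, so the VBScript Regular Expressions reference is not required
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True        ' handle every pair in the string, not only the first
    re.IgnoreCase = True    ' treat <CODE> and <code> alike
    ' [\s\S]*? is lazy, so each match stops at the nearest closing tag
    re.Pattern = "(<" & tagName & ">)[\s\S]*?(</" & tagName & ">)"
    ' Keep the captured opening and closing tags, drop what was between them
    ClearTagContent = re.Replace(strIn, "$1$2")
End Function
```

It could then be called from a worksheet cell, for example =ClearTagContent(A1).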

score 0 · Accepted Answer

Based on the macro recorder:

Sub Test()
    'working for selection replacing all <*> sections
    Selection.Replace What:="<*>", Replacement:="", LookAt:=xlPart, _
        SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
        ReplaceFormat:=False
End Sub

After the OP's comment, an edit with a second attempt:

Sub Attempt_second()
    'working for selection replacing each <code>...</code> section with an empty pair
    Selection.Replace What:="<*code>*<*/*code>", Replacement:="<code></code>", LookAt:=xlPart, _
        SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
        ReplaceFormat:=False
End Sub

It replaces the matched text with <code></code>, removing the extra spaces in between.
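
Where a regular expression is preferred over worksheet wildcards for an in-place edit, a loop over the selected cells is one option. The following is a sketch under stated assumptions: the selection is a range of cells, the tags are plain <code> and </code> without attributes, and ClearCodeTagsInSelection is a hypothetical name that does not come from the answer.

```vba
' Sketch only: ClearCodeTagsInSelection is a hypothetical name.
' Assumes the current selection is a range of worksheet cells.
Sub ClearCodeTagsInSelection()
    Dim re As Object, cell As Range
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True        ' handle every pair in a cell
    re.IgnoreCase = True    ' match <code> and <CODE>
    ' Lazy match, so each pair is emptied separately
    re.Pattern = "(<code>)[\s\S]*?(</code>)"
    For Each cell In Selection.Cells
        ' Only rewrite literal text; leave formulas, numbers, errors and blanks alone
        If Not cell.HasFormula And VarType(cell.Value) = vbString Then
            cell.Value = re.Replace(cell.Value, "$1$2")
        End If
    Next cell
End Sub
```

Unlike a worksheet function, this overwrites the cell contents, so it may be worth trying it on a copy of the data first.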

score -1 · Accepted Answer

KazJaw's answer is simple and elegant, and seems to meet your needs.

I took a completely different approach:

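Public Function StripHTML(str As String) As String

    Dim RegEx As Object
    ' Late binding, so no project reference is required
    Set RegEx = CreateObject("vbscript.regexp")
    With RegEx
        .Global = True
        .IgnoreCase = True
        .MultiLine = True
        ' Matches any tag, opening or closing
        .Pattern = "<[^>]+>"
    End With

    ' Removes the tags themselves; the text between them is left in place
    StripHTML = RegEx.Replace(str, "")
    Set RegEx = Nothing

End Function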
