
I am trying to remove all non-printable and non-ASCII (extended) characters using the following RegEx in Excel VBA:

[^\x09\0A\0D\x20-\xFF]

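In theory, this should match anything that is not a tab, line feed, carriage return, or printable ASCII character (character codes between hex 20 and FF, or decimal 32 and 255). I confirmed here that Microsoft VBScript regular expressions support the \xCC notation, where CC is an ASCII code in hexadecimal.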

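The problem is that this regex matches every character above 127. It then throws an "Invalid procedure call" on match.value whenever the matched character's code is above 127. Is it simply that VBScript RegExes don't support character codes above 127? I can't seem to find this information anywhere. Here is the full code: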

regEx.Pattern = "[^\x09\0A\0D\x20-\xFF]"
regEx.IgnoreCase = True 'True to ignore case
regEx.Global = True 'True matches all occurrences, False matches only the first occurrence
regEx.MultiLine = True
If regEx.Test(Cells(curRow, curCol).Value) Then
    Set matches = regEx.Execute(Cells(curRow, curCol).Value)
    numReplacements = numReplacements + matches.Count
    For matchNum = matches.Count To 1 Step -1
        Cells(numReplacements - matchNum + 2, 16).Value = matches.Item(matchNum).Value
        Cells(numReplacements - matchNum + 2, 17).Value = Asc(matches.Item(matchNum).Value)
    Next matchNum
    Cells(curRow, curCol).Value = regEx.Replace(Cells(curRow, curCol).Value, replacements(pattNo))
End If

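The first character it matches is 0x96 (&ndash;). When I add a watch on "matches" and expand it, I can see the character in the Watch window. However, when I try to watch match.Item(matchNum).Value, I get (see screenshot). Any ideas?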


1 Answer


Microsoft VBScript regular expressions support the \xCC notation where CC is an ASCII code in hexadecimal

Note that ASCII is defined from \x00 to \x7F, where printable ASCII characters are from \x20 to \x7E.

Codes \x80 and above are ANSI (their meaning depends on the active code page), not ASCII.
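To make the ASCII/ANSI distinction concrete, here is a minimal sketch that compares the ANSI code of a character with the Unicode code point VBA actually stores for it. The Sub name is made up for illustration, and the comments assume a Windows-1252 ANSI code page; other code pages will give different values.

```vba
' Hypothetical demo Sub; the name is illustrative only.
Sub ShowAnsiVersusUnicode()
    Dim ch As String
    ' ANSI code 150 (&H96). On Windows-1252 (an assumption) this is an en dash.
    ch = Chr(150)
    Debug.Print Asc(ch)         ' ANSI code in the current code page
    Debug.Print AscW(ch)        ' Unicode code point stored in the VBA string
    Debug.Print Hex(AscW(ch))   ' same code point in hexadecimal
End Sub
```

On a Windows-1252 system I would expect this to print 150, then 8211, then 2013. As far as I can tell, the regex engine compares against the Unicode value, so \x96 would not cover the en dash from the question (U+2013), which would explain why a class ending in \xFF still matches it.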

Try the following:

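' Start a negated class that keeps tab, LF, CR and printable ASCII (\x20-\x7E)
Dim ii, sExPatern: sExPatern = "[^\x09\x0A\x0D\x20-\x7E"
' Append every character that the current ANSI code page maps to codes 128-255
For ii = 128 To 255
  sExPatern = sExPatern & Chr(ii)
Next
sExPatern = sExPatern & "]"
'...
regEx.Pattern = sExPatern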
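For completeness, the same idea can be wrapped in a small function. This is only a sketch: the name StripNonPrintable and its signature are hypothetical, it uses late binding via CreateObject("VBScript.RegExp") so no library reference is required, and it assumes a single-byte ANSI code page (on a double-byte code page, Chr may not behave this way for codes 128-255).

```vba
' Hypothetical helper; name and signature are illustrative only.
Function StripNonPrintable(ByVal src As String) As String
    Dim regEx As Object
    Dim pat As String
    Dim code As Long

    ' Late binding: no reference to the VBScript Regular Expressions library needed
    Set regEx = CreateObject("VBScript.RegExp")

    ' Keep tab, LF, CR and printable ASCII ...
    pat = "[^\x09\x0A\x0D\x20-\x7E"
    ' ... plus whatever the current ANSI code page defines for codes 128-255
    For code = 128 To 255
        pat = pat & Chr(code)
    Next code

    regEx.Pattern = pat & "]"
    regEx.Global = True          ' replace every match, not just the first
    StripNonPrintable = regEx.Replace(src, "")
End Function
```

Called as StripNonPrintable("A" & ChrW(&H263A) & Chr(7) & "B"), this should return "AB": the smiley (U+263A) and the bell character (code 7) are outside the allowed set, while an en dash produced by Chr(150) would be kept.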

Honestly, I'm not sure about the printability of some codes, e.g. 129, 131, 136, 144, 152, and 160 in decimal. My ANSI code page is "Windows Central Europe", so you may want to examine this in more detail for your own code page.
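One way to do that examination is to list what each code from 128 to 255 maps to on your machine. The Sub below is a rough sketch with a made-up name; it assumes a single-byte ANSI code page and writes to the Immediate window.

```vba
' Hypothetical diagnostic Sub; the name is illustrative only.
Sub ListAnsiCodes()
    Dim code As Long
    Dim ch As String
    For code = 128 To 255
        ch = Chr(code)
        ' Columns: ANSI code, Unicode code point (hex, zero-padded), the character itself
        Debug.Print code, "U+" & Right$("0000" & Hex(AscW(ch)), 4), ch
    Next code
End Sub
```

Codes whose Unicode value comes back in the range U+0080 to U+009F are control characters rather than printable ones, so those are candidates for leaving out of the pattern.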

Answered 2014-06-25T14:19:38.797