这个问题非常适合正则表达式。以下函数返回给定字符串中简单正则表达式模式的第一个匹配之前字符的位置。如果未找到匹配项,则该函数返回字符串的长度。该函数可以与 LEFT 函数结合使用以提取匹配前的文本。(使用 LEFT 是必要的,因为为简单起见,此函数不实现子匹配。)
以下公式将提取样本数据中的产品名称:
=LEFT(A1,regexmatch(A1," \(|\/| -| \*"))
分解匹配模式" \(|\/| -| \*"
:
" \(" matches a space followed by a left parenthesis
[the backslash escapes the "(", a special character in regular expressions]
"|" signifies an alternative pattern to match
"\/" matches a forward slash (/)
" -" matches a space followed by a dash (-)
" \*" matches a space followed by an asterisk (*).
要了解有关正则表达式的更多信息,请参阅此正则表达式教程,这是网络上提供的众多教程之一。
为了使该功能正常工作,您需要设置对 Microsoft VBScript 正则表达式 5.5 的引用。为此,请从 VBA IDE 中选择 Tools/References 并选中此项,这将在长长的引用列表中很靠后。
Function regexMatch(text As String, rePattern As String)
'Response to SO post 16591260
'Adapted from code at http://www.macrostash.com/2011/10/08/
' simple-regular-expression-tutorial-for-excel-vba/.
Dim regEx As New VBScript_RegExp_55.RegExp
Dim matches As Variant
regEx.pattern = rePattern
regEx.IgnoreCase = True 'True to ignore case
regEx.Global = False 'Return just the first match
If regEx.Test(text) Then
Set matches = regEx.Execute(text)
regexMatch = matches(0).FirstIndex
Else
regexMatch = Len(text)
End If
End Function
以下子例程将字符串提取应用于指定数据列中的每个单元格,并将新字符串写入指定的结果列。尽管可以只为数据列中的每个单元格调用该函数,但这会在每次调用该函数时产生编译正则表达式(适用于所有单元格)的开销。为了避免这种开销,子例程将匹配函数分成两部分,循环外通过数据单元的模式定义,以及循环内的模式执行。
Sub SubRegexMatch()
'Response to SO post 16591260
'Extracts from string content of each data cell in a specified source
' column of the active worksheet the characters to the left of the first
' match of a regular expression, and writes the new string to corresponding
' rows in a specified result column.
'Set the regular expression, source column, result column, and first
' data row in the "parameters" section
'Regex match code was adapted from http://www.macrostash.com/2011/10/08/
' simple-regular-expression-tutorial-for-excel-vba/
Dim regEx As New VBScript_RegExp_55.RegExp, _
matches As Variant, _
regexMatch As Long 'position of character *just before* match
Dim srcCol As String, _
resCol As String
Dim srcRng As Range, _
resRng As Range
Dim firstRow As Long, _
lastRow As Long
Dim srcArr As Variant, _
resArr() As String
Dim i As Long
'parameters
regEx.Pattern = " \(|\/| -| \*" 'regular expression to be matched
regEx.IgnoreCase = True
regEx.Global = False 'return only the first match found
srcCol = "A" 'source data column
resCol = "B" 'result column
firstRow = 2 'set to first row with data
With ActiveSheet
lastRow = .Cells(Cells.Rows.Count, srcCol).End(xlUp).Row
Set srcRng = .Range(srcCol & firstRow & ":" & srcCol & lastRow)
Set resRng = .Range(resCol & firstRow & ":" & resCol & lastRow)
srcArr = srcRng
ReDim resArr(1 To lastRow - firstRow + 1)
For i = 1 To srcRng.Rows.Count
If regEx.Test(srcArr(i, 1)) Then
Set matches = regEx.Execute(srcArr(i, 1))
regexMatch = matches(0).FirstIndex
Else
regexMatch = Len(srcArr(i, 1)) 'return length of original string if no match
End If
resArr(i) = Left(srcArr(i, 1), regexMatch)
Next i
resRng = WorksheetFunction.Transpose(resArr) 'assign result to worksheet
End With
End Sub