< # literally just an opening angle bracket (the start of a tag) followed by a space
( # the parenthesis opens a subpattern; it's necessary as a boundary for
# the | later on
?: # makes the just-opened subpattern non-capturing (so you can't access it
# as a separate match later)
" # literally "
[^"] # any character but " (this is called a character class)
* # arbitrarily many of those (as many as possible)
" # literally "
['"] # either ' or "
* # arbitrarily many of those (and possibly alternating! it doesn't have
# to be the same character for the whole string)
| # OR
' # literally '
[^'] # any character but ' (this is called a character class)
* # arbitrarily many of those (as many as possible)
' # literally '
['"]* # as above
| # OR
[^'">] # any character but ', ", >
) # closes the subpattern
+ # one or more repetitions of the subpattern
> # literally >, which closes the tag
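
To see the pieces above working together, here is a minimal Python sketch. It assumes the pieces are simply concatenated in the order listed, keeping the single literal space after `<` that the first line describes. The names `TAG` and `first_tag` are made up for this illustration, and the engine the original pattern was written for may differ from Python's `re`.

```python
import re

# Assumption: the pieces are concatenated in the order listed, keeping the
# single literal space after "<" that the first line describes.
TAG = re.compile(r"""< (?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>""")


def first_tag(text):
    """Return the first substring matched by TAG, or None if there is none."""
    m = TAG.search(text)
    return m.group(0) if m else None


# A ">" inside a quoted value does not end the match early.
print(first_tag('< a title="x > y"> tail'))

# Without the space after "<", this version of the pattern finds nothing.
print(first_tag('<a>'))

# An unterminated quote makes the whole match fail.
print(first_tag('< a "b>'))
```

The quoted alternatives come first in the subpattern, so a `>` inside a quoted string is consumed as part of that string instead of ending the match.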
Note that all spaces in the regex are treated just like any other character. Each one matches exactly one space.
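
A small Python sketch of this point, for illustration only. By default a space in the pattern has to match a real space in the input. Python's `re.VERBOSE` flag is shown as a contrast, since it makes the engine ignore unescaped whitespace in the pattern; whether the original pattern was meant to run in such a mode is not stated here. The variable names are hypothetical.

```python
import re

# Default mode: the space in the pattern must match a real space in the text.
with_space = re.fullmatch(r"a b", "a b") is not None
without_space = re.fullmatch(r"a b", "ab") is not None

# Contrast: with re.VERBOSE, unescaped whitespace in the pattern is ignored,
# so the same pattern now behaves like "ab".
verbose_without_space = re.fullmatch(r"a b", "ab", re.VERBOSE) is not None
verbose_with_space = re.fullmatch(r"a b", "a b", re.VERBOSE) is not None

print(with_space, without_space)                   # True False
print(verbose_without_space, verbose_with_space)   # True False
```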
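Also pay special attention to the ^ at the beginning of the character classes. It is not treated as a separate character; instead, it inverts the whole character class.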
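
The difference is easy to check with a short Python sketch (the variable names are only for illustration). A caret in first position negates the class, while a caret anywhere else inside the brackets is an ordinary character.

```python
import re

# "^" first: the class matches every character EXCEPT the double quote.
negated = re.findall(r'[^"]', 'a"b')

# "^" not first: the class matches a double quote or a literal caret.
literal = re.findall(r'["^]', 'a^"b')

print(negated)  # ['a', 'b']
print(literal)  # ['^', '"']
```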
I should also (obligatorily) add that regular expressions are not an appropriate tool for parsing HTML.