This is a question about a CamelCase regex. Having read tchrist's post, I would like to know what a correct UTF-8 CamelCase pattern looks like.
Starting with brian d foy's regex:
/
\b # start at word boundary
[A-Z] # start with upper
[a-zA-Z]* # followed by any alpha
(?: # non-capturing grouping for alternation precedence
[a-z][a-zA-Z]*[A-Z] # next bit is lower, any zero or more, ending with upper
| # or
[A-Z][a-zA-Z]*[a-z] # next bit is upper, any zero or more, ending with lower
)
[a-zA-Z]* # anything that's left
\b # end at word
/x
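To show why the ASCII-only pattern is not enough, here is a minimal sketch that wraps it in a whole-string test. The names `$ascii_camel` and `is_ascii_camel` are illustrative only and are not part of the original regex.

```perl
use strict;
use warnings;
use utf8;    # the source below contains a non-ASCII literal

# The ASCII-only pattern from above, unchanged apart from layout.
my $ascii_camel = qr/
    \b
    [A-Z]
    [a-zA-Z]*
    (?:
        [a-z][a-zA-Z]*[A-Z]
      |
        [A-Z][a-zA-Z]*[a-z]
    )
    [a-zA-Z]*
    \b
/x;

# Hypothetical helper: true only if the whole string is one CamelCase word.
sub is_ascii_camel {
    my ($word) = @_;
    return $word =~ /\A$ascii_camel\z/ ? 1 : 0;
}

for my $word ('CamelCase', 'CaW', 'Camel') {
    printf "%-10s %s\n", $word, is_ascii_camel($word) ? 'match' : 'no match';
}

# The leading capital is outside [A-Z], so this word is rejected.
print "accented:  ", (is_ascii_camel('ÉcoleNormale') ? 'match' : 'no match'), "\n";
```

A word that begins with an accented capital is rejected because `[A-Z]` covers only the 26 ASCII capitals.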
and modifying it to:
/
\b # start at word boundary
\p{Uppercase_Letter} # start with upper
\p{Alphabetic}* # followed by any alpha
(?: # non-capturing grouping for alternation precedence
\p{Lowercase_Letter}[a-zA-Z]*\p{Uppercase_Letter} ### next bit is lower, any zero or more, ending with upper
| # or
\p{Uppercase_Letter}[a-zA-Z]*\p{Lowercase_Letter} ### next bit is upper, any zero or more, ending with lower
)
\p{Alphabetic}* # anything that's left
\b # end at word
/x
I have a problem with the lines marked "###".
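One possible way to treat the "###" lines, assuming the intent is simply to replace each remaining ASCII class `[a-zA-Z]` with `\p{Alphabetic}` so that the pattern is consistent throughout. This is a sketch under that assumption, not a vetted answer; `$unicode_camel` and `is_unicode_camel` are illustrative names.

```perl
use strict;
use warnings;
use utf8;    # the source below contains a non-ASCII literal

# Same shape as the modified pattern, with the two marked lines
# rewritten to use Unicode properties instead of ASCII ranges.
my $unicode_camel = qr/
    \b
    \p{Uppercase_Letter}
    \p{Alphabetic}*
    (?:
        \p{Lowercase_Letter}\p{Alphabetic}*\p{Uppercase_Letter}
      |
        \p{Uppercase_Letter}\p{Alphabetic}*\p{Lowercase_Letter}
    )
    \p{Alphabetic}*
    \b
/x;

# Hypothetical helper: true only if the whole string is one CamelCase word.
sub is_unicode_camel {
    my ($word) = @_;
    return $word =~ /\A$unicode_camel\z/ ? 1 : 0;
}

print is_unicode_camel('ÉcoleNormale') ? "match\n" : "no match\n";
print is_unicode_camel('École')        ? "match\n" : "no match\n";
```

Note that `\p{Alphabetic}` is broader than "upper or lower": it also covers caseless letters and some other characters, so whether it is the right replacement depends on what should count as "any alpha".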
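Also, how should the regex be modified if digits and the underscore are treated as equivalent to lowercase letters, so that W2X3 is a valid CamelCase word?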
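Update (in response to ysth's comment):
In what follows, "any" means "uppercase or lowercase or digit or underscore".
The regex should match CamelWord and CaW:
- starts with an uppercase letter
- optionally any
- a lowercase letter or digit or underscore
- optionally any
- an uppercase letter
- optionally any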
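The list above can be transcribed almost literally into a pattern. The following is a minimal sketch, assuming "uppercase" means `\p{Lu}`, "lowercase" means `\p{Ll}`, and "digit" means `\p{Nd}`; those property choices, and the names `$any`, `$spec_camel` and `is_spec_camel`, are assumptions made for illustration.

```perl
use strict;
use warnings;

# "any": uppercase or lowercase or digit or underscore (assumed properties).
my $any = qr/[\p{Lu}\p{Ll}\p{Nd}_]/;

my $spec_camel = qr/
    \b
    \p{Lu}              # starts with an uppercase letter
    ${any}*             # optionally any
    [\p{Ll}\p{Nd}_]     # a lowercase letter or digit or underscore
    ${any}*             # optionally any
    \p{Lu}              # an uppercase letter
    ${any}*             # optionally any
    \b
/x;

# Hypothetical helper: true only if the whole string is one CamelCase word.
sub is_spec_camel {
    my ($word) = @_;
    return $word =~ /\A$spec_camel\z/ ? 1 : 0;
}

for my $word (qw(CamelWord CaW W2X3 A_B Camel ABC camelCase)) {
    printf "%-10s %s\n", $word, is_spec_camel($word) ? 'match' : 'no match';
}
```

With these choices CamelWord, CaW, W2X3 and A_B match, while Camel, ABC and camelCase do not. Unlike the second alternative in the original pattern, this list does not accept a word such as ABc, where the only lowercase letter comes after the last uppercase one.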
Please do not mark this as a duplicate, because it is not one. The original question (and its answers) considered only ASCII.