javascript - Regex: match words wrapped in underscores, unless they start with @ / #

Question

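I'm trying to fix this bug in Tiptap (a WYSIWYG editor for Vue) by passing in a custom regex, so that the regex that identifies italic notation in Markdown (_value_) is not applied to strings that start with @ or #. For example, #some_tag_value should not be rendered with tag in italics.

This is my regex so far: /(^|[^@#_\w])(?:\w?)(_([^_]+)_)/g
Edit: new regex, with help from @Wiktor Stribiżew: /(^|[^@#_\w])(_([^_]+)_)/g

It handles most common cases, but it still fails when the underscores are in the middle of a word, e.g. ant_farm_ should match (with farm in italics).

I've also put some "should match" and "should not match" cases at https://regexr.com/50ibf to make testing easier.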

Should match (between underscores)

_italic text here_
police_woman_
_fire_fighter
a thousand _words_
_brunch_ on a Sunday

Should not match

@ta_g_
__value__
#some_tag_value
@some_value_here
@some_tag_
#some_val_
#_hello_
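
To reproduce the failure described above, the two lists can be run against the edited regex. This is only a minimal sketch: the array names and the `askerRe` variable are made up for illustration, and the regex is used without the g flag so that `test` keeps no state between calls.

```javascript
// Test cases copied from the two lists above.
const shouldMatch = [
  '_italic text here_',
  'police_woman_',
  '_fire_fighter',
  'a thousand _words_',
  '_brunch_ on a Sunday',
];
const shouldNotMatch = [
  '@ta_g_',
  '__value__',
  '#some_tag_value',
  '@some_value_here',
  '@some_tag_',
  '#some_val_',
  '#_hello_',
];

// The edited regex from the question, without the g flag.
const askerRe = /(^|[^@#_\w])(_([^_]+)_)/;

// Lines that should match but do not.
const missed = shouldMatch.filter((line) => !askerRe.test(line));
// Lines that should not match but do.
const falsePositives = shouldNotMatch.filter((line) => askerRe.test(line));

console.log(missed);         // [ 'police_woman_' ]
console.log(falsePositives); // []
```

The only miss is the mid-word case: the opening underscore in police_woman_ is preceded by a word character, which the leading group excludes.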

score 2 · Accepted Answer

For science, this monstrosity works in Chrome (and Node.js).

let text = `
<strong>Should match</strong> (between underscores)

_italic text here_
police_woman_
_fire_fighter
a thousand _words_
_brunch_ on a Sunday

<strong>Should not match</strong>

@ta_g_
__value__
#some_tag_value
@some_value_here
@some_tag_
#some_val_
#_hello_
`;

let re = /(?<=(?:\s|^)(?![@#])[^_\n]*)_([^_]+)_/g;
document.querySelector('div').innerHTML = text.replace(re, '<em>$1</em>');

div { white-space: pre; }

<div/>

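This captures _something_ as the full match and something as the first capture group (so the underscores can be dropped). You can't capture just something, because then you lose the ability to tell what is inside the underscores from what is outside (try it with (?<=(?:\s|^)(?![@#])[^_\n]*_)([^_]+)(?=_)).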
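
As a small illustration of the full match versus the first capture group, here is a sketch using `String.prototype.matchAll`. The variable names are made up for this example, and it assumes an engine with lookbehind support.

```javascript
// Same pattern as in the snippet above.
const re = /(?<=(?:\s|^)(?![@#])[^_\n]*)_([^_]+)_/g;

// Collect the full match, the first capture group and the position.
const found = [...'a thousand _words_'.matchAll(re)].map((m) => ({
  full: m[0],   // includes both underscores
  inner: m[1],  // text between the underscores
  index: m.index,
}));

console.log(found);
// [ { full: '_words_', inner: 'words', index: 11 } ]
```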

Two things prevent it from being universally applicable:

Not all JavaScript engines support look-behinds
Most regex engines don't support variable-length look-behinds
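
If the code has to run in engines without look-behind support, one possible approach is to feature-detect at runtime. The sketch below is an assumption about how that could be done, not part of Tiptap: `supportsLookbehind` and `italicRe` are hypothetical names.

```javascript
// Hypothetical helper: returns true when the engine can compile a look-behind.
function supportsLookbehind() {
  try {
    // Use the RegExp constructor here: a regex literal with unsupported
    // syntax is a SyntaxError at parse time, before try/catch can run.
    new RegExp('(?<=a)b');
    return true;
  } catch (err) {
    return false;
  }
}

// Build the pattern from a string for the same reason (note the doubled
// backslashes). null signals that a look-behind-free pattern is needed.
const italicRe = supportsLookbehind()
  ? new RegExp('(?<=(?:\\s|^)(?![@#])[^_\\n]*)_([^_]+)_', 'g')
  : null;
```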

Edit: this one is a bit stronger, and should additionally handle match_this_and_that_ but not @match_this_and_that correctly:

/(?<=(?:\s|^)(?![@#])(?!__)\S*)_([^_]+)_/

Explanation:

_([^_]+)_    Match non-underscory bit between two underscores
(?<=...)     that is preceded by
(?:\s|^)     either a whitespace or a start of a line/string
             (i.e. a proper word boundary, since we can't use `\b`)
\S*          and then some non-space characters
(?![@#])     that don't start with `@`, `#`,
(?!__)       or `__`.
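
The pattern explained above can be checked one line at a time. In this sketch the g flag is added so that every pair in a line is replaced, and the `italicize` helper is a made-up name; it assumes an engine with look-behind support.

```javascript
// The stronger pattern, with the g flag added for replace-all behaviour.
const strongRe = /(?<=(?:\s|^)(?![@#])(?!__)\S*)_([^_]+)_/g;

// Hypothetical helper: wrap each underscore-delimited span in <em>.
function italicize(line) {
  return line.replace(strongRe, '<em>$1</em>');
}

console.log(italicize('police_woman_'));         // police<em>woman</em>
console.log(italicize('match_this_and_that_'));  // match<em>this</em>and<em>that</em>
console.log(italicize('@match_this_and_that'));  // @match_this_and_that
console.log(italicize('__value__'));             // __value__
```

Each line is processed separately here because [^_]+ can also match newlines, so a pair of underscores could otherwise span two lines.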

regex101 demo

score 2 · Accepted Answer

You can use the following pattern:

(?:^|\s)[^@#\s_]*(_([^_]+)_)

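See the regex demo

Details

(?:^|\s) - start of string or a whitespace
[^@#\s_]* - 0 or more chars other than @, #, _ and whitespace
(_([^_]+)_) - Group 1: _, then 1+ chars other than _ (captured into Group 2), and then _.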
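
Because this pattern has no look-behind, the full match also contains whatever precedes the underscores, so a plain '<em>$1</em>' replacement would drop that prefix. One way to handle it is a replacer function that puts the prefix back. This is a sketch: the `italicizeNoLookbehind` name is made up, and lines are processed one at a time.

```javascript
// The pattern from this answer, with the g flag added.
const plainRe = /(?:^|\s)[^@#\s_]*(_([^_]+)_)/g;

// Hypothetical helper: keep the text before Group 1, wrap Group 2 in <em>.
function italicizeNoLookbehind(line) {
  return line.replace(plainRe, (match, group1, group2) => {
    // Everything in the match before "_..._" is the untouched prefix.
    const prefix = match.slice(0, match.length - group1.length);
    return prefix + '<em>' + group2 + '</em>';
  });
}

console.log(italicizeNoLookbehind('police_woman_'));       // police<em>woman</em>
console.log(italicizeNoLookbehind('a thousand _words_'));  // a thousand <em>words</em>
console.log(italicizeNoLookbehind('#some_tag_value'));     // #some_tag_value
```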

score 0 · Accepted Answer

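Here is something that is not as compact as the other answers, but I think it makes it easier to see what is going on. Capture group 3 (\3) is the one you want.

Requires the multiline flag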

^([a-zA-Z\s]+|_)(([a-zA-Z\s]+)_)+?[a-zA-Z\s]*?$

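^ - matches the start of the line
([a-zA-Z\s]+|_) - one or more words, or _
(([a-zA-Z\s]+)_)+? - one or more words followed by _, at least once, matching as few times as possible
[a-zA-Z\s]*? - any trailing words
$ - end of the line

To summarize, the pattern is meant to match one of the following forms: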

_<words>_
<words>_<words>_
<words>_<words>_<words>
_<words>_<words>
