1

I am trying to write a JavaScript regular expression that matches only NASM-style comments in HTML. For example, "; interrupt" should be matched in "INT 21h ; interrupt".

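As you probably know, /;.*/ cannot be the answer, because an HTML entity may appear before the comment. I thought /(?:[^&]|&.+;)*(;.*)$/ should do the job, but I found two problems with it: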

  1. "      ; hello world".match(/(?:[^&]|&.+;)*(;.*)$/) is the array ["      ; hello world", "; hello world"]. I don't want an array.
  2. "      ; hello world; a message".match(/(?:[^&]|&.+;)*(;.*)$/) is ["      ; hello world; a message", "; a message"]; the second element is even worse.

Questions:

  1. Why is the (?:) block returned?
  2. Why is the result "; a message" and not "; hello world; a message"?
  3. What correct regular expression can I use?
4

2 Answers

1

1) The (?:) group is not being returned. What you are seeing is that .match() without the g flag returns an array whenever it finds a match (and null otherwise): the first element is the whole match, and the following elements (if any) are the contents of the capture groups. In this case you have one capture group, so the array contains two items.
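
To make the shape of that return value concrete, here is a minimal sketch using the regex from the question. The helper name extractComment is hypothetical (it is not part of the question or of any library); it just picks the capture group out of the array and handles the null case.

```javascript
// Regex copied from the question; only group 1 "(;.*)" is a capture group.
const questionRe = /(?:[^&]|&.+;)*(;.*)$/;

// Hypothetical helper: returns the captured comment as a string, or null.
function extractComment(line) {
  const m = line.match(questionRe);
  return m === null ? null : m[1];
}

const whole = "      ; hello world".match(questionRe);
console.log(whole.length); // 2: the full match plus one capture group
console.log(whole[0]);     // the entire matched text
console.log(whole[1]);     // what "(;.*)" captured

console.log(extractComment("INT 21h ; interrupt")); // "; interrupt"
console.log(extractComment("MOV AX, BX"));          // null, no ';' in the line
```

Indexing the result with [1] is what gives a plain string instead of an array; the null check matters because .match() returns null, not an empty array, when nothing matches.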

2) Because of the first half of your regex:

(?:[^&]|&.+;)*

This is not a good idea! It will match just about anything, including newlines. In fact, the only thing it won't match is a "&" that is not followed by at least one character and then a ";" on the same line. Because the group is greedy, it consumes everything up to the last ";" on each of your lines, and the capture group is left with only that final ";" and what follows it.
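
The greediness can be observed directly. In the sketch below, the ^ anchor and the extra parentheses around the first half are my additions for instrumentation only (they are not in the question's regex); they expose how much text the first half takes before (;.*) gets a turn.

```javascript
const line = "      ; hello world; a message";

// First half of the question's regex on its own, anchored at the start.
const prefixOnly = /^(?:[^&]|&.+;)*/;
console.log(line.match(prefixOnly)[0] === line); // true: it swallows the whole line

// Same regex as in the question, with the first half wrapped in a capture
// group so that its share of the match becomes visible.
const instrumented = /^((?:[^&]|&.+;)*)(;.*)$/;
const parts = line.match(instrumented);
console.log(JSON.stringify(parts[1])); // "      ; hello world"
console.log(JSON.stringify(parts[2])); // "; a message"
```

The engine backtracks from the end of the line only as far as it must, so the first ";" it can hand to (;.*) is the last one in the line.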

3) I'm not at all familiar with NASM-style comments in HTML, so I'd need to see a more extensive list of what you want matched/not matched in order to confidently give a good answer here.

But here's something I've thrown together very quickly, to at least solve the two examples you gave above:

.*&.*?;\s(;.*)$
answered 2013-07-03T08:57:21.340
0

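Ad 1.) The ?: block is not returned. Instead, the complete match is returned in the first array element. This behavior follows the specification for non-global matches (i.e. without the g option).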
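
A short sketch of the difference between a non-global and a global match, using the regex from the question (the variable names are only illustrative):

```javascript
const text = "      ; hello world";

// Without the g option: [full match, capture group 1].
const nonGlobal = text.match(/(?:[^&]|&.+;)*(;.*)$/);
console.log(nonGlobal.length); // 2
console.log(nonGlobal[1]);     // "; hello world"

// With the g option: one entry per full match, capture groups are dropped.
const global = text.match(/(?:[^&]|&.+;)*(;.*)$/g);
console.log(global.length);    // 1
console.log(global[0]);        // the full match, leading spaces included
```

So adding g does not help here: it removes the capture group from the result but keeps the full match, which still includes the text before the comment.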

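Ad 2.) The first part of the regex, (?:[^&]|&.+;)*, matches too much. In fact, if you remove the second part, it will match the entire line. In plain English, you are asking to match either an & followed by as many characters as possible followed by a ;, or any character other than &, and you are asking the engine to repeat this as often as possible, right up to the last ; in the test string (if there is one).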

Ad 3.) Try

(?:[^&;]*(&[a-zA-Z0-9_-]+;[^&;]*)*)(;.*)$

It fixes the broken entity matching and returns the longest suffix that starts with ;.
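
A usage sketch for the expression above. Note that its inner entity group is itself a capture group, so the comment ends up in index 2 of the result rather than index 1. The second regex and the helper commentOf are my own assumptions, not part of the answer: the variant makes the entity group non-capturing and adds a ^ anchor, because the unanchored form can restart in the middle of a trailing entity.

```javascript
// Regex as suggested in the answer: the comment is capture group 2.
const suggested = /(?:[^&;]*(&[a-zA-Z0-9_-]+;[^&;]*)*)(;.*)$/;

// Assumed variant (not from the answer): anchored at the start of the line,
// entity group made non-capturing, so the comment is capture group 1.
const variant = /^(?:[^&;]*(?:&[a-zA-Z0-9_-]+;[^&;]*)*)(;.*)$/;

// Hypothetical helper built on the variant.
function commentOf(line) {
  const m = line.match(variant);
  return m ? m[1] : null;
}

const twoSemicolons = "      ; hello world; a message".match(suggested);
console.log(twoSemicolons[2]); // "; hello world; a message"

console.log(commentOf("MOV AX, &lt;x&gt; ; load")); // "; load"
console.log(commentOf("MOV AX, BX"));               // null

// Without the anchor, the ';' that closes a trailing entity is reported
// as a comment; with the anchor, the line has no comment.
console.log("MOV AX, &lt;".match(suggested)[2]); // ";"
console.log(commentOf("MOV AX, &lt;"));          // null
```

Both expressions only recognise named entities made of the characters in [a-zA-Z0-9_-]; numeric references such as &#160; contain a # and would need that character added to the class.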

Tested with the pagecolumn regex tester (I am not affiliated with that site).

answered 2013-07-03T09:17:14.763