javascript - Regex to split a comma-separated parameter list, but ignore commas inside quotes

Question

I need to parse a string containing a comma-separated list of parameters of the form

key1=value1,key2=value2,key3=value3...

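The complication is that the values may be enclosed in quotes so that they can contain spaces, commas, and so on. Commas inside quotes must of course not count as parameter separators. (There may also be whitespace in various places outside the quotes, which should probably be ignored.)

My idea is to split the list at the commas and then, within each parameter definition, separate the key from the value at the equals sign. So for the split I need to find the valid commas (the ones not inside quotes); I think a regex is the most concise and direct way to do that.

Here are some sample strings: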

Include="All Violations", CheckType=MaxTrans
MetricName = PlacedInstances, PlacedOnly = 1
CheckType=Hold, Include="reg2reg,in2reg,in2out,reg2out"
CheckType=Setup, Include="reg2reg,in2reg,in2out,reg2out (sic)

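Yes, the last one is malformed: the closing quote of the value is missing.

I found this answer helpful (regex: /,(?=(?:(?:[^"]*"){2})*[^"]*$)/), except that it cannot parse the malformed string. In my case, the equals signs give me extra information that should make it possible to parse that one too.

I tried this: /(?<==[^"]+),/, which works for the malformed string but fails on my first example. I think what I need is a way to find a comma that is preceded by an equals sign, with either zero or two quotes (not just one) between the comma and the nearest preceding equals sign. But how do I write that as a JavaScript regex?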
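To make the behavior of the two patterns mentioned in the question concrete, here is a minimal sketch that runs them against the sample strings. The variable names are only illustrative, and the lookbehind pattern assumes an engine that supports lookbehind (recent Node.js and most current browsers).

```javascript
// Pattern from the linked answer: split on a comma only when an even number
// of double quotes follows it up to the end of the string.
const quoteParitySplit = /,(?=(?:(?:[^"]*"){2})*[^"]*$)/;

// Pattern tried in the question: a comma preceded by "=" and then only
// non-quote characters.
const lookbehindSplit = /(?<==[^"]+),/;

const firstExample = 'Include="All Violations", CheckType=MaxTrans';
const wellFormed = 'CheckType=Hold, Include="reg2reg,in2reg,in2out,reg2out"';
const malformed = 'CheckType=Setup, Include="reg2reg,in2reg,in2out,reg2out';

// Well-formed input: the quoted commas are left alone.
const wellFormedParts = wellFormed.split(quoteParitySplit);

// Malformed input: the missing closing quote flips the parity, so the
// separator comma is skipped and the quoted commas are split instead.
const malformedParts = malformed.split(quoteParitySplit);

// The lookbehind attempt handles the malformed string ...
const malformedByLookbehind = malformed.split(lookbehindSplit);

// ... but finds no split point in the first example, because the separator
// comma is directly preceded by a closing quote.
const firstByLookbehind = firstExample.split(lookbehindSplit);

console.log(wellFormedParts, malformedParts);
console.log(malformedByLookbehind, firstByLookbehind);
```

This matches the description above: each pattern handles one of the two cases and fails on the other.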

score 1 · Accepted Answer

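One could use an approach based on, for example, two regular expressions...

The first one has to split the string as the OP requires; it is therefore based on a positive lookahead.

The second one is used inside a map operation over the resulting array of parameter template items. Each item is processed by a regex that tries to capture the named groups key and value. In addition, the string in the value field of groups is trimmed.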

// see ... [https://regex101.com/r/nUc8en/1/]
const regXParameterSplit = (/,\s*(?=[^=,]+=)/);

// see ... [https://regex101.com/r/7xSwyX/1/]
const regXCaptureKeyValue = (/^(?<key>[^=\s]+)\s*="*(?<value>[^"]+)/);

const testSample = 'Include="All Violations", CheckType=MaxTrans, MetricName = PlacedInstances, PlacedOnly = 1, CheckType=Hold, Include="reg2reg,in2reg,in2out,reg2out", CheckType=Setup, Include="reg2reg,in2reg,in2out,reg2out,CheckType=Setup';

function getKeyAndValue(template) {
  const { groups } = (regXCaptureKeyValue.exec(template) || {});
  if (groups) {
    groups.value = groups.value.trim();
  }
  return groups;
}

console.log(
  '... just splitting ...',
  testSample
    .split(regXParameterSplit)
);
console.log(
  '... the full approach ...',
  testSample
    .split(regXParameterSplit)
    .map(getKeyAndValue)
);
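As a follow-up, the two regexes above can be applied to a single input line and the result collected into a plain object. This is only a sketch: parseParameterLine is a hypothetical helper name that does not appear in the answer, and it assumes keys are unique within a line.

```javascript
// The two regexes from the answer above, repeated so the sketch is self-contained.
const regXParameterSplit = /,\s*(?=[^=,]+=)/;
const regXCaptureKeyValue = /^(?<key>[^=\s]+)\s*="*(?<value>[^"]+)/;

// Hypothetical helper (not part of the answer): parse one line into an object.
function parseParameterLine(line) {
  const entries = line
    .split(regXParameterSplit)
    .map((item) => regXCaptureKeyValue.exec(item.trim()))
    // Drop items the capture regex could not match at all.
    .filter((match) => match !== null)
    .map(({ groups }) => [groups.key, groups.value.trim()]);
  return Object.fromEntries(entries);
}

const parsedQuoted = parseParameterLine(
  'Include="All Violations", CheckType=MaxTrans'
);
const parsedMalformed = parseParameterLine(
  'CheckType=Setup, Include="reg2reg,in2reg,in2out,reg2out'
);

console.log(parsedQuoted);
console.log(parsedMalformed);
```

Because the split relies on the next item looking like key=, a quoted value that itself contains a comma followed by something like b=c would still be split at that comma.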


score 0 · Accepted Answer

Use

string.match(/\w+\s*=\s*(?:"[^"\n]*(?:"|$)|\S+(?=,|$))/g)

See proof.

Explanation

--------------------------------------------------------------------------------
  \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  =                        '='
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    "                        '"'
--------------------------------------------------------------------------------
    [^"\n]*                  any character except: '"', '\n'
                             (newline) (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      "                        '"'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      $                        before an optional \n, and the end of
                               the string
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
      ,                        ','
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      $                        before an optional \n, and the end of
                               the string
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
  )                        end of grouping
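The pattern explained above returns whole key = value chunks, so a second step is still needed to separate key from value. The following is a minimal sketch of that step; toPairs is a hypothetical helper name, and stripping the surrounding quotes is an assumption about the desired output.

```javascript
// The pattern from the answer above.
const pairPattern = /\w+\s*=\s*(?:"[^"\n]*(?:"|$)|\S+(?=,|$))/g;

// Hypothetical helper (not part of the answer): split each matched chunk at
// its first "=" and strip one optional leading and trailing double quote.
function toPairs(line) {
  return (line.match(pairPattern) || []).map((chunk) => {
    const eq = chunk.indexOf('=');
    const key = chunk.slice(0, eq).trim();
    const value = chunk.slice(eq + 1).trim().replace(/^"|"$/g, '');
    return [key, value];
  });
}

const malformedPairs = toPairs(
  'CheckType=Setup, Include="reg2reg,in2reg,in2out,reg2out'
);

// Caveat: without whitespace after the comma, the greedy \S+ runs to the end
// of the string, so both parameters come back as a single chunk.
const compact = 'key1=value1,key2=value2'.match(pairPattern);

console.log(malformedPairs);
console.log(compact);
```

If unquoted values never contain commas, one possible adjustment (an assumption, not part of the answer) is to replace \S+(?=,|$) with [^\s,]+ so that an unquoted value stops at the first comma.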

score 0 · Accepted Answer

Something like this would work:

/(?:^|, *)(?<key>[a-z]+) *= *(?<value>[^\r\n,"]+|"[^\r\n"]+"?)/gmi

https://regex101.com/r/z05WcM/1

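(?:^|, *)(?<key>[a-z]+) - names a capture group "key", defined as a run of alphabetic characters that sits either at the start of a line or after a comma and optional spaces
 *= * - the assignment operator (equals sign) may have spaces on either side
(?<value>[^\r\n,"]+|"[^\r\n"]+"?) - names a capture group "value", which is either a run of characters containing no comma or quote or, if it starts with a quote, a string that may contain commas and has an optional closing quote

But it fails if you have data like Include="All Viola\"tions".

Note that I avoided lookbehinds because they are not supported in all browsers.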
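A minimal usage sketch for the pattern above, using String.prototype.matchAll on several of the question's sample lines joined by newlines. Trimming the value and stripping its surrounding quotes are assumptions about the desired output, not part of the answer.

```javascript
// The pattern from the answer above (flags: global, multiline, case-insensitive).
const pairPattern =
  /(?:^|, *)(?<key>[a-z]+) *= *(?<value>[^\r\n,"]+|"[^\r\n"]+"?)/gmi;

const input = [
  'Include="All Violations", CheckType=MaxTrans',
  'MetricName = PlacedInstances, PlacedOnly = 1',
  'CheckType=Setup, Include="reg2reg,in2reg,in2out,reg2out',
].join('\n');

// Collect [key, value] pairs; the captured value still carries its quotes,
// so strip one optional leading and trailing double quote.
const pairs = Array.from(input.matchAll(pairPattern), ({ groups }) => [
  groups.key,
  groups.value.trim().replace(/^"|"$/g, ''),
]);

console.log(pairs);
```

Because the key is defined as [a-z]+, keys containing digits, such as key1 in the question's template, are not matched by this pattern.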
