
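I'm writing a regex that replaces runs of asterisks at the end of a word with superscript numbers giving the count of those asterisks, and does the same for asterisks at the start of a line that are followed by a space and then a word. The point is to make footnotes easy to write on my phone: write the thing → send the thing to an iOS Shortcut → regex magic → the thing has footnote markers.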

However, because I often use *foo bar* to denote emphasis, I don't want to capture those asterisks.

I thought I had it with this regex:

/**
 * (?<=\S)                  -- make sure the thing behind the capture is a not-space
 * (?<!\W\*\w([^*]|\w\*)*?) -- make sure the thing behind the capture is not a not-word character
 *                             followed by an asterisk
 *                             followed by anything that isn't an asterisk
 *                             followed by a letter followed by an asterisk
 *                             e.g. Hello *world*.
 * \*+                      -- 1+ asterisks.  The primary capture for trailing asterisks.
 * (?=[^\w*]|$)             -- make sure the thing following the capture is a not-word-not-asterisk,
 *                             and may be the end of the line
 * |                        -- OR
 * ^\*+(?=\s\S)             -- the start of a line followed by 1+ asterisks (the primary capture)
 *                             followed by a space
 *                             followed by a not-space
 */
const regex = /(?<=\S)(?<!\W\*\w([^*]|\w\*)*?)\*+(?=[^\w*]|$)|^\*+(?=\s\S)/gm;

const transform = m => {
  const superTable = [
    '⁰', '¹', '²', '³', '⁴', '⁵', '⁶', '⁷', '⁸', '⁹'
  ];

  let str = [];

  // for each digit, add the character for the 1s place then divide by ten
  for (let len = m.length; len; len = (len - len % 10) / 10) {
    str.unshift(superTable[len % 10]);
  }

  return str.join('');
}

/** [input, expectedOutput] */
const testCases = [
  [`A b*** c`, `A b³ c`],
  [`A *b* c*`, `A *b* c¹`],
  [`A *b* *c* d*`, `A *b* *c* d¹`],
  [`A *b* c* d**`, `A *b* c¹ d²`],
  [`** a b c`, `² a b c`],
  [`** a b*** c`, `² a b³ c`],
  [`A *bc* d**`, `A *bc* d²`],
  [`A *b c* d**`, `A *b c* d²`],
];

const results = ['Input\t\t=>\tActual\t\t===\tExpected\t: Success'];
results.push('='.repeat(73));

for (const [input, expected] of testCases) {
  const actual = input.replace(regex, transform);
  const extraSpacing = actual.length < 8 ? '\t' : '';
  const success = actual === expected;
  results.push(`${input}\t=>\t${actual}${extraSpacing}\t===\t${expected}${extraSpacing}\t: ${success}`);
}

console.log(results.join('\n'));

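The first six are the test cases I used when I first wrote the script. The last two I discovered today. As it turns out, the regex works for *a* (a single character wrapped in asterisks) but not for *ab* or *a b* (two or more characters wrapped in asterisks).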
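A reduced reproduction of that behavior, as a sketch: it only lists which asterisk runs the pattern above matches, without doing the replacement. The helper name `matchedRuns` is made up for illustration, and the snippet assumes an engine with variable-length lookbehind support (for example a recent Node.js).

```javascript
// Same pattern as above, unchanged.
const regex = /(?<=\S)(?<!\W\*\w([^*]|\w\*)*?)\*+(?=[^\w*]|$)|^\*+(?=\s\S)/gm;

// Hypothetical helper: list every match as "index:text".
const matchedRuns = input =>
  [...input.matchAll(regex)].map(m => `${m.index}:${m[0]}`);

console.log(matchedRuns('A *b* c*'));    // [ '7:*' ]  the trailing marker is found
console.log(matchedRuns('A *bc* d**'));  // []         the trailing marker is missed
console.log(matchedRuns('A *b c* d**')); // []         the trailing marker is missed
```

In the two failing inputs the emphasis asterisks are correctly left alone; what goes wrong is that the footnote marker after the emphasized text is no longer matched.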

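Admittedly I wrote this regex a couple of weeks ago, but I can't for the life of me figure out what I'm doing wrong. I suspect it has something to do with greediness or laziness, but I'm not sure where.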


1 Answer


You can use

/^\*+(?=\s+\S)|(?<!\s)(?<!\*(?=\S)[^*]*)(\*+)(?![\w*])/gm

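See the regex demo. Details:

  • ^ - start of a line
  • \*+(?=\s+\S) - one or more asterisks followed by one or more whitespace characters and then a non-whitespace character
  • | - or
  • (?<!\s) - immediately to the left, there must be no whitespace character (if you required a word character \w there instead, you could use \b here)
  • (?<!\*(?=\S)[^*]*) - immediately to the left, there must not be a * that is followed by a non-whitespace character and then zero or more characters other than asterisks
  • \*+ - one or more asterisks
  • (?![\w*]) - immediately to the right, there must be no word character and no * character.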
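To see the parts of the pattern in action, here is a minimal sketch that runs it over one input per case: a leading marker, a trailing marker, and emphasis followed by a trailing marker. The names `SUPERSCRIPT_DIGITS`, `toSuperscript`, and `markFootnotes` are illustrative and not part of the code in the question; the digit conversion is a string-based stand-in for the `transform` function there.

```javascript
// Pattern from this answer, unchanged.
const regex = /^\*+(?=\s+\S)|(?<!\s)(?<!\*(?=\S)[^*]*)(\*+)(?![\w*])/gm;

const SUPERSCRIPT_DIGITS = '⁰¹²³⁴⁵⁶⁷⁸⁹';

// Hypothetical helper: render a non-negative integer as superscript digits.
const toSuperscript = n =>
  String(n).replace(/\d/g, d => SUPERSCRIPT_DIGITS[d]);

// Hypothetical wrapper: replace each matched asterisk run with its length.
const markFootnotes = text =>
  text.replace(regex, m => toSuperscript(m.length));

console.log(markFootnotes('** note'));      // ² note        (first alternative)
console.log(markFootnotes('word**'));       // word²         (second alternative)
console.log(markFootnotes('*emph* word*')); // *emph* word¹  (emphasis untouched)
```

In the last input, the closing * of *emph* is rejected by the second lookbehind because an opening * directly followed by a non-whitespace character sits to its left with no other asterisk in between. The final * passes because the nearest asterisk to its left is followed by a space.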

Here is your updated JavaScript demo:

const regex = /^\*+(?=\s+\S)|(?<!\s)(?<!\*(?=\S)[^*]*)(\*+)(?![\w*])/gm;

const transform = m => {
  const superTable = [
    '⁰', '¹', '²', '³', '⁴', '⁵', '⁶', '⁷', '⁸', '⁹'
  ];

  let str = [];

  // for each digit, add the character for the 1s place then divide by ten
  for (let len = m.length; len; len = (len - len % 10) / 10) {
    str.unshift(superTable[len % 10]);
  }

  return str.join('');
}

/** [input, expectedOutput] */
const testCases = [
  [`A b*** c`, `A b³ c`],
  [`A *b* c*`, `A *b* c¹`],
  [`A *b* *c* d*`, `A *b* *c* d¹`],
  [`A *b* c* d**`, `A *b* c¹ d²`],
  [`** a b c`, `² a b c`],
  [`** a b*** c`, `² a b³ c`],
  [`A *bc* d**`, `A *bc* d²`],
  [`A *b c* d*`, `A *b c* d¹`]
];

const results = ['Input\t\t=>\tActual\t\t===\tExpected\t: Success'];
results.push('='.repeat(73));

for (const [input, expected] of testCases) {
  const actual = input.replace(regex, transform);
  const extraSpacing = actual.length < 8 ? '\t' : '';
  const success = actual === expected;
  results.push(`${input}\t=>\t${actual}${extraSpacing}\t===\t${expected}${extraSpacing}\t: ${success}`);
}

console.log(results.join('\n'));

Answered 2021-10-12T20:55:45.990