regex - Regular expression in Notepad++ to trim data / remove duplicates

Question

Suppose I have a text file with about 40k lines like

Color LaserJet 8500, Color Laserjet 8550, Color Laserjet 8500N, Color Laserjet 8500DN, Color Laserjet 8500GN, Color Laserjet 8550N, Color Laserjet 8550DN, Color Laserjet 8550GN, Color Laserjet 8550 MFP,

as an example.

Can anyone help me with a regex that trims off everything after the number but before the comma, so that 8500N becomes 8500?

The end result would be

Color Laserjet 8500, Color Laserjet 8550, Color Laserjet 8500, Color Laserjet 8500, Color Laserjet 8500, Color Laserjet 8550, Color Laserjet 8550, Color Laserjet 8550, Color Laserjet 8550,

Can anyone also suggest the best way to remove the duplicates in Notepad++ (or another easy-to-use program)?

score 2 · Accepted Answer

You should replace each match of (?<=\d)[^\d,]+(?=,) with an empty string.

The above regex reads: "one or more characters that are neither a digit nor a comma, located between a digit and a comma".

If such a number with trailing letters can also appear at the end of the string (or line), with no comma after it, and you want that trimmed as well, use (?<=\d)[^\d,]+(?:(?=,)|$)

This reads similarly; it just adds "or end of string" to the first meaning.
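For anyone who wants to try the two lookaround patterns outside an editor, here is a minimal sketch in Python, whose re module supports lookbehind and lookahead. The function name trim_suffixes and the sample string are just illustrative assumptions, not part of the original answer.

```python
import re

# Pattern from the answer: non-digit, non-comma characters
# sitting between a digit and a comma.
BEFORE_COMMA = re.compile(r"(?<=\d)[^\d,]+(?=,)")

# Variant that also trims a suffix at the very end of the string.
BEFORE_COMMA_OR_END = re.compile(r"(?<=\d)[^\d,]+(?:(?=,)|$)")


def trim_suffixes(text, pattern=BEFORE_COMMA):
    """Replace every match of `pattern` with an empty string."""
    return pattern.sub("", text)


# Hypothetical sample; the last item deliberately has no trailing comma.
sample = (
    "Color LaserJet 8500, Color Laserjet 8500N, "
    "Color Laserjet 8550 MFP, Color Laserjet 8550DN"
)

print(trim_suffixes(sample))                       # last item keeps "DN"
print(trim_suffixes(sample, BEFORE_COMMA_OR_END))  # last item is trimmed too
```

The first pattern leaves the final 8550DN alone because nothing follows it, which is exactly the case the "or end of string" variant is meant to cover.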

Update:

Since it seems that Notepad++ does not support regex lookaround, the solution is to replace (\d)([^\d,]+)(,) with \1\3, or (\d)[^\d,]+(,) with \1\2.
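The capture-group form can be checked the same way. The sketch below applies it in Python and then, since the question also asks about duplicates, drops repeated items from one line while keeping their first-seen order. The helper names trim_with_groups and unique_items are made up for illustration, and the comparison is case-sensitive, so "LaserJet" and "Laserjet" would count as different items.

```python
import re


def trim_with_groups(text):
    # Keep the digit (group 1) and the comma (group 2);
    # whatever lies between them is dropped.
    return re.sub(r"(\d)[^\d,]+(,)", r"\1\2", text)


def unique_items(text):
    # Split on commas, strip whitespace, skip empty pieces,
    # and keep only the first occurrence of each item, in order.
    items = [item.strip() for item in text.split(",") if item.strip()]
    return list(dict.fromkeys(items))


# Hypothetical sample line with consistent capitalization.
line = (
    "Color Laserjet 8500, Color Laserjet 8550, Color Laserjet 8500N, "
    "Color Laserjet 8550DN, Color Laserjet 8550 MFP,"
)

trimmed = trim_with_groups(line)
print(trimmed)
print(unique_items(trimmed))
```

For a 40k-line file the same two functions could be applied line by line; whether duplicates should be removed per line or across the whole file is left open by the question.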

score 0 · Accepted Answer

How about this:

(.*?\d+)\D*(,)

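It will match the whole item, but you can capture groups 1 and 2. This skips the non-digits between the number and the comma.

The replacement would be: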

\1\2

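Here is an SO post explaining in detail that this is the only way to do this.

Alternatively, as Arithmomaniac suggests, you can do this with a single group, adding the comma back after each match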

(.*?\d+)\D*,

The replacement would be

\1,
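As a quick sanity check, the two-group and one-group forms above can be compared in Python. This is only a sketch: the rule tuples, the apply_rule helper and the sample string are assumptions made for the example, and Python's re engine may differ in details from the one Notepad++ uses.

```python
import re

# (pattern, replacement) pairs taken from the answer above.
TWO_GROUPS = (r"(.*?\d+)\D*(,)", r"\1\2")
ONE_GROUP = (r"(.*?\d+)\D*,", r"\1,")


def apply_rule(rule, text):
    """Run one (pattern, replacement) pair over the text."""
    pattern, replacement = rule
    return re.sub(pattern, replacement, text)


# Hypothetical sample in the shape of the question's data.
sample = "Color LaserJet 8500, Color Laserjet 8550N, Color Laserjet 8550 MFP,"

print(apply_rule(TWO_GROUPS, sample))
print(apply_rule(ONE_GROUP, sample))
```

Note that \D also matches a comma, so \D* first runs past it and then backtracks to the nearest comma after the digits; both forms end up giving the same result on this input.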

score 0 · Accepted Answer


Screenshot of the regex in Notepad++... (image: Notepad++ screenshot)

Answered 2012-06-27T17:26:26.183
