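I have received a text file from a third party that contains IDs I need to extract. The file also contains a lot of other data that I don't need, and it is not in a delimited or fixed-width format. Is there a way to use Notepad++ and regular expressions to delete everything except my ID numbers? The ID numbers are 8 digits long and must start with 0.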
Examples: 00000213, 00023234, 02456343
The numbers you want to keep can be matched like this:
(?<!\d)0\d{7}(?!\d)
The lookarounds ensure that you match exactly 8 digits: the ID may not be preceded or followed by another digit, so it is never taken from the middle of a longer number.
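To try the pattern outside Notepad++, here is a small sketch in Python, whose `re` module accepts the same lookaround syntax. The function name `extract_ids` and the sample string are made up for illustration; the sketch only lists what the pattern matches and does not modify anything.

```python
import re

# 8 digits starting with 0, not preceded or followed by another digit.
ID_PATTERN = re.compile(r"(?<!\d)0\d{7}(?!\d)")

def extract_ids(text: str) -> list[str]:
    """Return every standalone 8-digit ID that starts with 0 (hypothetical helper)."""
    return ID_PATTERN.findall(text)

# 123456789 does not start with 0; 100023234 and 000232341 are 9 digits long,
# so the lookarounds reject every 8-digit window inside them.
sample = "ref 00000213; 123456789 foo 02456343 and 100023234, 000232341."
print(extract_ids(sample))  # ['00000213', '02456343']
```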
Now you can match all other characters up to the next of these numbers and delete them. You also need to handle the text after the last number, up to the end of the file, which is what the \Z alternative is for:
.*?((?<!\d)0\d{7}(?!\d)|\Z)
Replace with:
$1\t
This writes back the number you want to keep, followed by a tab, so the IDs can still be told apart once everything else has been removed (thanks to Sniffer for the tab suggestion). The ? after .* at the beginning is important: it makes the quantifier lazy, so each match consumes as little as possible and does not run past the first number when another one comes later. Make sure to enable the ". matches newline" option in the Replace dialog. Also make sure you are running Notepad++ 6 or later, since earlier versions have a much more limited regex engine that cannot handle this pattern.
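The whole find-and-replace can be sketched in Python as a stand-in for Notepad++ (which uses a different regex engine, so edge-case behaviour may differ). Assumptions: `keep_ids` and the sample text are hypothetical, `re.DOTALL` plays the role of ". matches newline", and Python spells the replacement `$1\t` as `\g<1>\t`.

```python
import re

# The ID pattern plus a \Z alternative, so the text after the last ID
# (up to the end of the input) is consumed too. The lazy .*? stops at the
# first ID it reaches. re.DOTALL lets "." match newlines.
PATTERN = re.compile(r".*?((?<!\d)0\d{7}(?!\d)|\Z)", re.DOTALL)

def keep_ids(text: str) -> str:
    """Delete everything except the IDs, leaving them tab-separated."""
    # Group 1 is the ID, or empty when the \Z branch matched; a tab follows it.
    result = PATTERN.sub(r"\g<1>\t", text)
    # Python's re.sub may leave extra tabs at the very end (empty match at
    # end of input), so trim them for a clean result.
    return result.rstrip("\t")

sample = "Customer: ACME\nID 00000213, order 123456789\nref 00023234; note 02456343 end\n"
print(repr(keep_ids(sample)))  # '00000213\t00023234\t02456343'
```

The trailing `rstrip` is a Python-side convenience; in Notepad++ you may likewise see a stray tab after the last ID, which is harmless.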