regex - How to delete some strings in EmEditor (regular expression)

Question

I want to delete some strings from a file using EmEditor and keep only the parts of each line that I need.

The lines in the file look like this:

{"message":"{\"_\":\"user\",\"pFlags\":{\"contact\":true},\"user_flags\":2143,\"id\":702212125,\"access_hash\":\"914250561826\",\"first_name\":\"david\",\"last_name\":\"jones\",\"username\":\"david_d192\",\"phone\":\"051863329875\",\"status\":{\"_\":\"userStatusRecently\"}}","phone":"051863329875","version":"3","type":"unknown","token":"1556189892619764206","p_id":702212125,"username":"david_d192","type":"redis","user_flags":2143,"host":"win",from":"contacts"}
{"index": {"_type": "_doc", "_id": "36GG54F"}}

{"message":"{\"_\":\"user\",\"pFlags\":{\"contact\":true},\"user_flags\":2143,\"id\":702212125,\"access_hash\":\"914250561826\",\"first_name\":\"david\",\"last_name\":\"jones\",\"username\":\"david_d192\",\"phone\":\"051863329875\",\"status\":{\"_\":\"userStatusRecently\"}}","phone":"051863329875","version":"3","type":"unknown","token":"1556189892619764206","p_id":702212125,"username":"david_d192","type":"redis","user_flags":2143,"host":"win",from":"contacts"}
{"index": {"_type": "_doc", "_id": "36GG54F"}}

{"message":"{\"_\":\"user\",\"pFlags\":{\"contact\":true},\"user_flags\":2143,\"id\":702212125,\"access_hash\":\"914250561826\",\"first_name\":\"david\",\"last_name\":\"jones\",\"phone\":\"051863329875\",\"status\":{\"_\":\"userStatusRecently\"}}","phone":"051863329875","version":"3","type":"unknown","token":"1556189892619764206","p_id":702212125,"type":"redis","user_flags":2143,"host":"win",from":"contacts"}
{"index": {"_type": "_doc", "_id": "36GG54F"}}

From each line I want to keep id, first_name, last_name, phone, and username (if it exists) =>

id:702212125 first_name:david last_name:jones phone:051863329875 username:david_d192,
id:702212125 first_name:david last_name:jones phone:051863329875 username:david_d192,
id:702212125 first_name:david last_name:jones phone:051863329875,

How can I do this?

Thanks

score 1 · Accepted Answer

Parsing the JSON is the best way to do this ( https://linuxconfig.org/how-to-parse-data-from-json-into-python ). But you can make life harder and use regular expressions (presented here in PCRE (PHP) flavor):

Get all IDs:

(?<=\\"id\\":)(\d+)

See the example: https://regex101.com/r/g5vfEd/1

Get all first names:

(?<=first_name\\\":\\\")(\w)+(?=\\)

See the example: https://regex101.com/r/g5vfEd/2

Get all last names:

(?<=last_name\\\":\\\")(\w)+(?=\\)

See the example: https://regex101.com/r/g5vfEd/3

Get all phone numbers:

(?<=phone\\\":\\\")(\w)+(?=\\)

See the example: https://regex101.com/r/g5vfEd/4

Get all usernames (if they exist):

(?<=username\\\":\\\")(\w)+(?=\\)

See the example: https://regex101.com/r/g5vfEd/5
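As a rough sketch of how per-field patterns like these could be applied outside regex101, the Python snippet below searches each line for the escaped field names and joins whatever it finds. The function name extract_fields and the simplified pattern are assumptions of mine, not the patterns above, and the sketch assumes the values contain only word characters.

```python
import re

# Fields to keep, in output order.
FIELDS = ["id", "first_name", "last_name", "phone", "username"]

def extract_fields(line):
    """Collect the wanted fields from the escaped "message" part of one line."""
    parts = []
    for name in FIELDS:
        # Matches \"name\": followed by an optional \" and captures the value.
        match = re.search(r'\\"' + name + r'\\":(?:\\")?(\w+)', line)
        if match:
            parts.append(f"{name}:{match.group(1)}")
    return " ".join(parts)

# Shortened version of a line from the question (no username in this one).
sample = r'{"message":"{\"id\":702212125,\"first_name\":\"david\",\"last_name\":\"jones\",\"phone\":\"051863329875\"}","p_id":702212125}'
print(extract_fields(sample))
```

Because each field is searched for separately, the order of the fields inside the line does not matter, and a missing username is simply skipped.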

The full pattern that matches everything:

id\\?\":\s?\"?(\w+),?[\\\"].*first_name\\\":\\"(\w+).*last_name\\\":\\\"(\w+).*phone\":\"(\d+).*(?=username)?\":\"(\w+).*

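It returns 3 matches, each containing the following 5 groups (match 1 is shown here). Note that group 5 captures contacts, the value of the trailing from field, rather than the username: the greedy .* runs to the last ":" on the line, and the optional lookahead (?=username)? does not constrain anything.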

Group 1.    85-94   702212125
Group 2.    145-150 david
Group 3.    169-174 jones
Group 4.    285-297 051863329875
Group 5.    454-462 contacts

See the link: https://regex101.com/r/g5vfEd/6
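For comparison, the JSON-parsing route mentioned at the top of this answer might look roughly like the sketch below. It assumes every record line is valid JSON; the lines as posted in the question have a missing quote before from, so the sample here is a shortened, repaired line. The function name extract_user is hypothetical.

```python
import json

# Fields to keep, in output order.
WANTED = ["id", "first_name", "last_name", "phone", "username"]

def extract_user(line):
    """Return the wanted fields of one record line, or None for other lines."""
    record = json.loads(line)
    if "message" not in record:
        return None  # e.g. the {"index": ...} lines
    # "message" holds a second JSON document stored as a string.
    user = json.loads(record["message"])
    return " ".join(f"{key}:{user[key]}" for key in WANTED if key in user)

# Shortened sample with the quote before "from" repaired (an assumption).
sample = r'{"message":"{\"_\":\"user\",\"id\":702212125,\"first_name\":\"david\",\"last_name\":\"jones\",\"username\":\"david_d192\",\"phone\":\"051863329875\"}","phone":"051863329875","from":"contacts"}'
print(extract_user(sample))
```

This avoids having to reason about escaped quotes at all, at the cost of requiring well-formed input.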

score 0 · Accepted Answer

Since you tagged both regex and EmEditor, you can try this.

EmEditor 19.1 and later supports named groups in regular expressions, written like this:

(?<id>expression)

and named backreferences in this form:

\k<id>

So the steps are:

Open Find and Replace (Ctrl-H). Check "Match Case" and select "Regular Expressions".

Find:

\\"id\\"[\\":]*(?<id>[^\\":,]*).*?\\"first_name\\"[\\":]*(?<first_name>[^\\":,]*).*?\\"last_name\\"[\\":]*(?<last_name>[^\\":,]*).*?\\"phone\\"[\\":]*(?<phone>[^\\":,]*)(.*?"username"[\\":]*(?<username>[^\\":,]*))?

Replace with:

id:\k<id>\tfirst_name:\k<first_name>\tlast_name:\k<last_name>\tphone:\k<phone>\tusername:\k<username>

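Click the down arrow next to the "Extract" button and select "To New Document", then click the "Extract" button to output the results to a new tab-delimited file.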
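For anyone who wants to run the same extraction outside EmEditor, here is a hedged Python sketch of the Find pattern above. Python's re module writes named groups as (?P<name>...) rather than (?<name>...), so the pattern is adapted accordingly; the names PATTERN and to_row are my own, and the sample line is a shortened version of one from the question.

```python
import re

# The Find pattern from the steps above, with (?<name>...) rewritten as
# (?P<name>...) for Python. The username part is optional.
PATTERN = re.compile(
    r'\\"id\\"[\\":]*(?P<id>[^\\":,]*).*?'
    r'\\"first_name\\"[\\":]*(?P<first_name>[^\\":,]*).*?'
    r'\\"last_name\\"[\\":]*(?P<last_name>[^\\":,]*).*?'
    r'\\"phone\\"[\\":]*(?P<phone>[^\\":,]*)'
    r'(?:.*?"username"[\\":]*(?P<username>[^\\":,]*))?'
)

NAMES = ["id", "first_name", "last_name", "phone", "username"]

def to_row(line):
    """Return a tab-separated row, or None if the line is not a user record."""
    match = PATTERN.search(line)
    if match is None:
        return None
    # Groups that did not take part in the match (a missing username) are None.
    return "\t".join(
        f"{name}:{match.group(name)}"
        for name in NAMES
        if match.group(name) is not None
    )

sample = r'{"message":"{\"_\":\"user\",\"id\":702212125,\"first_name\":\"david\",\"last_name\":\"jones\",\"username\":\"david_d192\",\"phone\":\"051863329875\"}","phone":"051863329875","p_id":702212125,"username":"david_d192","host":"win",from":"contacts"}'
print(to_row(sample))
```

Unlike the replacement string in EmEditor, this version leaves out the username column entirely when the line has no username.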
