I want to use EmEditor to strip some strings out of a file and keep only the parts I need.

The lines in the file look like this:

{"message":"{\"_\":\"user\",\"pFlags\":{\"contact\":true},\"user_flags\":2143,\"id\":702212125,\"access_hash\":\"914250561826\",\"first_name\":\"david\",\"last_name\":\"jones\",\"username\":\"david_d192\",\"phone\":\"051863329875\",\"status\":{\"_\":\"userStatusRecently\"}}","phone":"051863329875","version":"3","type":"unknown","token":"1556189892619764206","p_id":702212125,"username":"david_d192","type":"redis","user_flags":2143,"host":"win",from":"contacts"}
{"index": {"_type": "_doc", "_id": "36GG54F"}}

{"message":"{\"_\":\"user\",\"pFlags\":{\"contact\":true},\"user_flags\":2143,\"id\":702212125,\"access_hash\":\"914250561826\",\"first_name\":\"david\",\"last_name\":\"jones\",\"username\":\"david_d192\",\"phone\":\"051863329875\",\"status\":{\"_\":\"userStatusRecently\"}}","phone":"051863329875","version":"3","type":"unknown","token":"1556189892619764206","p_id":702212125,"username":"david_d192","type":"redis","user_flags":2143,"host":"win",from":"contacts"}
{"index": {"_type": "_doc", "_id": "36GG54F"}}

{"message":"{\"_\":\"user\",\"pFlags\":{\"contact\":true},\"user_flags\":2143,\"id\":702212125,\"access_hash\":\"914250561826\",\"first_name\":\"david\",\"last_name\":\"jones\",\"phone\":\"051863329875\",\"status\":{\"_\":\"userStatusRecently\"}}","phone":"051863329875","version":"3","type":"unknown","token":"1556189892619764206","p_id":702212125,"type":"redis","user_flags":2143,"host":"win",from":"contacts"}
{"index": {"_type": "_doc", "_id": "36GG54F"}}

From each line I want to keep id, first_name, last_name, phone, and username (if present) =>

id:702212125 first_name:david last_name:jones phone:051863329875 username:david_d192,
id:702212125 first_name:david last_name:jones phone:051863329875 username:david_d192,
id:702212125 first_name:david last_name:jones phone:051863329875,

How can I do this?

Thanks

2 Answers

Parsing the JSON is the best way to do this (https://linuxconfig.org/how-to-parse-data-from-json-into-python). But you can make life harder for yourself and use regular expressions (shown here in PCRE (PHP) flavor):

Get all IDs:

(?<=id\":\s\")(\w+)(?=\")

See example: https://regex101.com/r/g5vfEd/1

Get all first names:

(?<=first_name\\\":\\\")(\w)+(?=\\)

See example: https://regex101.com/r/g5vfEd/2

Get all last names:

(?<=last_name\\\":\\\")(\w)+(?=\\)

See example: https://regex101.com/r/g5vfEd/3

Get all phone numbers:

(?<=phone\\\":\\\")(\w)+(?=\\)

See example: https://regex101.com/r/g5vfEd/4

Get all usernames (if present):

(?<=username\\\":\\\")(\w)+(?=\\)

See example: https://regex101.com/r/g5vfEd/5
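
For anyone who wants to run per-field lookups like these outside a regex tester, here is a minimal Python sketch. It is not the PCRE patterns above verbatim: the patterns, the `FIELD_PATTERNS` table, and the `extract_fields` helper are my own illustrative names, written on the assumption that every wanted field appears backslash-escaped inside `message`, as in the sample lines (shortened here).

```python
import re

# Hypothetical per-field patterns for the backslash-escaped keys inside "message".
# In the sample, "id" is an unquoted number; the other values are quoted strings.
FIELD_PATTERNS = {
    "id": re.compile(r'\\"id\\":(\d+)'),
    "first_name": re.compile(r'\\"first_name\\":\\"(\w+)\\"'),
    "last_name": re.compile(r'\\"last_name\\":\\"(\w+)\\"'),
    "phone": re.compile(r'\\"phone\\":\\"(\w+)\\"'),
    "username": re.compile(r'\\"username\\":\\"(\w+)\\"'),
}


def extract_fields(line):
    """Return 'name:value' pairs found in one line, or None if the line has no id."""
    if not FIELD_PATTERNS["id"].search(line):
        return None  # e.g. the {"index": ...} lines
    found = []
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(line)
        if match:  # a missing username is simply left out
            found.append(f"{name}:{match.group(1)}")
    return " ".join(found)


# Shortened versions of the sample lines from the question.
WITH_USERNAME = r'{"message":"{\"_\":\"user\",\"id\":702212125,\"first_name\":\"david\",\"last_name\":\"jones\",\"username\":\"david_d192\",\"phone\":\"051863329875\"}","phone":"051863329875","username":"david_d192"}'
WITHOUT_USERNAME = r'{"message":"{\"_\":\"user\",\"id\":702212125,\"first_name\":\"david\",\"last_name\":\"jones\",\"phone\":\"051863329875\"}","phone":"051863329875"}'
INDEX_LINE = '{"index": {"_type": "_doc", "_id": "36GG54F"}}'

for sample in (WITH_USERNAME, WITHOUT_USERNAME, INDEX_LINE):
    print(extract_fields(sample))
```

Searching for each field separately means the order of the keys inside `message` does not matter, and a line without a username still yields the other four fields.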

Full pattern that matches everything:

id\\?\":\s?\"?(\w+),?[\\\"].*first_name\\\":\\"(\w+).*last_name\\\":\\\"(\w+).*phone\":\"(\d+).*(?=username)?\":\"(\w+).*

This returns 3 matches, each containing the following 5 groups (match 1 shown here):

Group 1.    85-94   702212125
Group 2.    145-150 david
Group 3.    169-174 jones
Group 4.    285-297 051863329875
Group 5.    454-462 contacts

See link: https://regex101.com/r/g5vfEd/6
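
The JSON-parsing route recommended at the top of this answer could look roughly like the Python sketch below. It also sidesteps a pitfall visible in the group listing above, where Group 5 is `contacts` (the `from` value) rather than the username. Two assumptions to flag: `summarize` and `make_line` are illustrative names of mine, and the real file is assumed to hold valid JSON on each line. As pasted in the question, the sample lines contain `"host":"win",from":"contacts"`, with a quote missing before `from`, which a JSON parser rejects; this sketch skips such lines instead of guessing.

```python
import json

WANTED = ("id", "first_name", "last_name", "phone", "username")


def summarize(line):
    """Return 'name:value' pairs from the nested "message" object, or None."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None  # blank or malformed line: skipped in this sketch
    if not isinstance(record, dict):
        return None
    message = record.get("message")
    if not isinstance(message, str):
        return None  # e.g. the {"index": ...} lines
    try:
        # "message" holds a second JSON document encoded as a string.
        user = json.loads(message)
    except json.JSONDecodeError:
        return None
    if not isinstance(user, dict):
        return None
    return " ".join(f"{key}:{user[key]}" for key in WANTED if key in user)


def make_line(user):
    """Demo helper (hypothetical): build one well-formed line shaped like the sample."""
    return json.dumps({"message": json.dumps(user), "phone": user["phone"], "from": "contacts"})


demo_user = {
    "_": "user",
    "id": 702212125,
    "first_name": "david",
    "last_name": "jones",
    "username": "david_d192",
    "phone": "051863329875",
}
print(summarize(make_line(demo_user)))
```

To process a whole file, call `summarize` on each line and write out the results that are not `None`.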

Answered 2020-06-18T07:58:17.453

Since you tagged this with regex and EmEditor, you can try the following.

EmEditor 19.1 and later supports named groups in regular expressions, written like this:

(?<id>expression) 

and named backreferences of the form:

\k<id>

So the steps are:

Open Find and Replace (Ctrl-H). Check "Match Case" and select "Regular Expressions".

Find:

\\"id\\"[\\":]*(?<id>[^\\":,]*).*?\\"first_name\\"[\\":]*(?<first_name>[^\\":,]*).*?\\"last_name\\"[\\":]*(?<last_name>[^\\":,]*).*?\\"phone\\"[\\":]*(?<phone>[^\\":,]*)(.*?"username"[\\":]*(?<username>[^\\":,]*))?

Replace with:

id:\k<id>\tfirst_name:\k<first_name>\tlast_name:\k<last_name>\tphone:\k<phone>\tusername:\k<username>

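Click the down arrow next to the "Extract" button and select "To New Document", then click the "Extract" button to write the output to a new tab-delimited file.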
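
If you want to try the same pattern outside EmEditor, here is a rough Python equivalent. It is only a sketch: Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`, the `ROW` and `to_tsv` names are my own, and unlike the replacement string above it leaves the `username` field out entirely when no username is matched.

```python
import re

# The Find pattern above, with named groups rewritten in Python's (?P<name>...) syntax.
ROW = re.compile(
    r'\\"id\\"[\\":]*(?P<id>[^\\":,]*)'
    r'.*?\\"first_name\\"[\\":]*(?P<first_name>[^\\":,]*)'
    r'.*?\\"last_name\\"[\\":]*(?P<last_name>[^\\":,]*)'
    r'.*?\\"phone\\"[\\":]*(?P<phone>[^\\":,]*)'
    r'(?:.*?"username"[\\":]*(?P<username>[^\\":,]*))?'
)

FIELDS = ("id", "first_name", "last_name", "phone", "username")


def to_tsv(line):
    """Return tab-separated 'name:value' fields for one line, or None if it does not match."""
    match = ROW.search(line)
    if match is None:
        return None
    # A group that did not take part in the match (no username) is None and is skipped.
    return "\t".join(
        f"{name}:{match.group(name)}" for name in FIELDS if match.group(name) is not None
    )


# Shortened versions of the sample lines from the question.
WITH_USERNAME = r'{"message":"{\"_\":\"user\",\"id\":702212125,\"first_name\":\"david\",\"last_name\":\"jones\",\"username\":\"david_d192\",\"phone\":\"051863329875\"}","phone":"051863329875","p_id":702212125,"username":"david_d192","host":"win"}'
WITHOUT_USERNAME = r'{"message":"{\"_\":\"user\",\"id\":702212125,\"first_name\":\"david\",\"last_name\":\"jones\",\"phone\":\"051863329875\"}","phone":"051863329875","p_id":702212125,"host":"win"}'

print(to_tsv(WITH_USERNAME))
print(to_tsv(WITHOUT_USERNAME))
```

As in the Find pattern, the optional username part matches the unescaped top-level `"username"` key that follows the `message` string, so it depends on that key appearing after the escaped `phone` field.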

Answered 2020-06-18T12:59:35.800