regex - How to loop over a line in a file using a regular expression as the loop variable

Question

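I am trying to build something like an explode function for a JSON file. The loop should read the JSON file line by line. Each line holds several values that I want to extract and pair with the main part of the line (like LATERAL VIEW or the explode function in SQL).

The data looks like this: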

{"wl_id":0,"wl_customer_id":0,"wl_webpage_name":"webpage#00","wl_timestamp":"2013-01-27 16:07:02","wl_key2":103717,"wl_key3":589101,"wl_key4":23095,"wl_key5":200527,"wl_key6":60319}

What I want is to explode it, SQL-style, into this:

{"wl_id":0,"wl_customer_id":0,"wl_webpage_name":"webpage#00","wl_timestamp":"2013-01-27 16:07:02","wl_key2":103717}
{"wl_id":0,"wl_customer_id":0,"wl_webpage_name":"webpage#00","wl_timestamp":"2013-01-27 16:07:02","wl_key3":589101}
{"wl_id":0,"wl_customer_id":0,"wl_webpage_name":"webpage#00","wl_timestamp":"2013-01-27 16:07:02","wl_key4":23095}
{"wl_id":0,"wl_customer_id":0,"wl_webpage_name":"webpage#00","wl_timestamp":"2013-01-27 16:07:02","wl_key5":200527}


import io
import sys
import re

i = 0
with io.open('lateral_result.json', 'w', encoding="utf-8") as f, io.open('lat.json', encoding="utf-8") as g:
    for line in g:
        x = re.search('(.*wl_timestamp":"[^"]+",)', line)
        y = re.search('("wl_key[^,]+),', line)
        for y in line:
            i = i + 1
            print(x.group(0), y.group(i), '}', file=f)

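I always get an error saying that a str has no group. When I move the regex into the inner for loop instead, it either returns only the first result and does nothing else, or it writes the same result once for every character it finds in the line.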

score 2 · Accepted Answer

Don't use regex on JSON. Use the json module on JSON and work with the resulting data structure:

import json

data_str = """{"wl_id":0,"wl_customer_id":0,"wl_webpage_name":"webpage#00","wl_timestamp":"2013-01-27 16:07:02","wl_key2":103717,"wl_key3":589101,"wl_key4":23095,"wl_key5":200527,"wl_key6":60319}"""

data = json.loads(data_str)  # you can use json.load( file_handle )

print(data)

for k in (x for x in data.keys() if x.startswith("wl_key")):
    print(data["wl_timestamp"],k,data[k])

Output:

2013-01-27 16:07:02 wl_key2 103717
2013-01-27 16:07:02 wl_key3 589101
2013-01-27 16:07:02 wl_key4 23095
2013-01-27 16:07:02 wl_key5 200527
2013-01-27 16:07:02 wl_key6 60319
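
The loop above prints only the timestamp, the key, and the value. To get the full exploded JSON rows the question asks for, the same idea can be extended as in the minimal sketch below. The function name explode_line and its prefix parameter are made up for illustration, and the sketch assumes Python 3.7+ so that dict insertion order is preserved.

```python
import json

def explode_line(line, prefix="wl_key"):
    """Return one compact JSON string per key that starts with `prefix`."""
    record = json.loads(line)
    # Fields shared by every output row (everything that is not a wl_key* field).
    base = {k: v for k, v in record.items() if not k.startswith(prefix)}
    rows = []
    for key, value in record.items():
        if key.startswith(prefix):
            row = dict(base)   # copy the shared fields
            row[key] = value   # add exactly one wl_key* field
            # Compact separators match the input's formatting (no spaces).
            rows.append(json.dumps(row, separators=(",", ":")))
    return rows

sample = ('{"wl_id":0,"wl_customer_id":0,"wl_webpage_name":"webpage#00",'
          '"wl_timestamp":"2013-01-27 16:07:02","wl_key2":103717,'
          '"wl_key3":589101,"wl_key4":23095,"wl_key5":200527,"wl_key6":60319}')

for row in explode_line(sample):
    print(row)
```

For a file, call explode_line on each line read from the input and write every returned row to the output file.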

score 0 · Accepted Answer

This is the code that solved my case:

import json
import io
import re

with io.open('lateral_result.json', 'w', encoding="utf-8") as f, io.open('lat.json', encoding="utf-8") as g:
    for line in g:
        data = json.loads(line)
        # Shared start of the line, up to and including the quote that opens the next key
        prefix = re.search('(.*wl_timestamp":"[^"]+",")', line)
        for k in (key for key in data.keys() if key.startswith("wl_key")):
            # Append one wl_key* field and close the object
            print(prefix.group(0) + str(k) + '":' + str(data[k]) + '}', file=f)
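
As for why the loop in the question fails: `for y in line` rebinds y to each character of the string, so y.group(...) is called on a str. If a regex-only loop is really wanted, re.finditer yields one match object per occurrence and can serve as the loop variable. The sketch below assumes every wl_key value is a plain integer, as in the sample line. The patterns are illustrative and will not handle arbitrary JSON.

```python
import re

line = ('{"wl_id":0,"wl_customer_id":0,"wl_webpage_name":"webpage#00",'
        '"wl_timestamp":"2013-01-27 16:07:02","wl_key2":103717,'
        '"wl_key3":589101,"wl_key4":23095,"wl_key5":200527,"wl_key6":60319}')

# Everything up to and including the comma after the timestamp value.
prefix = re.search(r'.*"wl_timestamp":"[^"]+",', line).group(0)

# finditer returns one match object per "wl_keyN":<digits> pair,
# so each loop variable has a .group() method (unlike a character of a str).
rows = [prefix + m.group(0) + '}'
        for m in re.finditer(r'"wl_key\d+":\d+', line)]

for row in rows:
    print(row)
```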
