我有一个数据框的列值,我在其中接收到一个字符串输入,如下所示,其中 startIndex 是每个字符开头的索引,结束索引是字符串中该字符出现的结尾,标志是字符本身。
+---+------------------+
| id| Values |
+---+------------------+
|01 | AABBBAA |
|02 | SSSAAAA |
+---+------------------+
现在我想将字符串转换为每一行的字典,如下所示:
+---+--------------------+
| id| Values |
+---+--------------------+
|01 | [{"startIndex":0, |
| | "endIndex" : 1, |
| | "flag" : A }, |
| | {"startIndex":2, |
| | "endIndex" : 4, |
| | "flag" : B }, |
| | {"startIndex":5, |
| | "endIndex" : 6, |
| | "flag" : A }] |
|02 | [{"startIndex":0, |
| | "endIndex" : 2, |
| | "flag" : S }, |
| | {"startIndex":3, |
| | "endIndex" : 6, |
| | "flag" : A }] |
+---+--------------------+-
我有伪代码来构建字典,但不确定如何在不使用循环的情况下一次将其应用于所有行。此外,这种方法的问题是只有最后一个框架字典在所有行中被覆盖
import re
x = "aaabbbbccaa"
xs = re.findall(r"((.)\2*)", x)
print(xs)
start = 0
output = ''
for item in xs:
end = start + (len(item[0])-1)
startIndex = start
endIndex = end
qualityFlag = item[1]
print(startIndex, endIndex, qualityFlag)
start = end+