
This question is a follow-up to an earlier one. If you need more background, you can see the original question here:

Populating a Python list with data obtained from an lxml xpath command

I have incorporated @ihor-kaharlichenko's excellent suggestions (from my original question) into the revised code, here:

from lxml import etree as ET
from datetime import datetime

xmlDoc = ET.parse('http://192.168.1.198/Bench_read_scalar.xml')

response = xmlDoc.getroot()
tags = (
'address',
'status',
'flow',
'dp',
'inPressure',
'actVal',
'temp',
'valveOnPercent',
)

dmtVal = []

for dmt in response.iter('dmt'):
    val = [str(dmt.xpath('./%s/text()' % tag)) for tag in tags]
    val.insert(0, str(datetime.now())) #Add timestamp at beginning of each record
    dmtVal.append(val)

for item in dmtVal:
    str(item).strip('[')
    str(item).strip(']')
    str(item).strip('"')

The last block is where I'm having trouble. The data I get in dmtVal looks like this:

[['2012-08-16 12:38:45.152222', "['0x46']", "['0x32']", "['1.234']", "['5.678']", "['9.123']", "['4.567']", "['0x98']", "['0x97']"], ['2012-08-16 12:38:45.152519', "['0x47']", "['0x33']", "['8.901']", "['2.345']", "['6.789']", "['0.123']", "['0x96']", "['0x95']"]]

However, I really want the data to look like this:

[['2012-08-16 12:38:45.152222', '0x46', '0x32', '1.234', '5.678', '9.123', '4.567', '0x98', '0x97'], ['2012-08-16 12:38:45.152519', '0x47', '0x33', '8.901', '2.345', '6.789', '0.123', '0x96', '0x95']]

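I figured this was a fairly simple string-stripping job. I first tried the stripping code inside the original loop (where dmtVal is initially populated), but that didn't work, so I moved the stripping outside the loop, as shown above, and it still doesn't work. I suspect I'm making some kind of rookie mistake but can't spot it. Any comments are welcome!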


Thanks, everyone, for the quick and helpful replies. Here is the corrected code:

from lxml import etree as ET
from datetime import datetime

xmlDoc = ET.parse('http://192.168.1.198/Bench_read_scalar.xml')

print('...Starting to parse XML nodes')

response = xmlDoc.getroot()

tags = (
'address',
'status',
'flow',
'dp',
'inPressure',
'actVal',
'temp',
'valveOnPercent',
)

dmtVal = []

for dmt in response.iter('dmt'):
    val = [' '.join(dmt.xpath('./%s/text()' % tag)) for tag in tags]
    val.insert(0, str(datetime.now())) #Add timestamp at beginning of each record
    dmtVal.append(val)

print(dmtVal)
print('...Done')

Which produces:

...Starting to parse XML nodes
[['2012-08-16 14:41:10.442776', '0x46', '0x32', '1.234', '5.678', '9.123', '4.567', '0x98', '0x97'], ['2012-08-16 14:41:10.443052', '0x47', '0x33', '8.901', '2.345', '6.789', '0.123', '0x96', '0x95']]
...Done

Thanks, everyone!


5 Answers


Given your current data as grps:

Solution 1 - ast.literal_eval

import ast
grps = [['2012-08-16 12:38:45.152222', "['0x46']", "['0x32']", "['1.234']", "['5.678']", "['9.123']", "['4.567']", "['0x98']", "['0x97']"], ['2012-08-16 12:38:45.152519', "['0x47']", "['0x33']", "['8.901']", "['2.345']", "['6.789']", "['0.123']", "['0x96']", "['0x95']"]]
desired_output = [[grp[0]] + [ast.literal_eval(item)[0] for item in grp[1:]] for grp in grps]

print(desired_output)

Output

[['2012-08-16 12:38:45.152222', '0x46', '0x32', '1.234', '5.678', '9.123', '4.567', '0x98', '0x97'], ['2012-08-16 12:38:45.152519', '0x47', '0x33', '8.901', '2.345', '6.789', '0.123', '0x96', '0x95']]

Explanation

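ast.literal_eval is a safe alternative to eval. It only evaluates literal data types (strings, numbers, tuples, lists, dicts, booleans, and None). In your case, it evaluates "['1.0']" to a list of length 1, i.e. ['1.0']. You may want to read up on list comprehensions and make sure you understand them.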

Another way to write it:

desired_output = []
for grp in grps:  # loop through each group
    new_grp = [grp[0]]  # start the new group as a list holding the first element (the timestamp)
    for item in grp[1:]:  # loop over every item from index 1 to the end
        evaluated_item = ast.literal_eval(item)  # turn the string "['0x46']" into the list ['0x46']
        new_grp.append(evaluated_item[0])  # append the single item from that list to new_grp
    desired_output.append(new_grp)  # append new_grp to the desired_output list
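To see what ast.literal_eval does with a single cell, here is a quick check using one value in the shape of the sample data. The rejected expression is only an illustration of input that literal_eval refuses to execute, where plain eval would run it:

```python
import ast

# One cell as produced by str(list): the *text* of a one-element list.
cell = "['1.0']"

parsed = ast.literal_eval(cell)  # a real list: ['1.0']
value = parsed[0]                # the string inside it: '1.0'

# literal_eval only accepts literals; a function call is rejected
# with ValueError instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
else:
    rejected = False

print(parsed, value, rejected)
```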

Solution 2 - regular expression

import re
stripper = re.compile(r"[\[\]']")  # raw string: matches any [, ] or ' character
grps = [['2012-08-16 12:38:45.152222', "['0x46']", "['0x32']", "['1.234']", "['5.678']", "['9.123']", "['4.567']", "['0x98']", "['0x97']"], ['2012-08-16 12:38:45.152519', "['0x47']", "['0x33']", "['8.901']", "['2.345']", "['6.789']", "['0.123']", "['0x96']", "['0x95']"]]
desired_output = [[grp[0]] + [ stripper.sub('', item) for item in grp[1:]] for grp in grps]

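The problem with your solution is that str.strip() never modifies anything in place: strings are immutable, so it returns a new string, and your loop throws that return value away. Even if you assigned the result to the loop variable item, dmtVal would not change, because item is just a name bound to each element, not a reference back into the list.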
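A minimal illustration of that point, using one row in the shape of the sample data (the three loops are illustrative, not taken from the question). Note that strip("[]'") treats its argument as a set of characters to trim from the ends, not as a substring:

```python
row = ['2012-08-16 12:38:45.152222', "['0x46']"]

# What the original loop effectively does: the stripped result is discarded.
for item in row:
    item.strip("[]'")  # returns a new string that nothing keeps

unchanged = list(row)  # row still holds "['0x46']"

# Rebinding the loop variable does not write back into the list either.
for item in row:
    item = item.strip("[]'")  # 'item' now names a new string; row is untouched

# Assigning through the index is what actually updates the list.
for j, item in enumerate(row):
    row[j] = item.strip("[]'")

print(unchanged)
print(row)
```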

Solution 3 - fixing your original code

To fix your solution, you would do the following:

for i, grp in enumerate(dmtVal):  # loop over the inner lists
    for j, item in enumerate(grp):
        dmtVal[i][j] = item.strip(']')
        dmtVal[i][j] = dmtVal[i][j].lstrip('[')
        dmtVal[i][j] = dmtVal[i][j].strip("'")

Rather than assigning to dmtVal[i][j] after every strip, you can work on the loop variable item and assign the result back to dmtVal[i][j] once at the end:

for i, grp in enumerate(dmtVal):  # loop over the inner lists
    for j, item in enumerate(grp):
        # Strip the local copy step by step, then write it back once
        item = item.strip(']')
        item = item.lstrip('[')
        item = item.strip("'")
        dmtVal[i][j] = item

Or, a better solution (IMHO):

for i, grp in enumerate(dmtVal):  # loop over the inner lists
    for j, item in enumerate(grp):
        dmtVal[i][j] = item.replace('[', '').replace(']', '').replace("'", '')
answered 2012-08-16T19:41:33.610

This will do what you need, though maybe not in the best way:

new_dmt_val = []
for sublist in dmtVal:
    new_dmt_val.append([elem.strip('[\'').strip('\']') for elem in sublist])

I tried to make it readable; it could probably be done in fewer, but more confusing, lines.

answered 2012-08-16T19:59:41.607

The answer is: don't create those strings in the first place.


Your problem is in this part of the code:

for dmt in response.iter('dmt'):
    val = [str(dmt.xpath('./%s/text()' % tag)) for tag in tags]

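I'm guessing you used str() here to try to extract the string from the list returned by xpath().
However, that's not what you get: str() just gives you the string representation of the whole list.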
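The difference is easy to see in isolation. The list below is a hypothetical stand-in for what xpath('./address/text()') returns for a single <address> node:

```python
# Hypothetical xpath() result: a list holding one text node.
texts = ['0x46']

as_repr = str(texts)      # "['0x46']": the brackets and quotes are part of the string
as_text = ''.join(texts)  # '0x46': just the text content

# ''.join also copes with zero or several text nodes.
empty = ''.join([])
several = ''.join(['0x', '46'])

print(as_repr, as_text, repr(empty), several)
```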

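You have a few options for doing what you want.
But since you're parsing XML and can't be sure how many elements the list will contain, your best bet is probably ''.join():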

for dmt in response.iter('dmt'):
    val = [''.join(dmt.xpath('./%s/text()' % tag)) for tag in tags]



Edit: if you use this code, you won't need the final loop.
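As a sketch of the whole fixed loop, here it is run against a small hypothetical XML sample. It swaps in the standard library's xml.etree.ElementTree for lxml so it runs without lxml or access to 192.168.1.198. For leaf elements, joining child.text over findall(tag) should give the same result as ''.join(dmt.xpath('./%s/text()' % tag)), but the real layout of Bench_read_scalar.xml is an assumption here:

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical stand-in for Bench_read_scalar.xml; the real layout is assumed.
SAMPLE_XML = """<response>
  <dmt><address>0x46</address><status>0x32</status><flow>1.234</flow>
       <dp>5.678</dp><inPressure>9.123</inPressure><actVal>4.567</actVal>
       <temp>0x98</temp><valveOnPercent>0x97</valveOnPercent></dmt>
  <dmt><address>0x47</address><status>0x33</status><flow>8.901</flow>
       <dp>2.345</dp><inPressure>6.789</inPressure><actVal>0.123</actVal>
       <temp>0x96</temp><valveOnPercent>0x95</valveOnPercent></dmt>
</response>"""

TAGS = ('address', 'status', 'flow', 'dp',
        'inPressure', 'actVal', 'temp', 'valveOnPercent')


def read_records(xml_text):
    root = ET.fromstring(xml_text)
    records = []
    for dmt in root.iter('dmt'):
        # Join the text of every matching child; a missing or empty tag gives ''.
        row = [''.join(child.text or '' for child in dmt.findall(tag))
               for tag in TAGS]
        row.insert(0, str(datetime.now()))  # timestamp first, as in the question
        records.append(row)
    return records


records = read_records(SAMPLE_XML)
print(records)
```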

answered 2012-08-16T20:33:16.890

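str.strip only removes leading and trailing characters. You probably want str.replace instead. Also note that str.strip (and str.replace) return a modified copy of the string; the original is left unchanged.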
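A short comparison on a hypothetical two-value cell shows the difference: strip() trims only the ends, while chained replace() calls remove every occurrence. Neither touches the original string:

```python
s = "['0x46', '0x47']"

# strip() removes any of the given characters, but only at the two ends.
stripped = s.strip("[]'")

# replace() removes every occurrence, wherever it appears.
replaced = s.replace('[', '').replace(']', '').replace("'", '')

print(stripped)  # the inner quotes survive
print(replaced)
print(s)         # unchanged: both calls returned new strings
```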

Or simply use ''.join() instead of str() and drop the whole stripping business:

val = [''.join(dmt.xpath('./%s/text()' % tag)) for tag in tags]

As a side note, you might also want to use datetime.isoformat instead of str:

val.insert(0, datetime.now().isoformat()) #Add timestamp at beginning of each record

See help(datetime) for more options.
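For comparison, here is the same fixed timestamp (taken from the question's output; a fixed value keeps the result reproducible, unlike datetime.now()) formatted both ways. The only visible difference is the date/time separator:

```python
from datetime import datetime

stamp = datetime(2012, 8, 16, 12, 38, 45, 152222)

as_str = str(stamp)                 # space between date and time
as_iso = stamp.isoformat()          # 'T' between date and time (ISO 8601)
custom = stamp.isoformat(sep=' ')   # isoformat with a space separator, same text as str()

print(as_str)
print(as_iso)
```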

answered 2012-08-16T20:59:30.120

With xml being the XML string from your original post (I think this covers both questions to some extent):

from lxml import etree
from datetime import datetime
from ast import literal_eval

tree = etree.fromstring(xml).getroottree()
dmts = []
for dmt in tree.iterfind('dmt'):
    to_add = {'datetime': datetime.now()}
    to_add.update( {n.tag:literal_eval(n.text) for n in dmt} )
    dmts.append(to_add)

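You can still order the nodes explicitly later, though I find this approach clearer because you can use names instead of indices (it really depends on whether adding or removing a node should be treated as an error).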
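A rough sketch of the "order them later" step. It uses a trimmed, hypothetical XML sample and the standard library's ElementTree in place of lxml, and ORDER is an illustrative name, not part of the answer above. Note that literal_eval converts the values, so '0x46' becomes the integer 70 and '1.234' becomes a float:

```python
import xml.etree.ElementTree as ET
from ast import literal_eval
from datetime import datetime

# Hypothetical, trimmed sample; the real document layout is assumed.
xml = """<response>
  <dmt><address>0x46</address><flow>1.234</flow></dmt>
  <dmt><address>0x47</address><flow>8.901</flow></dmt>
</response>"""

ORDER = ('datetime', 'address', 'flow')  # illustrative column order

dmts = []
for dmt in ET.fromstring(xml).iterfind('dmt'):
    record = {'datetime': datetime.now()}
    # Assumes every child has non-empty, literal-looking text.
    record.update({n.tag: literal_eval(n.text) for n in dmt})
    dmts.append(record)

# Rebuild ordered rows by name; a missing node raises KeyError here.
rows = [[rec[name] for name in ORDER] for rec in dmts]
print(rows)
```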

answered 2012-08-16T21:02:12.840