Here is an XML file containing the data I want to work with using lxml.objectify and pandas.DataFrame.
File: students.xml

<?xml version="1.0" encoding="UTF-8"?>

<college>
    <department>
    <name>Information Technology</name>
        <semester>
            <sem_3>
                <student_no>1</student_no>
                <student_name>Ravindra</student_name>
                <student_city>Ahmedabad</student_city>
            </sem_3>
        </semester>
    </department>
    <department>
    <name>Computer Engineering</name>
        <semester>
            <sem_3>
                <student_no>2</student_no>
                <student_name>Surya</student_name>
                <student_city>Gandhinagar</student_city>
            </sem_3>
        </semester>
    </department>
</college>

I tried the following, but this is the only output I can get.

import pandas as pd
from lxml import objectify
from pandas import DataFrame
xml = objectify.parse(open('students.xml'))
root = xml.getroot()
number = []
name = []
city = []
for i in range(0, 2):
  obj = root.getchildren()[i].getchildren()
  for j in range(0, 1):
    child_obj = obj[1].getchildren()[j].getchildren()
    number.append(child_obj[0])
    name.append(child_obj[1])
    city.append(child_obj[2])
df = pd.DataFrame(list(zip(number, name, city)), columns =['student_no', 'student_name', 'student_city'])
print(df)
-----------------------------------------------
  student_no    student_name       student_city
0    [[[1]]]  [[[Ravindra]]]    [[[Ahmedabad]]]
1    [[[2]]]     [[[Surya]]]  [[[Gandhinagar]]]
-----------------------------------------------

I cannot get output like this:

-----------------------------------------------
  student_no    student_name       student_city
0          1        Ravindra          Ahmedabad
1          2           Surya        Gandhinagar
-----------------------------------------------

Can you help me solve this problem?

1 Answer

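You are appending lxml objects to your lists rather than the values they hold. Take each element's .text (and convert the number with int) instead: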

import pandas as pd
from lxml import objectify

with open('students.xml') as f:
    xml = objectify.parse(f)
root = xml.getroot()
number = []
name = []
city = []
for i in range(0, 2):  # the two <department> elements
    obj = root.getchildren()[i].getchildren()  # [<name>, <semester>]
    for j in range(0, 1):  # the single <sem_3> inside each <semester>
        child_obj = obj[1].getchildren()[j].getchildren()
        # .text is the plain string; the element itself is an lxml object
        number.append(int(child_obj[0].text))
        name.append(child_obj[1].text)
        city.append(child_obj[2].text)
data = {'student_no': number, 'student_name': name, 'student_city': city}
df = pd.DataFrame(data)
print(df)

Output:

  student_no student_name student_city
0          1     Ravindra    Ahmedabad
1          2        Surya  Gandhinagar
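
The loop above hard-codes the number of departments and the position of each child. As a rough sketch of one possible alternative (not the only way to do it), the tree can be walked by tag name, so the indices disappear. This assumes every student record lives in a sem_3 element, as in the sample file; other semester tags would need their own handling. The XML is inlined here only so the snippet runs without students.xml on disk.

```python
import pandas as pd
from lxml import objectify

# Inline copy of the sample students.xml (assumption: same structure as the file).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<college>
    <department>
        <name>Information Technology</name>
        <semester>
            <sem_3>
                <student_no>1</student_no>
                <student_name>Ravindra</student_name>
                <student_city>Ahmedabad</student_city>
            </sem_3>
        </semester>
    </department>
    <department>
        <name>Computer Engineering</name>
        <semester>
            <sem_3>
                <student_no>2</student_no>
                <student_name>Surya</student_name>
                <student_city>Gandhinagar</student_city>
            </sem_3>
        </semester>
    </department>
</college>"""

root = objectify.fromstring(XML)

rows = []
# iter("sem_3") visits every <sem_3> element in the tree, whatever its department.
for sem in root.iter("sem_3"):
    rows.append({
        # .text returns the element's string content instead of the lxml object
        "student_no": int(sem.student_no.text),
        "student_name": sem.student_name.text,
        "student_city": sem.student_city.text,
    })

df = pd.DataFrame(rows)
print(df)
```

Building a list of dicts keeps each student's fields together in one row, which avoids maintaining three parallel lists.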
Answered 2020-09-29T07:00:55.300