I have the following structure, which I converted from a .txt file with pandas:

 [[000001, 'PEPE                  ', 'S', 'LAST_NAME   ', 'CIP  ', 'CELLPHONE'],
 [0000002, 'LUIS  ', 'S', 'ADRESS  ', '                       ', 'nan'],
 [0000003, 'PEDRO               ', 'S', 'STREET ', 'CITY', ' nan']]

My code:

import pandas as pd
file = 'C:\\Users\\Admin\\Desktop\\PRUEBA.txt'

columns = ("service", "name", "Active", "reference1", "reference2", "reference3")
df = pd.read_csv(file, sep="|", names=columns, header=None)
cl = df.values.tolist()
print(cl)

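To be able to process it, how can I remove the empty strings and nan values, convert service to int, and create objects based on the service and its references, like this?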

[
  { service: 1, name: 'PEPE', order: 0, ref: 'LAST_NAME' },
  { service: 1, name: 'PEPE', order: 1, ref: 'CIP' },
  { service: 1, name: 'PEPE', order: 2, ref: 'CELLPHONE' },
  { service: 2, name: 'LUIS', order: 0, ref: 'ADRESS' },
  { service: 3, name: 'PEDRO', order: 0, ref: 'STREET' },
  { service: 3, name: 'PEDRO', order: 1, ref: 'CITY' }
]

How can I achieve this? Thank you very much for your input.

1 Answer

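Key: use df.melt() to unpivot the table, then call df.to_dict(orient='records') to convert the dataframe into a record-oriented list of dicts, as @QuangHoang mentioned. The rest is routine filtering and miscellaneous adjustments.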
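
To see the two calls in isolation first, here is a minimal sketch on a made-up two-row frame. The column names `id`, `a` and `b` are purely illustrative and are not part of the question's data.

```python
import pandas as pd

# Toy frame with hypothetical data, just to show the reshaping
wide = pd.DataFrame({"id": [1, 2], "a": ["x", "y"], "b": ["z", "w"]})

# melt produces one (id, variable, value) row per cell of the value columns;
# value_name renames the default "value" column to "ref"
long = wide.melt(id_vars="id", value_vars=["a", "b"], value_name="ref")

# to_dict(orient="records") returns a list with one dict per row
records = long.to_dict(orient="records")
print(records)
```

Note that melt emits all rows for column `a` first and then all rows for column `b`, which is why the full solution sorts by service afterwards.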

import pandas as pd

# data
ls = [['000001', 'PEPE                  ', 'S', 'LAST_NAME   ', 'CIP  ', 'CELLPHONE'],
      ['0000002', 'LUIS  ', 'S', 'ADRESS  ', '                       ', 'nan'],
      ['0000003', 'PEDRO               ', 'S', 'STREET ', 'CITY', ' nan']
      ]
df = pd.DataFrame(ls, columns=("service", "name", "Active", "reference1", "reference2", "reference3"))

# reformat and strip over each column
for col in df:
    if col == "service":
        df[col] = df[col].astype(int)
    else:
        df[col] = df[col].str.strip()  # accessor

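# unpivot and adjust; sort by service, then by reference column,
# so each service's references keep their original order
df2 = df.melt(id_vars=["service", "name"],
              value_vars=["reference1", "reference2", "reference3"],
              value_name="ref")\
    .sort_values(by=["service", "variable"])\
    .drop("variable", axis=1)\
    .reset_index(drop=True)

# filter out empty or nan; copy so the column assignment below is safe
df2 = df2[~df2["ref"].isin(["", "nan"])].copy()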

# generate order numbering by group
df2["order"] = df2.groupby("service").cumcount()
df2 = df2[["service", "name", "order", "ref"]]  # reorder

# convert to a list of record-oriented dicts
df2.to_dict(orient='records')

Out[99]: 
[{'service': 1, 'name': 'PEPE', 'order': 0, 'ref': 'LAST_NAME'},
 {'service': 1, 'name': 'PEPE', 'order': 1, 'ref': 'CIP'},
 {'service': 1, 'name': 'PEPE', 'order': 2, 'ref': 'CELLPHONE'},
 {'service': 2, 'name': 'LUIS', 'order': 0, 'ref': 'ADRESS'},
 {'service': 3, 'name': 'PEDRO', 'order': 0, 'ref': 'STREET'},
 {'service': 3, 'name': 'PEDRO', 'order': 1, 'ref': 'CITY'}]
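
One caveat: when the frame comes from pd.read_csv rather than from a hand-written list, missing cells usually arrive as real NaN values rather than the string 'nan'. The sketch below wraps the same steps in a helper that tolerates both. The function name `to_records` and its `ref_prefix` parameter are my own and hypothetical, and the sample frame only imitates the question's data.

```python
import numpy as np
import pandas as pd

def to_records(df, ref_prefix="reference"):
    """Hypothetical helper: unpivot the reference columns into ordered records."""
    ref_cols = [c for c in df.columns if c.startswith(ref_prefix)]
    out = df.copy()
    out["service"] = out["service"].astype(int)
    out["name"] = out["name"].astype(str).str.strip()

    long = out.melt(id_vars=["service", "name"], value_vars=ref_cols,
                    value_name="ref")
    # Real NaN becomes "", then every value is stripped of padding
    long["ref"] = long["ref"].fillna("").astype(str).str.strip()
    # Drop blanks and the literal string "nan"
    long = long[~long["ref"].isin(["", "nan"])]
    # Keep references in column order within each service
    long = long.sort_values(["service", "variable"]).copy()
    long["order"] = long.groupby("service").cumcount()
    return long[["service", "name", "order", "ref"]].to_dict(orient="records")

# Sample data imitating the question, with one real NaN mixed in
df = pd.DataFrame(
    [["000001", "PEPE    ", "S", "LAST_NAME   ", "CIP  ", "CELLPHONE"],
     ["0000002", "LUIS  ", "S", "ADRESS  ", "        ", np.nan],
     ["0000003", "PEDRO   ", "S", "STREET ", "CITY", " nan"]],
    columns=["service", "name", "Active",
             "reference1", "reference2", "reference3"],
)
records = to_records(df)
for rec in records:
    print(rec)
```

Filling NaN with an empty string before stripping keeps the filter to a single isin check, at the cost of treating a genuine 'nan' string and a missing cell the same way.
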
Answered 2020-10-21T21:44:11.413