python - Create a dataframe from a list and keep duplicates

Question

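I have a list of dataframes. Each dataframe in the list is unique, meaning the frames share some columns but also have columns that differ. I want to create a single dataframe that contains all the columns from the list of dataframes, filled with NaN wherever an element does not exist. I tried the following: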

import pandas as pd
df_new = pd.concat(list_of_dfs)
#I get the following: InvalidIndexError: Reindexing only valid with uniquely valued Index objects

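The problem seems to come from the dataframes in the list. Each dataframe has only one row, so its index is 0, and therefore reindexing does not work. I tried this: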

 list_of_dfs.append(pd.DataFrame([rows], columns = tags).set_index(np.array(random.randint(0,5000))))

This basically generates a random number to use as the index. However, I get this error:

ValueError: The parameter "keys" may be a column key, one-dimensional array, or a list containing only valid column keys and one-dimensional arrays.
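If distinct row labels are still wanted, a minimal sketch is to pass them through the `index` argument of the `DataFrame` constructor. `set_index` treats its argument as column keys or arrays, and `np.array(random.randint(0, 5000))` is a zero-dimensional array, so it is not accepted. The `records` list below is hypothetical sample data standing in for the question's `rows` and `tags`; using a counter instead of random numbers also guarantees that no two labels collide.

```python
import pandas as pd

# Hypothetical stand-ins for the question's `rows` and `tags`:
# each record is (list of values, list of column names) for one single-row frame
records = [
    ([1, 2], ["a", "x"]),
    ([3, 4], ["b", "x"]),
    ([5, 6], ["a", "y"]),
]

list_of_dfs = []
for i, (rows, tags) in enumerate(records):
    # Give each one-row frame its own index label via the constructor's
    # `index` argument, instead of passing a scalar array to set_index
    list_of_dfs.append(pd.DataFrame([rows], columns=tags, index=[i]))

# Columns missing from a frame are filled with NaN in the combined result
df_new = pd.concat(list_of_dfs)
print(df_new)
```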

score 0 · Accepted Answer

You need to pass a few more parameters to pd.concat:

import pandas as pd

df1 = pd.DataFrame({'a':[1,2,3],'x':[4,5,6],'y':[7,8,9]})
df2 = pd.DataFrame({'b':[10,11,12],'x':[13,14,15],'y':[16,17,18]})

print(pd.concat([df1,df2], axis=0, ignore_index=True))

Result:

     a   x   y     b
0  1.0   4   7   NaN
1  2.0   5   8   NaN
2  3.0   6   9   NaN
3  NaN  13  16  10.0
4  NaN  14  17  11.0
5  NaN  15  18  12.0

So, use concat like this:

pd.concat(list_of_dfs, axis=0, ignore_index=True)
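To illustrate with one-row frames like those in the question (hypothetical sample data): every frame built without an explicit index carries the row label 0. Stacking them along axis 0 keeps those repeated labels, while ignore_index=True replaces them with a fresh 0..n-1 range.

```python
import pandas as pd

# Hypothetical single-row frames, each built without an explicit index,
# so every one of them carries the row label 0
list_of_dfs = [
    pd.DataFrame([[1, 2]], columns=["a", "x"]),
    pd.DataFrame([[3, 4]], columns=["b", "x"]),
    pd.DataFrame([[5, 6]], columns=["a", "y"]),
]

# Without ignore_index the original labels are kept, so 0 repeats
stacked = pd.concat(list_of_dfs, axis=0)
print(stacked.index.tolist())   # [0, 0, 0]

# ignore_index=True discards the old labels and numbers the rows 0..n-1
df_new = pd.concat(list_of_dfs, axis=0, ignore_index=True)
print(df_new.index.tolist())    # [0, 1, 2]
print(df_new)
```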

score 0 · Accepted Answer

How about trying this:

If your indices are already unique, this shouldn't hurt them:

df = df.loc[~df.index.duplicated(keep='first')]
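A small sketch of what that line does, on a hypothetical frame with a repeated label. Note that it removes the later rows sharing a label rather than relabelling them, so any data in those rows is lost; ignore_index=True in pd.concat keeps every row instead.

```python
import pandas as pd

# Hypothetical frame whose index contains the label 0 twice
df = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 0, 1])

# duplicated(keep='first') marks every repeat of a label after its first occurrence
mask = df.index.duplicated(keep="first")
print(mask)  # [False  True False]

# Keep only the unmarked rows, so the index becomes unique
df = df.loc[~mask]
print(df)
```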

Rather, it makes sure they are unique. You can set axis to 'index' so that the index is used as the basis for the concatenation:

df_new = pd.concat(list_of_dfs, axis='index')
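In pd.concat, axis='index' is the named equivalent of axis=0 (the default), so this stacks rows exactly as in the first answer; only ignore_index=True renumbers them. A quick check with hypothetical frames:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "x": [3, 4]})
df2 = pd.DataFrame({"b": [5, 6], "x": [7, 8]})

# axis='index' is the named form of axis=0: rows are stacked on top of each other
by_name = pd.concat([df1, df2], axis="index")
by_number = pd.concat([df1, df2], axis=0)

print(by_name.equals(by_number))  # True
print(by_name.index.tolist())     # [0, 1, 0, 1]
```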
