python - Stack multiple columns into one column in Python

Question

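I have a pandas DataFrame with 100 rows x 7 columns, as shown below:

The values in the source column are connected to the values in the other columns. For example, a is connected to contact_1, contact_2, ..., contact_5. Similarly, b is connected to contact_6, contact_7, ..., contact_10.

I just want to stack these columns into two columns (i.e., source and destination) so that I can build a graph from an edge-list format.

The expected output format is:

I tried df.stack() but did not get the desired result; I got the following:

Any suggestions?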

score 4 · Accepted Answer

You are looking for pd.wide_to_long. This should do it:

pd.wide_to_long(df, stubnames='destination_', i=['source'], j='number')

The destination_ column will then contain the information you are looking for.

Example:

import pandas as pd
d = {'source': ['a', 'b'],
 'destination_1': ['contact_1', 'contact_6'],
 'destination_2': ['contact_2', 'contact_7']}
df = pd.DataFrame(d)
pd.wide_to_long(df, stubnames='destination_', i=['source'], j='number')

Output:

              destination_
source number             
a      1         contact_1
b      1         contact_6
a      2         contact_2
b      2         contact_7
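The result above still carries source and number in the index. A minimal sketch of turning it into a flat two-column edge list follows; the output column name target and the variable names long_df and edges are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Same small example frame as in the answer above.
df = pd.DataFrame({
    "source": ["a", "b"],
    "destination_1": ["contact_1", "contact_6"],
    "destination_2": ["contact_2", "contact_7"],
})

long_df = pd.wide_to_long(df, stubnames="destination_", i=["source"], j="number")

# Move the index levels back into ordinary columns, drop the helper
# counter, and rename the value column ("target" is an assumed name).
edges = (
    long_df.reset_index()
    .drop(columns="number")
    .rename(columns={"destination_": "target"})
    .sort_values(["source", "target"])
    .reset_index(drop=True)
)
print(edges)
```

Each row of edges is then one source/target pair, which is the shape an edge list needs.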

score 4 · Accepted Answer

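You can try pandas.DataFrame.melt, which rearranges the DataFrame so that one column serves as the identifier variable and the remaining columns become value variables. You can read more about it in the pandas documentation.

You can apply DataFrame.melt to your data as follows: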

import pandas as pd

df = pd.DataFrame(data={
    "source": ["a", "b", "c"],
    "destination_1": ["contact_1", "contact_6", "contact_11"],
    "destination_2": ["contact_2", "contact_7", "contact_12"],
    # ... remaining destination_N columns
})

# Keep "source" as the identifier column and unpivot everything else.
output_df = df.melt(id_vars=["source"])
# value_vars automatically inferred to be the remaining columns.

This produces a DataFrame that looks like:

   source       variable       value
0       a  destination_1   contact_1
1       b  destination_1   contact_6
2       c  destination_1  contact_11
3       a  destination_2   contact_2
4       b  destination_2   contact_7
5       c  destination_2  contact_12
.       .              .           .
.       .              .           .
.       .              .           .

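You can sort by the source column with output_df.sort_values(by=["source"]). If needed, you can drop the variable column and rename the value column to destination. You can also reset the index after sorting with output_df.reset_index(drop=True).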
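The follow-up steps can be chained into one expression. This is a sketch on a three-row sample frame; passing value_name="destination" to melt is one way to get the renamed column directly, and kind="stable" is an added assumption so that rows within each source keep their original column order.

```python
import pandas as pd

# Small sample with only two destination columns, for illustration.
df = pd.DataFrame({
    "source": ["a", "b", "c"],
    "destination_1": ["contact_1", "contact_6", "contact_11"],
    "destination_2": ["contact_2", "contact_7", "contact_12"],
})

edges = (
    df.melt(id_vars=["source"], value_name="destination")  # unpivot and name the value column
    .drop(columns="variable")                               # the original column label is not needed
    .sort_values(by=["source"], kind="stable")              # group rows by source, keep column order
    .reset_index(drop=True)                                 # renumber rows 0..n-1
)
print(edges)
```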
