
I have 4 dataframes, one for each of 4 newspapers (newspaper1, newspaper2, newspaper3, newspaper4), and each has an author-name column.

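Now I want to merge these 4 dataframes into one with 5 columns: author, newspaper1, newspaper2, newspaper3 and newspaper4, where the newspaper columns hold 1/0 values (1 means the author wrote for that newspaper).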

import pandas as pd 

listOfMedia = [newspaper1, newspaper2, newspaper3, newspaper4]
merged = pd.DataFrame(columns=['author', 'newspaper1', 'newspaper2', 'newspaper3', 'newspaper4'])

And this loop does what I want (it fills the author column of the merged df with the names):

for item in listOfMedia:
    merged.author = item.author

But I don't know how to fill the newspaper columns with 1/0 values...

for item in listOfMedia:
    if item == newspaper1:
        merged['newspaper1'] = '1'
    elif item == newspaper2:
        merged['newspaper2'] = '1'
    elif item == newspaper3:
        merged['newspaper3'] = '1'
    else:
        merged['newspaper4'] = '1'

I keep getting the error:

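During handling of the above exception, another exception occurred:

TypeError: attrib() got an unexpected keyword argument 'convert'

I tried googling the error, but that didn't help me pin down the problem. What am I missing here? I also suspect there must be a smarter way to fill the newspaper/author matrix, but I can't even figure out this simple approach. I'm using a Jupyter notebook.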
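For reference, the `if item == newspaper1:` test in the second loop cannot work as written: `==` between DataFrames compares element-wise, and using the resulting DataFrame in an `if` raises a `ValueError` saying its truth value is ambiguous. A minimal sketch of the same loop that keys the frames by name instead, assuming (as in the question) that each frame has an `author` column; the sample data below is made up:

```python
import pandas as pd

# Hypothetical sample data; each frame has an 'author' column as in the question.
newspaper1 = pd.DataFrame({'author': ['author1', 'author2', 'author3']})
newspaper2 = pd.DataFrame({'author': ['author1', 'author4']})
newspaper3 = pd.DataFrame({'author': ['author2']})
newspaper4 = pd.DataFrame({'author': ['author3', 'author5']})

# Key the frames by name so the loop never has to compare DataFrames.
media = {
    'newspaper1': newspaper1,
    'newspaper2': newspaper2,
    'newspaper3': newspaper3,
    'newspaper4': newspaper4,
}

# One row per distinct author across all papers, in sorted order.
authors = sorted(set().union(*(df['author'] for df in media.values())))
merged = pd.DataFrame({'author': authors})

# 1 if the author appears in that paper's frame, else 0.
for name, df in media.items():
    merged[name] = merged['author'].isin(df['author']).astype(int)

print(merged)
```

Building the full author list first also avoids the problem in the first loop, where each `merged.author = item.author` assignment replaces the previous one instead of adding to it.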


2 Answers


I made a guess at what your dataframes look like:

import pandas as pd

newspaper1 = pd.DataFrame({'author': ['author1', 'author2', 'author3']})
newspaper2 = pd.DataFrame({'author': ['author1', 'author2', 'author4']})
newspaper3 = pd.DataFrame({'author': ['author1', 'author2', 'author5']})
newspaper4 = pd.DataFrame({'author': ['author1', 'author2', 'author6']})

First, we copy the dataframes so that the original data is not affected:

newspaper1_temp = newspaper1.copy()
newspaper2_temp = newspaper2.copy()
newspaper3_temp = newspaper3.copy()
newspaper4_temp = newspaper4.copy()

Next, we replace the index of each dataframe with the author names:

newspaper1_temp.index = newspaper1['author']
newspaper2_temp.index = newspaper2['author']
newspaper3_temp.index = newspaper3['author']
newspaper4_temp.index = newspaper4['author']
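The copy and the index replacement can also be done in one step with `set_index`, which returns a new frame and leaves the original alone. A sketch using the same guessed data; `drop=False` keeps the `author` column, matching what the two steps above produce:

```python
import pandas as pd

newspaper1 = pd.DataFrame({'author': ['author1', 'author2', 'author3']})
newspaper2 = pd.DataFrame({'author': ['author1', 'author2', 'author4']})
newspaper3 = pd.DataFrame({'author': ['author1', 'author2', 'author5']})
newspaper4 = pd.DataFrame({'author': ['author1', 'author2', 'author6']})

# set_index returns a new DataFrame indexed by author; the originals keep
# their default 0..n-1 index. drop=False keeps 'author' as a column too.
newspaper1_temp = newspaper1.set_index('author', drop=False)
newspaper2_temp = newspaper2.set_index('author', drop=False)
newspaper3_temp = newspaper3.set_index('author', drop=False)
newspaper4_temp = newspaper4.set_index('author', drop=False)

print(newspaper1_temp.index.tolist())  # ['author1', 'author2', 'author3']
```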

Then we concatenate the dataframes side by side (pandas matches the rows up using the index we just set):

merged = pd.concat([newspaper1_temp, newspaper2_temp, newspaper3_temp, newspaper4_temp], axis=1)
merged.columns = ['newspaper1', 'newspaper2', 'newspaper3', 'newspaper4']
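To see why the next step needs to handle NaN, here is a reduced two-paper version of the concat step. With `axis=1`, `pd.concat` outer-joins on the index, so an author missing from one paper gets NaN in that paper's column. This sketch assumes each author appears at most once per paper; with duplicate authors in a frame, the concat step may raise an error about non-unique index values.

```python
import pandas as pd

newspaper1 = pd.DataFrame({'author': ['author1', 'author2', 'author3']})
newspaper2 = pd.DataFrame({'author': ['author1', 'author2', 'author4']})

a = newspaper1.set_index('author', drop=False)
b = newspaper2.set_index('author', drop=False)

# axis=1 places the frames side by side and outer-joins on the index:
# the result has a row for every author found in either frame.
merged = pd.concat([a, b], axis=1)
merged.columns = ['newspaper1', 'newspaper2']
print(merged)
```

In this output, author3 has NaN under newspaper2 and author4 has NaN under newspaper1, while the other cells still hold the author name.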

Finally, we replace NaN with 0, and then replace the non-zero entries (which still contain the author names) with 1:

merged = merged.fillna(0)
merged[merged != 0] = 1
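A more compact alternative (not part of the steps above) is to stack all the authors into one long frame labelled by paper and let `pd.crosstab` build the matrix. A sketch on the same guessed data; `clip(upper=1)` keeps the result at 1/0 even if an author is listed twice for the same paper:

```python
import pandas as pd

newspaper1 = pd.DataFrame({'author': ['author1', 'author2', 'author3']})
newspaper2 = pd.DataFrame({'author': ['author1', 'author2', 'author4']})
newspaper3 = pd.DataFrame({'author': ['author1', 'author2', 'author5']})
newspaper4 = pd.DataFrame({'author': ['author1', 'author2', 'author6']})

frames = {
    'newspaper1': newspaper1,
    'newspaper2': newspaper2,
    'newspaper3': newspaper3,
    'newspaper4': newspaper4,
}

# One row per (author, paper) pair: tag each frame with its paper name, then stack.
long = pd.concat(
    [df.assign(newspaper=name) for name, df in frames.items()],
    ignore_index=True,
)

# Count the pairs, then cap at 1 so duplicates still give 1 rather than 2.
merged = pd.crosstab(long['author'], long['newspaper']).clip(upper=1)
print(merged)
```

Unlike the `fillna`/mask approach, this yields integer columns directly, with the authors as the row index.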
answered 2020-10-06T16:08:27.233

You are in fact setting every row to 1, so use:

for col in merged.columns:
    merged[col].values[:] = 1
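Writing through `.values` depends on the Series handing back a writable view of its data, which is not guaranteed (for example, with pandas' Copy-on-Write mode the returned array may be read-only). A plain column assignment broadcasts a scalar to every existing row. A sketch, assuming `merged` already has rows; on the empty frame created in the question, there are no rows to fill:

```python
import pandas as pd

# Hypothetical frame that already has some author rows.
merged = pd.DataFrame({'author': ['author1', 'author2']})

newspaper_cols = ['newspaper1', 'newspaper2', 'newspaper3', 'newspaper4']
# Assigning a scalar sets that value in every existing row of the column.
for col in newspaper_cols:
    merged[col] = 1

print(merged)
```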
answered 2020-10-06T15:31:01.600