python - Filling a matrix by searching for index and column names in other DataFrames

Question

I have an "empty" dataframe that looks like this:

        6807    6809    5341
126293  nan     nan     nan
126294  nan     nan     nan     
126295  nan     nan     nan

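The column names give me a name_id, while the index values give me a file_id. Now I want to search for the file_id and name_id in separate pandas dataframes named pro, cont, and neutral, which look like this: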

    file_id name_id
0   126293  7244
1   126293  4978
2   126293  5112
3   126293  6864

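If I find the file_id and name_id in one of those dataframes, I want to fill the corresponding cell of the empty dataframe above. The value entered into the matrix should be 1 when the pair is found in pro, -1 when it is found in cont, and 0 when it is found in neutral. That gives me a result like this, for example: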

        6807    6809    5341
126293  1       -1     0
126294  0       -1     0        
126295  1       -1     1

Does anyone know how to accomplish this?

score 2 · Accepted Answer

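Here is one way to do it: use the intersection of the DataFrame's index and columns with the file_id and name_id values in pro, neutral, and cont as the index for setting the value you want (1, 0, or -1). I used the Python set class to perform the intersection. However, the result does not index nicely into the DataFrame, because each element is a tuple.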

Edit (January 29, 2022): I missed an important step in the previous solution. itertools.product is needed to get every combination of df.index and df.columns. See the updated code below.

from itertools import product

# Every (file_id, name_id) cell that exists in df
cells = set(product(df.index, df.columns))

# Keep only the pairs from each table that correspond to a cell in df
pro_idx = cells.intersection(zip(pro['file_id'], pro['name_id']))
neut_idx = cells.intersection(zip(neutral['file_id'], neutral['name_id']))
cont_idx = cells.intersection(zip(cont['file_id'], cont['name_id']))

# Looping over an empty set does nothing, so no emptiness check is needed
for f, n in pro_idx:
    df.loc[f, n] = 1

for f, n in neut_idx:
    df.loc[f, n] = 0

for f, n in cont_idx:
    df.loc[f, n] = -1
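
To make the approach above easier to try out, here is a minimal self-contained sketch. The sample values in pro, neutral, and cont are made up for illustration and are not from the original post; the three loops are folded into one loop over (table, value) pairs, which is only a stylistic choice.

```python
from itertools import product

import numpy as np
import pandas as pd

# Made-up sample data (assumption): only the layout follows the question
df = pd.DataFrame(np.nan, index=[126293, 126294, 126295], columns=[6807, 6809, 5341])
pro = pd.DataFrame({"file_id": [126293, 126295, 126293], "name_id": [6807, 5341, 7244]})
neutral = pd.DataFrame({"file_id": [126294], "name_id": [5341]})
cont = pd.DataFrame({"file_id": [126295], "name_id": [6809]})

# Every (file_id, name_id) cell that exists in df
cells = set(product(df.index, df.columns))

# Fill each matching cell with the value assigned to its table
for table, value in ((pro, 1), (neutral, 0), (cont, -1)):
    for f, n in cells.intersection(zip(table["file_id"], table["name_id"])):
        df.loc[f, n] = value

print(df)
```

The pair (126293, 7244) in pro has no matching column in df, so the intersection drops it and df keeps its original shape. Cells with no match in any table stay NaN.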

score 2 · Accepted Answer

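You can stack your "empty" df (let's call it df) and merge it with the combination of pro, con, and neu. Then you can rearrange it back into a two-dimensional shape.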

Put the votes in one dataframe:

import pandas as pd

votes = pd.concat([pro.assign(v=1), con.assign(v=-1), neu.assign(v=0)])
# You may or may not need this cast, depending on the dtypes in your actual data
# (I have no way of knowing). name_id should match the type of the column labels
# in the empty df.
votes['name_id'] = votes['name_id'].astype(str)
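
Whether the cast is needed depends on the type of the column labels in the empty df. The sketch below shows one possible way to check; the sample frames are made up, and the string-labelled columns are an assumption chosen so that the cast is actually required.

```python
import numpy as np
import pandas as pd

# Made-up sample data (assumption): here the column labels are strings
df = pd.DataFrame(np.nan, index=[126293, 126294], columns=["6807", "6809"])
votes = pd.DataFrame({"file_id": [126293], "name_id": [6807], "v": [1]})

# Cast name_id only when the column labels of df are strings
labels_are_str = all(isinstance(c, str) for c in df.columns)
if labels_are_str:
    votes["name_id"] = votes["name_id"].astype(str)

# After the cast, every name_id can be found among the column labels
print(votes["name_id"].isin(df.columns).all())
```

If the column labels are already integers, the check leaves name_id untouched and the merge keys compare as integers.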

votes now looks like this (I made up the numbers):

    file_id name_id v
0   126293  6807    1
1   126293  4978    1
2   126293  5112    1
3   126293  6864    1
0   126295  6809    -1
0   126294  5341    0

Now we merge it onto the stacked df on name_id and file_id:

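# unstack() turns df into a long Series indexed by (column label, index label);
# reset_index() names those two levels level_0 (name_id) and level_1 (file_id)
df1 = (
    df.unstack()
      .reset_index()
      .merge(votes, left_on=['level_0', 'level_1'],
             right_on=['name_id', 'file_id'], how='left')
      [['level_0', 'level_1', 'v']]
)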

df1 looks like this:


    level_0 level_1 v
0   6807    126293  1.0
1   6807    126294  NaN
2   6807    126295  NaN
3   6809    126293  NaN
4   6809    126294  NaN
5   6809    126295  -1.0
6   5341    126293  NaN
7   5341    126294  0.0
8   5341    126295  NaN

Now unstack it back:

df1.set_index(['level_1','level_0']).unstack()

Output:


        v
level_0 5341    6807    6809
level_1         
126293  NaN     1.0     NaN
126294  0.0     NaN     NaN
126295  NaN     NaN    -1.0

You will get NaN wherever there is no vote in pro, con, or neu. Votes in those dfs for file_id/name_id pairs that did not originally exist in df are ignored.
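
For reference, the steps of this answer can be put together into one runnable sketch. The sample data is made up, integer labels are assumed throughout (so no cast is applied), and the final reindex call is an addition that is not in the answer: it only restores the row and column order of the original df.

```python
import numpy as np
import pandas as pd

# Made-up sample data (assumption): only the layout follows the post
df = pd.DataFrame(np.nan, index=[126293, 126294, 126295], columns=[6807, 6809, 5341])
pro = pd.DataFrame({"file_id": [126293, 126293], "name_id": [6807, 4978]})
con = pd.DataFrame({"file_id": [126295], "name_id": [6809]})
neu = pd.DataFrame({"file_id": [126294], "name_id": [5341]})

# One long table of votes: 1 for pro, -1 for con, 0 for neu
votes = pd.concat([pro.assign(v=1), con.assign(v=-1), neu.assign(v=0)])

# Long form of df: level_0 holds the name_id, level_1 holds the file_id
long_df = (
    df.unstack()
    .reset_index()
    .merge(votes, left_on=["level_0", "level_1"],
           right_on=["name_id", "file_id"], how="left")
)

# Back to two dimensions, then restore the original label order (added step)
result = (
    long_df.set_index(["level_1", "level_0"])["v"]
    .unstack()
    .reindex(index=df.index, columns=df.columns)
)

print(result)
```

The vote for name_id 4978 is dropped by the left merge because df has no such column. Note that if the same file_id/name_id pair appeared in more than one of the vote tables, the merge would produce duplicate rows and the final unstack would fail on the duplicate index entries, so such pairs would need to be resolved first.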
