python - How do I use the value of a cell in a row to choose which column name to look up in a pandas DataFrame?

Question

I have a DataFrame that looks like this:

   Index  Variable1  Value1  Variable2  Value2  Cat  Dog  Cow
    1      Cat        7       Sheep      7       0    0    0
    2      Sheep      2       Cat        6       0    0    0
    3      Cow        3       Dog        2       0    0    0

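How can I efficiently fill the Cat, Dog, and Cow columns with the number from a Value column whenever the matching Variable column equals that column's name? The result should look like this: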

Index  Variable1  Value1  Variable2  Value2  Cat  Dog  Cow
1      Cat        7       Sheep      7       7    0    0
2      Sheep      2       Cat        6       6    0    0
3      Cow        3       Dog        2       0    2    3

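I wrote a nested for loop that goes through each "Variable" column and then through every row of that column, filling in each animal's data based on the value in that cell. But I'm 100% sure this is bad practice.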

score 1 · Accepted Answer

import pandas as pd

# collect the column labels: Value*, Variable*, and the three animal columns
vals = df.filter(like='Value').columns
variables = df.filter(like='Variable').columns
animals = df.iloc[:, -3:].columns

# stack every (Value_, Variable_) pair into one two-column frame
res = pd.concat(df.filter(ent).set_axis(['val', 'var'], axis=1) for ent in zip(vals, variables))
res

    val var
0   7   Cat
1   2   Sheep
2   3   Cow
0   7   Sheep
1   6   Cat
2   2   Dog

#pivot res
out = (res
       .pivot(columns='var',values='val')
       .fillna(0)
       .astype(int)
       .filter(animals)
      )
out


var Cat Dog Cow
0   7   0   0
1   6   0   0
2   0   2   3

#final result
result = pd.concat([df.iloc[:,:-3],out],axis=1)
result


   Index  Variable1  Value1  Variable2  Value2  Cat  Dog  Cow
0  1      Cat        7       Sheep      7       7    0    0
1  2      Sheep      2       Cat        6       6    0    0
2  3      Cow        3       Dog        2       0    2    3
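
The snippets above assume that `df` already exists. As a self-contained sketch of the same concat-and-pivot idea, the version below rebuilds the frame from the question's sample data. The construction of `df`, the explicit `pairs` list, and selecting the animal columns by name instead of by position are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed construction).
df = pd.DataFrame({
    "Index": [1, 2, 3],
    "Variable1": ["Cat", "Sheep", "Cow"],
    "Value1": [7, 2, 3],
    "Variable2": ["Sheep", "Cat", "Dog"],
    "Value2": [7, 6, 2],
    "Cat": [0, 0, 0],
    "Dog": [0, 0, 0],
    "Cow": [0, 0, 0],
})

animals = ["Cat", "Dog", "Cow"]
pairs = [("Variable1", "Value1"), ("Variable2", "Value2")]

# Stack each Variable/Value pair into one long two-column frame.
res = pd.concat(
    [df[[var, val]].set_axis(["var", "val"], axis=1) for var, val in pairs]
)

# Spread the variable names into columns, keep only the animal columns,
# and replace the gaps left by the pivot with 0.
out = (
    res.pivot(columns="var", values="val")
       .reindex(columns=animals)
       .fillna(0)
       .astype(int)
)

# Swap the placeholder animal columns for the filled ones.
result = pd.concat([df.drop(columns=animals), out], axis=1)
print(result)
```

One caveat: `pivot` raises a `ValueError` when the same name appears in more than one Variable column of a single row, because the index/column pair is then duplicated. In that case `pivot_table` with an aggregation function would be one option.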

score 0 · Accepted Answer

A good solution is a vectorized operation, which is usually faster than a loop. NumPy provides one in np.where:

import pandas as pd
import numpy as np

df = pd.DataFrame.from_dict({'V1': ['Cat', 'Sheep', 'Cow'],
"Va1":[7, 2, 3], "v2": ['Sheep','Cat','Dog'], 'va2':[7,6,2]})

df['Cat'] = np.where(df['V1'] == 'Cat', df['Va1'], np.where(df['v2'] == 'Cat', df['va2'], 0))
df['Dog'] = np.where(df['V1'] == 'Dog', df['Va1'], np.where(df['v2'] == 'Dog', df['va2'], 0))
df['Cow'] = np.where(df['V1'] == 'Cow', df['Va1'], np.where(df['v2'] == 'Cow', df['va2'], 0))

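Basically, it checks whether variable 1 holds the animal in question and, on a match, fills in value 1; otherwise it runs the same check on variable 2 and value 2; otherwise it fills in 0.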
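
Writing one nested np.where per animal gets repetitive as the number of animals or Variable/Value pairs grows. A possible generalization, sketched here with `np.select` in place of the nested `np.where` calls, loops over the animal names and builds the conditions from a list of column pairs. The column names follow the question's frame rather than this answer's shortened ones, and the `animals` and `pairs` lists are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Sample data with the question's column names (assumed construction).
df = pd.DataFrame({
    "Variable1": ["Cat", "Sheep", "Cow"],
    "Value1": [7, 2, 3],
    "Variable2": ["Sheep", "Cat", "Dog"],
    "Value2": [7, 6, 2],
})

animals = ["Cat", "Dog", "Cow"]
pairs = [("Variable1", "Value1"), ("Variable2", "Value2")]

for animal in animals:
    # One boolean mask per Variable column, paired with its Value column.
    conditions = [df[var] == animal for var, _ in pairs]
    choices = [df[val] for _, val in pairs]
    # np.select takes the first matching choice per row, else the default,
    # which mirrors the order of checks in the nested np.where version.
    df[animal] = np.select(conditions, choices, default=0)

print(df)
```

This is still vectorized per column; the Python loop only runs once per animal, not once per row.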

score 0 · Accepted Answer

Use wide_to_long to reshape, then filter the values against a list with DataFrame.query, reshape with Series.unstack, and finally use DataFrame.update:

df = pd.DataFrame({'Variable1': ['Cat', 'Sheep', 'Cow'],
                   "Value1":[7, 2, 3], 
                   "Variable2": ['Sheep','Cat','Dog'], 
                   'Value2':[7,6,2],
                   'Cat':[0,0,0],
                   'Dog':[0,0,0],
                   'Cow':[0,0,0]}, index=[1,2,3])

L = ['Cat','Dog','Cow']
#or if possible select last 3 column names
#L = df.columns[-3:]
df1 = (pd.wide_to_long(df.reset_index(), ['Variable','Value'],i='index', j='tmp')
        .reset_index(level=1, drop=True)
        .query("Variable in @L")
        .set_index('Variable', append=True)['Value']
        .unstack(fill_value=0))
print (df1)
Variable  Cat  Cow  Dog
index                  
1           7    0    0
2           6    0    0
3           0    3    2

df.update(df1)
print (df)
  Variable1  Value1 Variable2  Value2  Cat  Dog  Cow
1       Cat       7     Sheep       7    7    0    0
2     Sheep       2       Cat       6    6    0    0
3       Cow       3       Dog       2    0    2    3
