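I would like to one-hot encode a DataFrame based on the levels found in another DataFrame. In the example below, `data` supplies the levels of two variables. Using only those levels, I want to create dummy variables for `data2`, so that a level that appears only in `data2` (such as `C`) gets no column and a level that is missing from `data2` (such as `B` or `Z`) still gets an all-zero column.
How can I do this?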
import pandas as pd
# The unique levels in this dataset (A, B for VAR1 and X, Y, Z for VAR2)
# determine the possible levels for the following dataset
data = {'VAR1': ['A', 'A', 'A', 'A', 'B', 'B'],
        'VAR2': ['X', 'Y', 'Y', 'Y', 'X', 'Z']}
frame = pd.DataFrame(data)
# data2 contains the same variables as data, but might or might not
# contain the same levels
data2 = {'VAR1': ['A', 'C'],
         'VAR2': ['X', 'Y']}
frame2 = pd.DataFrame(data2)
# After applying one-hot encoding to data2, this is what it should look like
data_final = {
    'A': ['1', '0'],
    'B': ['0', '0'],
    'X': ['1', '0'],
    'Y': ['0', '1'],
    'Z': ['0', '0'],
}
frame_final = pd.DataFrame(data_final)
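
One possible direction, offered as a rough sketch rather than a definitive answer: cast each column of `frame2` to a categorical whose categories are the levels observed in `frame`, then call `pd.get_dummies`. The names `encoded_input` and `encoded` are introduced here for illustration only, and the sketch assumes both frames share the same column names and that the level names do not collide across variables.

```python
import pandas as pd

frame = pd.DataFrame({'VAR1': ['A', 'A', 'A', 'A', 'B', 'B'],
                      'VAR2': ['X', 'Y', 'Y', 'Y', 'X', 'Z']})
frame2 = pd.DataFrame({'VAR1': ['A', 'C'],
                       'VAR2': ['X', 'Y']})

# Cast each column of frame2 to a categorical whose categories are the
# levels observed in frame. Values absent from frame (here 'C') become NaN.
encoded_input = frame2.copy()
for col in frame.columns:
    levels = sorted(frame[col].unique())
    encoded_input[col] = pd.Categorical(frame2[col], categories=levels)

# get_dummies emits one column per category, including categories that
# never occur in frame2. An empty prefix and separator keep the bare
# level names ('A', 'B', ...) as column names.
encoded = pd.get_dummies(encoded_input, prefix='', prefix_sep='', dtype=int)
print(encoded)
```

This yields integer 0/1 columns; if string values like those in `frame_final` are needed, `encoded.astype(str)` would convert them.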