I tried it and it works, but I noticed that some columns that should be there are missing. Why are they missing, and how can I fix it?
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100463 entries, 0 to 100462
Data columns (total 18 columns):
#    Column                  Non-Null Count   Dtype
---  ------                  --------------   -----
0    _id                     100463 non-null  int64
1    Assigned_ID             100463 non-null  int64
2    Outbreak Associated     100463 non-null  object
3    Age Group               100463 non-null  object
4    Neighbourhood Name      100463 non-null  object
5    FSA                     100463 non-null  object
6    Source of Infection     100463 non-null  object
7    Classification          100463 non-null  object
8    Episode Date            100463 non-null  object
9    Reported Date           100463 non-null  object
10   Client Gender           100463 non-null  object
11   Outcome                 100463 non-null  object
12   Currently Hospitalized  100463 non-null  object
13   Currently in ICU        100463 non-null  object
14   Currently Intubated     100463 non-null  object
15   Ever Hospitalized       100463 non-null  object
16   Ever in ICU             100463 non-null  object
17   Ever Intubated          100463 non-null  object
dtypes: int64(2), object(16)
memory usage: 13.8+ MB
import category_encoders as ce

# Backward-difference encode only the Source_Of_Infection column
encoder = ce.BackwardDifferenceEncoder(cols=['Source_Of_Infection'])
df_bd = encoder.fit_transform(covid_df)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100463 entries, 0 to 100462
Data columns (total 59 columns):
#    Column                          Non-Null Count   Dtype
---  ------                          --------------   -----
0    intercept                       100463 non-null  int64
1    Assigned_Id                     100463 non-null  int64
2    Outbreak_Associated             100463 non-null  object
3    Age_Group                       100463 non-null  object
4    Neighbourhood_Name              100463 non-null  object
5    FSA                             100463 non-null  object
6    Source_Of_Infection_0           100463 non-null  float64
7    Source_Of_Infection_1           100463 non-null  float64
8    Source_Of_Infection_2           100463 non-null  float64
9    Source_Of_Infection_3           100463 non-null  float64
10   Source_Of_Infection_4           100463 non-null  float64
11   Source_Of_Infection_5           100463 non-null  float64
12   Source_Of_Infection_6           100463 non-null  float64
13   Source_Of_Infection_7           100463 non-null  float64
14   Classification                  100463 non-null  object
15   Episode_Date                    100463 non-null  datetime64[ns]
16   Reported_Date                   100463 non-null  datetime64[ns]
17   Gender                          100463 non-null  object
18   Outcome                         100463 non-null  object
19   Currently_Hospitalized          100463 non-null  object
20   Currently_ICU                   100463 non-null  object
21   Currently_Intubated             100463 non-null  object
22   Ever_Hospitalized               100463 non-null  object
23   Ever_ICU                        100463 non-null  object
24   Ever_Intubated                  100463 non-null  object
25   Outbreak_Outbreak_Associated    100463 non-null  uint8
26   Outbreak_Sporadic               100463 non-null  uint8
27   Source_Close_Contact            100463 non-null  uint8
28   Source_Community                100463 non-null  uint8
29   Source_Household_Contact        100463 non-null  uint8
30   Source_No_Information           100463 non-null  uint8
31   Source_Congregate_Settings      100463 non-null  uint8
32   Source_Healthcare_Institutions  100463 non-null  uint8
33   Source_Other_Settings           100463 non-null  uint8
34   Source_Pending                  100463 non-null  uint8
35   Source_Travel                   100463 non-null  uint8
36   Classification_Confirmed        100463 non-null  uint8
37   Classification_Probable         100463 non-null  uint8
38   Gender_Female                   100463 non-null  uint8
39   Gender_Male                     100463 non-null  uint8
40   NON-BINARY                      100463 non-null  uint8
41   Gender_Other                    100463 non-null  uint8
42   Gender_Transgender              100463 non-null  uint8
43   Gender_Unknown                  100463 non-null  uint8
44   Outcome_Active                  100463 non-null  uint8
45   Outcome_Fatal                   100463 non-null  uint8
46   Outcome_Resolved                100463 non-null  uint8
47   Currently_Hospitalized_No       100463 non-null  uint8
48   Currently_Hospitalized_Yes      100463 non-null  uint8
49   Currently_ICU_No                100463 non-null  uint8
50   Currently_ICU_Yes               100463 non-null  uint8
51   Currently_Intubated_No          100463 non-null  uint8
52   Currently_Intubated_Yes         100463 non-null  uint8
53   Ever_Hospitalized_No            100463 non-null  uint8
54   Ever_Hospitalized_Yes           100463 non-null  uint8
55   Ever_ICU_No                     100463 non-null  uint8
56   Ever_ICU_Yes                    100463 non-null  uint8
57   Ever_Intubated_No               100463 non-null  uint8
58   Ever_Intubated_Yes              100463 non-null  uint8
dtypes: datetime64[ns](2), float64(8), int64(2), object(13), uint8(34)
memory usage: 22.4+ MB
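The Outcome column originally had 3 values: RESOLVED, FATAL, RECOVERED. After I applied backward difference encoding and looked at the result, the Outcome column had become 2 columns instead of 3. Why does this happen, and how can I fix it?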
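
To make the question concrete, here is a minimal sketch of the contrast matrix as backward difference coding is usually defined in textbooks. It is plain Python, not category_encoders itself, and the helper name `backward_difference_contrasts` is made up for illustration. I am assuming the library follows the same scheme. Under that assumption, a column with k levels gets k-1 encoded columns, which would explain 3 values turning into 2 columns, and the 9 `Source_*` values turning into `Source_Of_Infection_0` to `Source_Of_Infection_7`.

```python
from fractions import Fraction


def backward_difference_contrasts(k):
    """Hypothetical helper: build the k x (k-1) backward difference contrast matrix.

    Row i is the encoding of level i. Column j compares the mean of
    level j+1 with the mean of level j.
    """
    rows = []
    for level in range(k):
        row = []
        for col in range(k - 1):
            if level <= col:
                # Levels at or below the comparison point get a negative weight
                row.append(Fraction(-(k - (col + 1)), k))
            else:
                # Levels above the comparison point get a positive weight
                row.append(Fraction(col + 1, k))
        rows.append(row)
    return rows


# The three values from the question; the ordering here is an assumption
levels = ["RESOLVED", "FATAL", "RECOVERED"]
matrix = backward_difference_contrasts(len(levels))

for name, row in zip(levels, matrix):
    print(name, [str(value) for value in row])

print("levels:", len(levels), "encoded columns:", len(matrix[0]))
```

If this is what is going on, no information is lost: each level still maps to a distinct row of the matrix. If one column per level is what is wanted, one-hot encoding would be the usual alternative.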