Suppose a column has 14 distinct values (as reported by value_counts()) and some of them have something in common. In my example, when I group by 'Loan.Purpose' and compute the mean() of 'Interest.Rate' for each distinct Loan.Purpose value, several values end up with the same rounded mean rate. For instance, 'car', 'educational' and 'major_purchase' all have a mean of 11.0. I want to merge those three values under one column named "LP_cem", because they share the same mean, and do the same for the remaining values,

so that I can reduce the number of dummy variables from 14 to 4.

Basically, I want to merge the 14 distinct values into 3 or 4 groups based on their mean(), and then create dummy variables from those groups,

like the ones shown below:

   LP_cem  LP_chos  LP_dm  LP_hmvw  LP_renewable_energy
0       0        0      1        0                    0
1       0        0      1        0                    0
2       0        0      1        0                    0
3       0        0      1        0                    0
4       0        1      0        0                    0
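For reference, a table shaped like the one above can be produced with pd.get_dummies once the column holds the merged group labels. This is only a minimal sketch: the toy Series `grouped` is hypothetical and stands in for a 'Loan.Purpose' column that has already been merged, and only the labels present in the data get a column.

```python
import pandas as pd

# Hypothetical toy data: a column that already holds merged group labels.
grouped = pd.Series(["dm", "dm", "dm", "dm", "chos"], name="Loan.Purpose")

# One indicator column per distinct label, each prefixed with "LP_".
# dtype=int forces 0/1 output instead of booleans.
dummies = pd.get_dummies(grouped, prefix="LP", dtype=int)
print(dummies)
```

The prefix argument is what yields names such as LP_dm and LP_chos.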

raw_data['Loan.Purpose'].value_counts()

debt_consolidation    1306
credit_card            443
other                  200
home_improvement       151
major_purchase         101
small_business          86
car                     50
wedding                 39
medical                 30
moving                  28
vacation                21
house                   20
educational             15
renewable_energy         4
Name: Loan.Purpose, dtype: int64

I have already grouped the data by Loan.Purpose and taken the mean of Interest.Rate:

raw_data_8 = round(raw_data_5.groupby('Loan.Purpose')['Interest.Rate'].mean())
raw_data_8

Loan.Purpose
CHOS                  15.0
DM                    12.0
car                   11.0
credit_card           13.0
debt_consolidation    14.0
educational           11.0
home_improvement      12.0
house                 13.0
major_purchase        11.0
medical               12.0
moving                14.0
other                 13.0
renewable_energy      10.0
small_business        13.0
vacation              12.0
wedding               12.0
Name: Interest.Rate, dtype: float64
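Since the groups are defined by the rounded mean itself, one possible route is to attach that rounded mean to every row and use it as the group label, instead of listing the purposes by hand. The sketch below assumes a small made-up frame `df`; the column name `rate_group` and the `rate_` label prefix are hypothetical choices, not something from the original data.

```python
import pandas as pd

# Hypothetical toy data with the same two columns as in the question.
df = pd.DataFrame({
    "Loan.Purpose": ["car", "educational", "moving", "debt_consolidation", "car"],
    "Interest.Rate": [10.0, 11.0, 14.0, 14.0, 12.0],
})

# Mean rate per purpose, broadcast back onto each row, then rounded.
mean_rate = df.groupby("Loan.Purpose")["Interest.Rate"].transform("mean").round()

# Purposes with the same rounded mean receive the same label.
df["rate_group"] = "rate_" + mean_rate.astype(int).astype(str)
print(df)
```

With this approach the number of groups follows from the data, so it may not come out to exactly 4.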

Now I want to group together the values that have the same mean. I even tried the code below, but it raises an error:

for i in range(len(raw_data_5.index)):
    if raw_data_5['Loan.Purpose'][i] in ['car','educational','major_purchase']:
        raw_data_5.iloc[i, 'Loan.Purpose'] = 'cem'
    if raw_data_5['Loan.Purpose'][i] in ['home_improvement','medical','vacation','wedding']:
        raw_data_5.iloc[i, 'Loan.Purpose'] = 'hmvw'
    if raw_data_5['Loan.Purpose'][i] in ['credit_care','house','other','small_business']:
        raw_data_5.iloc[i, 'Loan.Purpose'] = 'chos'
    if raw_data_5['Loan.Purpose'][i] in ['debt_consolidation','moving']:
        raw_data_5.iloc[i, 'Loan.Purpose'] = 'dcm'

error = TypeError                                 Traceback (most recent call last)
<ipython-input-51-cf7ef2ae1efd> in <module>
----> 1 for i in range(raw_data_5.index):
      2     if raw_data_5['Loan.Purpose'][i] in ['car','educational','major_purchase']:
      3         raw_data_5.iloc[i, 'Loan.Purpose'] = 'cem'
      4     if raw_data_5['Loan.Purpose'][i] in ['home_improvement','medical','vacation','wedding']:
      5         raw_data_5.iloc[i, 'Loan.Purpose'] = 'hmvw'

TypeError: 'Int64Index' object cannot be interpreted as an integer
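A few observations on the attempt above. The traceback comes from range(raw_data_5.index): range() needs an integer, not an index object. Beyond that, .iloc accepts only integer positions, so .iloc[i, 'Loan.Purpose'] would fail as well, and 'credit_care' does not match the actual value 'credit_card'. A loop can be avoided altogether by mapping each purpose to its group in one step. The following is a minimal sketch on a hypothetical toy frame `df`; the dictionary names `groups` and `purpose_to_group` are made up for illustration, while the group labels are the ones used in the loop.

```python
import pandas as pd

# Hypothetical toy data; the index is deliberately non-contiguous,
# as can happen after rows have been dropped.
df = pd.DataFrame(
    {"Loan.Purpose": ["debt_consolidation", "credit_card", "car",
                      "moving", "renewable_energy"]},
    index=[0, 1, 2, 4, 5],
)

# Group label -> purposes that share the same rounded mean rate.
groups = {
    "cem": ["car", "educational", "major_purchase"],
    "hmvw": ["home_improvement", "medical", "vacation", "wedding"],
    "chos": ["credit_card", "house", "other", "small_business"],
    "dcm": ["debt_consolidation", "moving"],
}

# Invert to purpose -> group label.
purpose_to_group = {p: g for g, members in groups.items() for p in members}

# replace() leaves values that are not in the mapping untouched,
# so 'renewable_energy' stays as its own category.
df["Loan.Purpose"] = df["Loan.Purpose"].replace(purpose_to_group)
print(df["Loan.Purpose"].value_counts())
```

Because the replacement is vectorized, it does not depend on the index labels being contiguous.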


    Interest.Rate   Loan.Length Loan.Purpose
0   8.90                36.0    debt_consolidation
1   12.12               36.0    debt_consolidation
2   21.98               60.0    debt_consolidation
3   9.99                36.0    debt_consolidation
4   11.71               36.0    credit_card
5   15.31               36.0    other
6   7.90                36.0    debt_consolidation
7   17.14               60.0    credit_card
8   14.33               36.0    credit_card
10  19.72               36.0    moving
11  14.27               36.0    debt_consolidation
12  21.67               60.0    debt_consolidation
13  8.90                36.0    debt_consolidation
14  7.62                36.0    debt_consolidation
15  15.65               60.0    debt_consolidation
16  12.12               36.0    debt_consolidation
17  10.37               60.0    debt_consolidation
18  9.76                36.0    credit_card
19  9.99                60.0    debt_consolidation
20  21.98               36.0    debt_consolidation
21  19.05               60.0    credit_card
22  17.99               60.0    car
23  11.99               36.0    credit_card
24  16.82               60.0    vacation
25  7.90                36.0    debt_consolidation
26  14.42               36.0    debt_consolidation
27  15.31               36.0    debt_consolidation
28  8.59                36.0    other
29  7.90                36.0    debt_consolidation
30  21.00               60.0    debt_consolidation