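I want to selectively change column values to np.nan.

I have a column with a lot of zero (0) values.

I am getting the row indexes of a subset of those zeros.

I put the indexes into a variable (s0).

I then use it to set the column value to np.nan only for the rows whose index is in s0.

It runs, but it changes every row (i.e. the entire column) to np.nan.

Here is my code: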

print((df3['amount_tsh'] == 0).sum())  # 41639  <-- there are this many zeros to start
# print(df3['amount_tsh'].value_counts()[0])
s0 = df3['amount_tsh'][df3['amount_tsh'].eq(0)].sample(37322).index  #  grab 37322 row indexes
print(len(s0))  # 37322
df3['amount_tsh'] = df3.loc[df3.index.isin(s0), 'amount_tsh'] = np.nan  #  change the value in the column to np.nan if its index is in s0
print(df3['amount_tsh'].isnull().sum())

2 Answers

Let's try

s0 = df3.loc[df3['amount_tsh'].eq(0), ['amount_tsh']].sample(37322)
df3.loc[df3.index.isin(s0.index), 'amount_tsh'] = np.nan

As a quick check, I ran it on some data I had in my notebook, and it worked for me

import pandas as pd 
import numpy as np

data = pd.DataFrame({'Symbol': {0: 'ABNB', 1: 'DKNG', 2: 'EXPE', 3: 'MPNGF', 4: 'RDFN', 5: 'ROKU', 6: 'VIACA', 7: 'Z'},
'Number of Buys': {0: np.nan, 1: 2.0, 2: np.nan, 3: 1.0, 4: 2.0, 5: 1.0, 6: 1.0, 7: np.nan}, 
'Number of Sell      s': {0: 1.0, 1: np.nan, 2: 1.0, 3: np.nan, 4: np.nan, 5: np.nan, 6: np.nan, 7: 1.0}, 
'Gains/Losses': {0: 2106.0, 1: -1479.2, 2: 1863.18, 3: -1980.0, 4: -1687.7, 5: -1520.52, 6: -1282.4, 7: 1624.59}, 'Percentage change': {0: 0.0, 1: 2.0, 2: 0.0, 3: 0.0, 4: 1.5, 5: 0.0, 6: 0.0, 7: 0.0}})

rows = ['ABNB','DKNG','EXPE']
data


  Symbol  Number of Buys  Number of Sell      s  Gains/Losses  \
0   ABNB             NaN                    1.0       2106.00   
1   DKNG             2.0                    NaN      -1479.20   
2   EXPE             NaN                    1.0       1863.18   
3  MPNGF             1.0                    NaN      -1980.00   
4   RDFN             2.0                    NaN      -1687.70   
5   ROKU             1.0                    NaN      -1520.52   
6  VIACA             1.0                    NaN      -1282.40   
7      Z             NaN                    1.0       1624.59   

   Percentage change  
0                0.0  
1                2.0  
2                0.0  
3                0.0  
4                1.5  
5                0.0  
6                0.0  
7                0.0 

Using your approach

(data['Number of Buys']==1.0).sum()
s0= data.loc[(data['Number of Buys']==1.0),['Number of Buys']].sample(2)
data.loc[data.index.isin(s0.index),'Number of Buys'] =np.nan

  Symbol  Number of Buys  Number of Sell      s  Gains/Losses  \
0   ABNB             NaN                    1.0       2106.00   
1   DKNG             2.0                    NaN      -1479.20   
2   EXPE             NaN                    1.0       1863.18   
3  MPNGF             1.0                    NaN      -1980.00   
4   RDFN             2.0                    NaN      -1687.70   
5   ROKU             NaN                    NaN      -1520.52   
6  VIACA             NaN                    NaN      -1282.40   
7      Z             NaN                    1.0       1624.59   

   Percentage change  
0                0.0  
1                2.0  
2                0.0  
3                0.0  
4                1.5  
5                0.0  
6                0.0  
7                0.0  
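
Since sample already returns the rows you want, you can probably skip the isin mask and pass the sampled index labels straight to .loc. A minimal sketch, assuming the index labels are unique; the small frame and random_state=0 are my own additions so the run is reproducible, they are not from the question:

```python
import numpy as np
import pandas as pd

# Made-up data for illustration: six zeros out of eight rows.
df = pd.DataFrame({"amount_tsh": [0.0, 0.0, 3.0, 0.0, 0.0, 9.0, 0.0, 0.0]})

# Sample 4 of the zero rows and keep only their index labels.
s0 = df.loc[df["amount_tsh"].eq(0), "amount_tsh"].sample(4, random_state=0).index

# Label-based assignment: only the sampled rows are set to NaN.
df.loc[s0, "amount_tsh"] = np.nan

n_nan = int(df["amount_tsh"].isna().sum())
n_zero_left = int((df["amount_tsh"] == 0).sum())
print(n_nan, n_zero_left)
```

Four of the six zeros become NaN, two zeros remain, and the non-zero rows are left alone. If the index had duplicate labels, .loc would hit every row sharing a sampled label, so this relies on a unique index.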
answered 2022-03-04T21:25:32.833

Hmm...

I removed the reassignment and it works?

s0 = df3['amount_tsh'][df3['amount_tsh'].eq(0)].sample(37322).index
df3.loc[df3.index.isin(s0), 'amount_tsh'] = np.nan

The second line used to be: df3['amount_tsh'] = df3.loc[df3.index.isin(s0), 'amount_tsh'] = np.nan
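
As far as I can tell, the reason is Python's chained assignment: a = b = value assigns value to every target, left to right, so the old line first set the whole column to np.nan and only then ran the .loc assignment. A minimal sketch on a made-up frame (the name demo and its values are assumptions for illustration, and I take the first two zero rows instead of sampling to keep it deterministic):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df3; the values are made up for illustration.
demo = pd.DataFrame({"amount_tsh": [0.0, 5.0, 0.0, 7.0, 0.0]})

# Index labels of the first two zero rows (no sampling, so the result is fixed).
s0 = demo[demo["amount_tsh"].eq(0)].index[:2]

# Chained assignment: np.nan goes to BOTH targets, so the whole column is overwritten.
broken = demo.copy()
broken["amount_tsh"] = broken.loc[broken.index.isin(s0), "amount_tsh"] = np.nan
n_broken = int(broken["amount_tsh"].isna().sum())

# Single assignment: only the selected rows change.
fixed = demo.copy()
fixed.loc[fixed.index.isin(s0), "amount_tsh"] = np.nan
n_fixed = int(fixed["amount_tsh"].isna().sum())

print(n_broken, n_fixed)
```

Here the chained version ends with all 5 rows as NaN, while the single assignment touches only the 2 selected rows.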

answered 2022-03-04T21:35:10.063