The dataset is as follows:

,store id,revenue ,profit
0,101,779183,281257
1,101,144829,838451
2,101,766465,757565
3,101,353297,261071
4,101,1615461,275760
5,101,246731,949229
6,101,951518,301016
7,101,444669,430583
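To reproduce the problem without a data.csv file, the sample rows above can be loaded from a string. This is a minimal sketch, assuming pandas is installed. CSV_TEXT is a hypothetical name, and index_col=0 is an illustrative choice that turns the blank leading column into the index instead of an "Unnamed: 0" data column. Neither is part of the original code.

```python
import io

import pandas as pd

# The eight sample rows from the question, including the blank first header
# field and the trailing space in "revenue ".
CSV_TEXT = """\
,store id,revenue ,profit
0,101,779183,281257
1,101,144829,838451
2,101,766465,757565
3,101,353297,261071
4,101,1615461,275760
5,101,246731,949229
6,101,951518,301016
7,101,444669,430583
"""

# index_col=0: use the unnamed first column as the row index.
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)

# Remove spaces so "store id" -> "storeid" and "revenue " -> "revenue".
df.columns = df.columns.str.replace(' ', '', regex=False)

print(df.columns.tolist())
print(df.dtypes)
```

Keeping the default header handling matters here: the column names only exist because read_csv treats the first line as the header.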

The code is as follows:

import pandas as pd
import numpy as np
import pylab
from sklearn.preprocessing import StandardScaler
from pylab import rcParams

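# Let pandas read the header row: header=None would give integer column
# labels, and df.columns.str.replace below would then fail
df = pd.read_csv('data.csv')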
df.columns = df.columns.str.replace(' ', '')
dummies = pd.get_dummies(data = df)
del dummies['Unnamed:0']
store = dummies[['storeid']]
param = 'profit'  # column to check for outliers
test = dummies[[param]]
qv1 = test[param].quantile(0.25)
qv2 = test[param].quantile(0.5)
qv3 = test[param].quantile(0.75)
qv_limit = 1.5 * (qv3 - qv1)
qv_limit,qv3,qv1
#(688855.5, 776026.0, 316789.0)
un_outliers_mask = (test[param] > qv3 + qv_limit) | (test[param] < qv1 - qv_limit)
un_outliers_data = test[param][un_outliers_mask]
un_outliers_name = store[un_outliers_mask]  
un_outliers_data

The output of un_outliers_data is Series([], Name: profit, dtype: int64), i.e. an empty Series. But some points are outliers: as you can see, 1615461 > (776026.0 + 688855.5).
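For comparison, here is a minimal sketch that applies the same 1.5×IQR rule to both the revenue and the profit columns of the eight sample rows above, assuming pandas is installed. The helper iqr_outliers and the name CSV_TEXT are hypothetical, introduced for illustration. The fences are computed from the sample rows only, so they differ from the values printed in the question, which presumably come from the full data.csv.

```python
import io

import pandas as pd

# Sample rows from the question (hypothetical in-memory stand-in for data.csv).
CSV_TEXT = """\
,store id,revenue ,profit
0,101,779183,281257
1,101,144829,838451
2,101,766465,757565
3,101,353297,261071
4,101,1615461,275760
5,101,246731,949229
6,101,951518,301016
7,101,444669,430583
"""


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values of s outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = s.quantile(0.25)
    q3 = s.quantile(0.75)
    limit = k * (q3 - q1)
    mask = (s > q3 + limit) | (s < q1 - limit)
    return s[mask]


df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)
df.columns = df.columns.str.replace(' ', '', regex=False)

# Check each numeric column against its own fences.
results = {col: iqr_outliers(df[col]) for col in ['revenue', 'profit']}
for col, outliers in results.items():
    print(col, outliers.to_dict())
```

On these sample rows, the only value outside its column's fences is the revenue value 1615461 at index 4; the profit column has none.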
