In this anomaly detection example using IsolationForest:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest
rng = np.random.RandomState(42)
# Generate train data
X = 0.3 * rng.randn(100, 2)
X_train = np.r_[X + 2, X - 2]
# Generate some regular novel observations
X = 0.3 * rng.randn(20, 2)
X_test = np.r_[X + 2, X - 2]
# Generate some abnormal novel observations
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))
# fit the model
clf = IsolationForest(max_samples=100, random_state=rng)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)
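I believe the outliers in this code are introduced at random. But if I want to run anomaly detection on real data:
How should I go about it?
If I already have a dataset, how do I identify the anomalies in it? I am trying to use the Combined Cycle Power Plant dataset. Alternatively, if you know of any other good datasets for practicing anomaly detection, please drop some links!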
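For reference, here is a minimal sketch of what fitting on an existing table might look like, as opposed to generating outliers by hand. The helper name `flag_anomalies`, the column names `f1`..`f4`, and the synthetic stand-in frame are all hypothetical. With a real dataset, the frame would instead come from something like `pd.read_csv` on the downloaded file. The `contamination` value is a guess at the share of anomalous rows, not something the data provides.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def flag_anomalies(df, contamination="auto", random_state=42):
    """Hypothetical helper: score every row of df with an IsolationForest."""
    # Use only numeric columns and drop rows with missing values.
    features = df.select_dtypes(include="number").dropna()
    clf = IsolationForest(contamination=contamination, random_state=random_state)
    clf.fit(features)
    out = features.copy()
    # decision_function: lower (negative) values are more anomalous.
    out["anomaly_score"] = clf.decision_function(features)
    # predict returns -1 for outliers and +1 for inliers.
    out["is_anomaly"] = clf.predict(features) == -1
    return out


# Stand-in data (an assumption, not the real dataset): 500 ordinary rows,
# with the first 5 shifted far away so they should be flagged.
rng = np.random.RandomState(0)
demo = pd.DataFrame(rng.normal(size=(500, 4)), columns=["f1", "f2", "f3", "f4"])
demo.iloc[:5] = demo.iloc[:5] + 8

result = flag_anomalies(demo, contamination=0.02)
print(result[result["is_anomaly"]].sort_values("anomaly_score").head(10))
```

Without ground-truth labels, the flagged rows are only candidates: sorting by `anomaly_score` and inspecting the lowest-scoring rows by hand is one way to judge whether they are meaningful.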