python - Python SKlearn "contamination must be in (0, 0.5]" error

Question

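I'm new to machine learning and am working on a project using Python (3.6), Pandas, Numpy, and SKlearn. I've finished the classification and reshaping, but at prediction time it throws the error contamination must be in (0, 0.5].

Here's what I have tried: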

# Determine no of fraud cases in dataset
Fraud = data[data['Class'] == 1]
Valid = data[data['Class'] == 0]

# calculate percentages for Fraud & Valid 
outlier_fraction = len(Fraud) / float(len(Valid))
print(outlier_fraction)

print('Fraud Cases : {}'.format(len(Fraud)))
print('Valid Cases : {}'.format(len(Valid)))
# Get all the columns from dataframe
columns = data.columns.tolist()

# Filter the columns to remove data we don't want
columns = [c for c in columns if c not in ["Class"] ]

# store the variable we want to predict on
target = "Class"
X = data.drop(target, axis=1)
Y = data[target]

# Print the shapes of X & Y
print(X.shape)
print(Y.shape)

# define a random state
state = 1

# define the outlier detection method
classifiers = {
    "Isolation Forest": IsolationForest(max_samples=len(X),
                                       contamination=outlier_fraction,
                                       random_state=state),
    "Local Outlier Factor": LocalOutlierFactor(
    contamination = outlier_fraction)
}
# fit the model
n_outliers = len(Fraud)

for i, (clf_name, clf) in enumerate(classifiers.items()):

    # fit the data and tag outliers
    if clf_name == "Local Outlier Factor":
        y_pred = clf.fit_predict(X)
        scores_pred = clf.negative_outlier_factor_
    else:
        clf.fit(X)
        scores_pred = clf.decision_function(X)
        y_pred = clf.predict(X)

    # Reshape the prediction values to 0 for valid and 1 for fraudulent
    y_pred[y_pred == 1] = 0
    y_pred[y_pred == -1] = 1

    n_errors = (y_pred != Y).sum()

    # run classification metrics 
    print('{}:{}'.format(clf_name, n_errors))
    print(accuracy_score(Y, y_pred ))
    print(classification_report(Y, y_pred ))

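This is what it returns:

ValueError: contamination must be in (0, 0.5]

As the traceback points out, the error is thrown on the line y_pred = clf.predict(X).

I'm new to machine learning and don't know much about **contamination**, so what am I doing wrong?

Please help me!

Thanks in advance!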

score 2 · Accepted Answer

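ValueError: contamination must be in (0, 0.5]

This means that contamination must be strictly greater than 0.0 and less than or equal to 0.5. (For the bracket notation, "What do the square bracket and parenthesis mean in notation [first1, last1)?" is a good question to read.) As you commented, print(outlier_fraction) outputs 0.0, so the problem lies in the first 6 lines of the code you posted.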
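As a minimal sketch of the interval check described above, the snippet below computes an outlier fraction from a list of labels and tests whether it falls in (0, 0.5]. The helper names `contamination_from_labels` and `in_contamination_range` are hypothetical and not part of scikit-learn. It also divides by the total number of rows rather than by the number of valid rows, on the assumption that contamination means the proportion of outliers in the whole dataset. The second call shows one way the fraction could end up at 0.0 (labels stored as strings), which is only a guess at a possible cause, not something the question confirms.

```python
def contamination_from_labels(labels, outlier_label=1):
    """Return the share of rows labelled as outliers (hypothetical helper)."""
    labels = list(labels)
    if not labels:
        raise ValueError("no labels given")
    n_outliers = sum(1 for value in labels if value == outlier_label)
    # True division, using the total row count as the denominator
    return n_outliers / len(labels)


def in_contamination_range(value):
    """True when value lies in the half-open interval (0, 0.5]."""
    return 0.0 < value <= 0.5


# One outlier among four rows: inside the allowed interval
fraction = contamination_from_labels([0, 0, 0, 1])
print(fraction, in_contamination_range(fraction))

# String labels never equal the integer 1, so no row is counted as an outlier
empty = contamination_from_labels(["0", "0", "1"])
print(empty, in_contamination_range(empty))
```

Running a check like this before constructing the detectors would surface a zero fraction early, instead of at fit or predict time.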

score 0 · Accepted Answer

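LocalOutlierFactor is an unsupervised outlier detection algorithm introduced in this paper. Every algorithm has its own parameters, and these parameters really change how the algorithm behaves. Before applying a method, you should always study its parameters and their effect on the algorithm; otherwise you may get lost in the land of too many parameter options.

In the case of LocalOutlierFactor, it assumes that outliers make up no more than half of your dataset. In practice, I would say that even if outliers make up 30% of your dataset, they are no longer outliers. They are just a different type or class of data.

On the other hand, you can't expect an outlier detection algorithm to work if you tell it that you have 0 outliers, which is the case if outlier_fraction is actually 0.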
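To illustrate why a contamination of 0 leaves nothing for the detector to flag, here is a simplified stand-in for a contamination threshold. It is not scikit-learn's implementation: the function `flag_lowest_scores` and the score values are made up for illustration, and it assumes that lower scores mean "more anomalous" and that the lowest-scoring share of points gets the label -1.

```python
def flag_lowest_scores(scores, contamination):
    """Label the lowest-scoring share of points as -1 (outlier), the rest as 1.

    Simplified illustration of a contamination threshold, not scikit-learn code.
    """
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0, 0.5]")
    n_flagged = int(round(contamination * len(scores)))
    # Indices sorted from most anomalous (lowest score) to least anomalous
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    flagged = set(order[:n_flagged])
    return [-1 if i in flagged else 1 for i in range(len(scores))]


# Made-up anomaly scores for ten points; two of them are clearly lower
scores = [-0.9, 0.2, 0.3, 0.1, -0.7, 0.4, 0.25, 0.35, 0.15, 0.05]
labels = flag_lowest_scores(scores, contamination=0.2)
print(labels)
```

With a contamination of 0.2 and ten points, two points are flagged. A value of 0.0 would flag none, which is why it is rejected up front in this sketch.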
