python - mlxtend plot_decision_regions with a model fit on a Pandas DataFrame?

Question

I'm a big fan of mlxtend's plot_decision_regions function (http://rasbt.github.io/mlxtend/#examples, https://stackoverflow.com/a/43298736/1870832).

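It accepts an X (only two columns at a time), a y, and a (fitted) classifier object clf, and then produces a really nice visualization of the relationship between the model's predictions, the true y values, and a pair of independent variables.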

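A couple of restrictions: X and y must be numpy arrays, and clf needs to have a predict() method. Fair enough. My problem is that, in my case, the classifier object clf that I want to visualize has already been fit on a Pandas DataFrame...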

import numpy as np
import pandas as pd
import xgboost as xgb

import matplotlib
matplotlib.use('Agg')
from mlxtend.plotting import plot_decision_regions
import matplotlib.pyplot as plt


# Create arbitrary dataset for example
df = pd.DataFrame({'Planned_End': np.random.uniform(low=-5, high=5, size=50),
                   'Actual_End':  np.random.uniform(low=-1, high=1, size=50),
                   # Labels 0, 1 or 2; randint's upper bound is exclusive
                   # (np.random.random_integers is deprecated)
                   'Late':        np.random.randint(low=0, high=3, size=50)}
)

# Fit a Classifier to the data
# This classifier is fit on the data as a Pandas DataFrame
X = df[['Planned_End', 'Actual_End']]
y = df['Late']

clf = xgb.XGBClassifier()
clf.fit(X, y)

So now, when I try to use plot_decision_regions, passing X and y as numpy arrays...

# Plot Decision Region using mlxtend's awesome plotting function
plot_decision_regions(X=X.values,
                      y=y.values,
                      clf=clf,
                      legend=2)

...I (understandably) get an error saying the model can't find the column names of the dataset it was trained on:

ValueError: feature_names mismatch: ['Planned_End', 'Actual_End'] ['f0', 'f1']
expected Planned_End, Actual_End in input data
training data did not have the following fields: f1, f0
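The mismatch comes from how a decision-region plot is built: the plotting function lays a mesh of points over the two features and calls clf.predict() on that mesh as a bare NumPy array, which carries no column names (XGBoost then falls back to the default names f0, f1). The sketch below illustrates that step under stated assumptions; predict_on_grid and ThresholdClassifier are hypothetical names for illustration, not mlxtend's actual implementation.

```python
import numpy as np


def predict_on_grid(clf, X, resolution=0.1):
    """Hypothetical helper: predict class labels over a 2-D mesh covering X.

    Roughly mirrors what a decision-region plot needs; it is NOT mlxtend's code.
    """
    # Pad each feature's range so the regions extend slightly past the data.
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, resolution),
                         np.arange(y_min, y_max, resolution))
    # Each grid point becomes one row of a plain NumPy array with no column
    # names, which is exactly the input a DataFrame-trained model may reject.
    grid = np.c_[xx.ravel(), yy.ravel()]
    Z = clf.predict(grid)
    return xx, yy, Z.reshape(xx.shape)


class ThresholdClassifier:
    """Toy stand-in model (hypothetical): class 1 when the first feature is > 0."""

    def predict(self, X):
        return (np.asarray(X)[:, 0] > 0).astype(int)


X = np.array([[-2.0, 0.5], [3.0, -0.5], [1.0, 0.0]])
xx, yy, Z = predict_on_grid(ThresholdClassifier(), X)
```

The returned xx, yy and Z arrays are what a contour plot of the decision regions would be drawn from.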

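In my real-world case, avoiding training our models on Pandas DataFrames would be a big deal. Is there a way to generate decision_regions plots for a classifier that was trained on a Pandas DataFrame?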

Accepted Answer (score 0)

Try changing the data preparation to:

X = df[['Planned_End', 'Actual_End']].values
y = df['Late'].values

And then continue with:

clf = xgb.XGBClassifier()
clf.fit(X, y)

plot_decision_regions(X=X,
                      y=y,
                      clf=clf,
                      legend=2)

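Or, equivalently, use X.values and y.values for both the fit and the plot, so that the model is trained on the same unnamed-column input that plot_decision_regions later passes to predict().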
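If retraining on NumPy arrays is not an option, one alternative (not part of the accepted answer) is to wrap the already-fitted model in a small adapter whose predict() rebuilds a DataFrame with the training column names before delegating. This is a minimal sketch that assumes the plotting function only ever calls clf.predict(); DataFramePredictor and StrictDataFrameModel are hypothetical names, and the latter is a toy stand-in that mimics the column check behind the feature_names mismatch error.

```python
import numpy as np
import pandas as pd


class DataFramePredictor:
    """Hypothetical adapter: restores the training column names on NumPy
    input, then delegates to the wrapped model's predict()."""

    def __init__(self, model, feature_names):
        self.model = model
        self.feature_names = list(feature_names)

    def predict(self, X):
        # The plotting code passes a bare NumPy array; give it column names back.
        X_df = pd.DataFrame(np.asarray(X), columns=self.feature_names)
        return self.model.predict(X_df)


class StrictDataFrameModel:
    """Toy stand-in (hypothetical) for a model fit on a DataFrame: it rejects
    input whose columns differ from the training columns."""

    def __init__(self, feature_names):
        self.feature_names = list(feature_names)

    def predict(self, X):
        if not isinstance(X, pd.DataFrame) or list(X.columns) != self.feature_names:
            raise ValueError("feature_names mismatch")
        return (X[self.feature_names[0]] > 0).astype(int).to_numpy()


cols = ['Planned_End', 'Actual_End']
wrapped = DataFramePredictor(StrictDataFrameModel(cols), cols)
grid = np.array([[-1.0, 0.2], [2.0, -0.3]])
print(wrapped.predict(grid))  # [0 1]
```

With the objects from the question, the call would then look something like plot_decision_regions(X=X.values, y=y.values, clf=DataFramePredictor(clf, X.columns), legend=2). The design keeps the original model untouched and only changes the input format at the boundary, though it has not been checked against every XGBoost or mlxtend version.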
