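I'm trying to use Yellowbrick's PredictionError and am running into an odd dimension problem. I'm using Yellowbrick version 1.4.

Suppose we have this very simple linear regression: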

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

from yellowbrick.regressor import PredictionError, ResidualsPlot

X = pd.DataFrame({
    "x1": np.linspace(1, 1000, 800),
    "x2": np.linspace(2, 500, 800),
    "x3": np.random.rand(800) * 50
})
y = pd.DataFrame().assign(y_val = 3 * X.x1 + 4 * X.x2 + X.x3)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

Now I want to run diagnostics. ResidualsPlot works without any trouble when I pass in the unmodified pandas data structures:

rp = ResidualsPlot(model)
rp.fit(X_train, y_train)
rp.score(X_test, y_test)
rp.show()
# produces graphic (not shown)

However, when I try to use PredictionError:

pe = PredictionError(model)
pe.fit(X_train, y_train)
pe.score(X_test, y_test)

the call to score() produces this error message:

File ~/venv/lib/python3.9/site-packages/yellowbrick/bestfit.py:141, in draw_best_fit(X, y, ax, estimator, **kwargs)
    139 # Verify that y is a (n,) dimensional array
    140 if y.ndim > 1:
--> 141     raise YellowbrickValueError(
    142         "y must be a (1,) dimensional array not {}".format(y.shape)
    143     )
    145 # Uses the estimator to fit the data and get the model back.
    146 model = estimator(X, y)

YellowbrickValueError: y must be a (1,) dimensional array not (264, 1)
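
The shape reported in the message can be reproduced without Yellowbrick. Here is a minimal sketch, assuming only pandas and NumPy, with a hypothetical 264-row stand-in for y_test: a one-column DataFrame is two-dimensional, which is the condition the `y.ndim > 1` check in the traceback tests for, while the Series pulled from it is one-dimensional.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for y_test: one column, 264 rows
y_frame = pd.DataFrame({"y_val": np.arange(264, dtype=float)})

# Selecting the single column yields a Series
y_series = y_frame["y_val"]

print(y_frame.ndim, y_frame.shape)    # 2 (264, 1)
print(y_series.ndim, y_series.shape)  # 1 (264,)
```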

I realize that y is of type DataFrame. If I change it to a Series, the code works, e.g.:

# Same as before, for reference
y = pd.DataFrame().assign(y_val= 3 * X.x1 + 4 * X.x2 + X.x3)

# Change to Series here
y = y["y_val"] 

Converting to a Series is certainly a workable fix, but I'd like to know why this is necessary for PredictionError and not for ResidualsPlot.
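
For reference, there are a few equivalent ways to flatten a one-column DataFrame besides indexing by column name. This is only a sketch using standard pandas and NumPy calls on a small hypothetical frame, not a statement about what Yellowbrick expects internally:

```python
import numpy as np
import pandas as pd

# Small hypothetical one-column target, same layout as y above
y = pd.DataFrame({"y_val": np.linspace(0.0, 1.0, 5)})

# Collapse the single column into a Series (index and name preserved)
as_series = y.squeeze("columns")

# Select the first column by position instead of by name
by_position = y.iloc[:, 0]

# Drop pandas entirely and get a flat NumPy array
as_array = y.to_numpy().ravel()

print(as_series.shape, by_position.shape, as_array.shape)
```

The first two keep the pandas index, which matters if y is later aligned with X; the last one discards it.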
