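Hello, I'm new to Dask-ML, and I've been trying to use it to train a logistic regression model that predicts tweet sentiment. I converted my pandas DataFrame to a Dask DataFrame, then did a train/test split, and then applied a HashingVectorizer to X_train and X_test. To check the shape, I ran this line:
Train_X_vect.compute().shape
which returned (180224, 7000). Likewise,
y_train.compute().shape
returned (180224,).
Whenever I try to fit them to the logistic regression model, I get an error saying "Cannot add intercept to array with unknown chunks". Here is my code: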
import dask.dataframe as dd
from dask_ml.feature_extraction.text import HashingVectorizer
from dask_ml.model_selection import train_test_split
from dask_ml.linear_model import LogisticRegression

# pandas_df is my existing pandas DataFrame with "preprocess" (text) and "target" (label) columns
dask_df = dd.from_pandas(pandas_df, npartitions=4)
X_train, X_test, y_train, y_test = train_test_split(dask_df["preprocess"], dask_df["target"], random_state=42)

# Hash the tweet text into 7000 features
vectorizer = HashingVectorizer(n_features=7000)
vectorizer.fit(X_train)
Train_X_vect = vectorizer.transform(X_train)
Test_X_vect = vectorizer.transform(X_test)

# This is the call that raises the error
lr = LogisticRegression()
lr.fit(Train_X_vect, y_train)
I also tried fit_intercept=False, but then I get this error instead: "IndexError: Index dimension must be <= 2"
Could you please tell me what I'm doing wrong and how I can fix it? Thank you.