python - K-fold with sklearn using specific groups instead of splitting by a fixed size

Question

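I want to do K-fold cross-validation with sklearn in Python. My data comes from 8 users, and so far I have only run K-fold on the data of a single user. Is it possible to cross-validate across users instead? For example, use 7 users as the training set and 1 user as the test set, and repeat this for each of the 8 possible held-out users?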

score 1 · Accepted Answer

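Yes, this is possible. You can use cross-validation with groups. Ensuring that all of one person's data points end up either in the training set or in the test set, never in both, is called grouping or blocking. In scikit-learn, you do this by passing an array of group membership values to cross_val_score through its groups argument. You can then use scikit-learn's GroupKFold class, with n_splits set to the number of groups, as the cross-validation procedure. See the example below. (The simple logistic regression model is only there to illustrate the use of the GroupKFold class.)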

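from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

# create a synthetic dataset
X, y = make_blobs(n_samples=12, random_state=0)

# the first three samples belong to the same group, etc.
groups = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]

# a simple classifier, used only to illustrate GroupKFold
logreg = LogisticRegression()

# pass groups by keyword; with 4 groups and 4 splits, each fold holds out one whole group
scores = cross_val_score(logreg, X, y, groups=groups, cv=GroupKFold(n_splits=4))

print("cross_val_score(logreg, X, y, groups=groups, cv=GroupKFold(n_splits=4))")
print("Cross-validation scores:\n{}".format(scores))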
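For the eight-user setup described in the question, the same idea can be checked directly by inspecting the splits. The following is a minimal sketch under assumed data: the array shapes, the random features and labels, and the names user_ids, n_users and samples_per_user are all hypothetical and only stand in for the real dataset. It uses GroupKFold with n_splits equal to the number of users, so each fold should train on 7 users and test on the remaining one.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical data: 8 users, 5 samples each, 3 features per sample.
n_users, samples_per_user = 8, 5
rng = np.random.RandomState(0)
X = rng.normal(size=(n_users * samples_per_user, 3))
y = rng.randint(0, 2, size=n_users * samples_per_user)

# One group label per sample: the id of the user the sample came from.
user_ids = np.repeat(np.arange(n_users), samples_per_user)

# As many splits as users, so every fold holds out exactly one user.
cv = GroupKFold(n_splits=n_users)

folds = []
for train_idx, test_idx in cv.split(X, y, groups=user_ids):
    train_users = set(user_ids[train_idx].tolist())
    test_users = set(user_ids[test_idx].tolist())
    folds.append((train_users, test_users))
    print("train users:", sorted(train_users), "| test user:", sorted(test_users))
```

If holding out exactly one group per fold is always what you want, scikit-learn's LeaveOneGroupOut splitter expresses that intent directly and does not need the number of groups spelled out.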
