
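I am using StratifiedKFold, but I am not sure what train and test sizes kfold.split returns in the code below. Suppose print(array.shape) returns (12904, 47), i.e. 12904 rows and 47 columns. What are the train and test sizes?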

from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import StratifiedKFold

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=8)

for train, validation in kfold.split(X, Y):
    # Fit the model on the training fold
    model.fit(X[train], Y[train])
    # Predict class labels for the training fold
    predicted = model.predict(X[train])

    predicted_report = classification_report(Y[train], predicted)
    print(predicted_report)
    # accuracy: (tp + tn) / (p + n)
    accuracy = accuracy_score(Y[train], predicted)

1 Answer


As already hinted in the comments, your training set will be (n_splits-1)/n_splits of your initial data and your validation set will be 1/n_splits of it, i.e. 4/5 and 1/5 respectively.
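For the 12904 rows in the question, the arithmetic can be sketched as below. Since 12904 is not divisible by 5, the folds cannot all be the same size. This sketch spreads the remainder over the first folds, which is how plain (unstratified) k-fold splitting is usually described. That is an assumption here: with StratifiedKFold, which fold gets the extra row depends on the class counts, so treat the numbers as approximate per-fold sizes rather than an exact prediction.

```python
# Rough fold-size arithmetic for the question's data (12904 rows, 5 splits).
# Assumption: the remainder rows go to the first folds, as in plain k-fold;
# StratifiedKFold may assign them differently depending on class counts.
n_samples, n_splits = 12904, 5

base, remainder = divmod(n_samples, n_splits)
val_sizes = [base + 1 if i < remainder else base for i in range(n_splits)]
train_sizes = [n_samples - v for v in val_sizes]

for t, v in zip(train_sizes, val_sizes):
    print(t, v)
```

So each iteration should give roughly 10323 training rows and 2581 validation rows, all with 47 columns.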

Here is a simple reproducible demo using the iris data and n_splits=5, as in your case:

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
y = iris.target
print(X.shape) # initial dataset size
# (150, 4)

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=8)

for train, validation in kfold.split(X, y):
    print(X[train].shape, X[validation].shape)

The result is:

(120, 4) (30, 4)
(120, 4) (30, 4)
(120, 4) (30, 4)
(120, 4) (30, 4)
(120, 4) (30, 4)

So, to check with your own data, just add the print statement above inside your for loop.
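As a sketch of that check at the size given in the question, the snippet below runs the same loop on placeholder data of shape (12904, 47). The random features and the binary labels are assumptions made up for illustration, since the real X and Y are not shown; only the shapes matter here.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Placeholder data with the shape from the question; the values and the
# binary labels are invented for illustration, not the asker's data.
rng = np.random.default_rng(0)
X = rng.normal(size=(12904, 47))
Y = rng.integers(0, 2, size=12904)

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=8)

fold_shapes = []
for train, validation in kfold.split(X, Y):
    # train and validation are integer index arrays into X and Y
    fold_shapes.append((X[train].shape, X[validation].shape))
    print(X[train].shape, X[validation].shape)
```

Each line printed should show about 4/5 of the rows in the training part and about 1/5 in the validation part, with the two row counts always adding up to 12904.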

Answered on 2019-10-14T12:30:35.443