KFold.split returns a generator:
from sklearn.model_selection import KFold
import numpy as np
X_train = np.random.normal(0, 1, (75, 5))  # 75 samples, 5 features
kfold = KFold(n_splits=10, shuffle=True, random_state=123)
type(kfold.split(X_train))
generator
It is meant to be used in a for loop. You can inspect what it will iterate over, for example what the first iteration yields:
list(kfold.split(X_train))[0]
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 69, 70, 71, 74]),
array([23, 24, 42, 43, 45, 68, 72, 73]))
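So what you have above works only when n_splits=2, because each element the generator yields is a single (train, validation) pair of index arrays.
If you really want to collect all the indices, I think this is one way to do it: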
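To illustrate the n_splits=2 point, here is a minimal sketch. It assumes the failing pattern is unpacking the generator directly into two names, which is a guess since that code is not reproduced here. `X_demo` is a hypothetical toy array.

```python
import numpy as np
from sklearn.model_selection import KFold

X_demo = np.zeros((6, 2))  # hypothetical toy data with 6 rows

# With exactly two folds the generator yields two items,
# so two names can receive them.
first, second = KFold(n_splits=2).split(X_demo)

# Each item is itself a (train_indices, validation_indices) tuple.
train_a, val_a = first

# With three folds the generator yields three items,
# so the same two-name unpacking raises ValueError.
try:
    first, second = KFold(n_splits=3).split(X_demo)
    unpack_failed = False
except ValueError:
    unpack_failed = True
```

Note that even in the n_splits=2 case, the two names hold one fold each (a pair of index arrays), not "all training indices" and "all validation indices".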
train_idx = []
val_idx = []
for train, val in kfold.split(X_train):
    # each iteration yields one (train, validation) pair of index arrays
    train_idx.append(train)
    val_idx.append(val)
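A more compact variant of the same idea, as a sketch: `zip(*...)` transposes the sequence of (train, validation) pairs into two tuples. The shape of `X_train` here is an assumption (75 rows, chosen to be consistent with the indices printed earlier); the sanity checks at the end are only for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

X_train = np.random.normal(0, 1, (75, 5))  # assumed shape: 75 samples, 5 features
kfold = KFold(n_splits=10, shuffle=True, random_state=123)

# Transpose the (train, val) pairs into one tuple of train arrays
# and one tuple of validation arrays, one entry per fold.
train_idx, val_idx = zip(*kfold.split(X_train))

# Every sample should land in exactly one validation fold.
all_val = np.sort(np.concatenate(val_idx))

# The index arrays can be used directly to slice the data for a given fold.
X_fold0_train = X_train[train_idx[0]]
X_fold0_val = X_train[val_idx[0]]
```

Here `train_idx[i]` and `val_idx[i]` belong to the same fold, so they can be used together when fitting and scoring a model on fold `i`.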