python - Indexing with iloc

Question

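I'm working through a Kaggle tutorial, and although I understand the basic idea from looking at the output and reading the documentation, I think I need confirmation of what is happening here: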

predictors = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
kf = KFold(titanic.shape[0], n_folds=3, random_state=1)

predictions = []

for train, test in kf:
     train_predictors = (titanic[predictors].iloc[train,:])

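My main question is about the last line, the one with the iloc call. The rest is just for context. Is it simply splitting out the training data?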

score 2 · Accepted Answer

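.iloc[] is the primary way to select from a pandas DataFrame by integer position, using a row indexer and a column indexer (or from a Series, in which case there is only the one index). It is explained well in the pandas indexing documentation.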
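As a minimal sketch of what positional selection means, the snippet below builds a small made-up frame (the values and the index labels 10 to 13 are purely illustrative, not taken from the Titanic data) and selects rows with an array of integer positions, the same shape of input that the loop in the question passes in as train.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the index labels deliberately differ from the positions.
df = pd.DataFrame(
    {"Pclass": [3, 1, 3, 2], "Fare": [7.25, 71.28, 7.92, 13.0]},
    index=[10, 11, 12, 13],
)

positions = np.array([2, 3])

# Rows at positions 2 and 3 (labels 12 and 13), all columns.
subset = df.iloc[positions, :]

# The trailing ", :" is optional when every column is wanted.
same = df.iloc[positions]
```

Because .iloc counts positions rather than looking up labels, the result contains the rows labelled 12 and 13 even though the array holds 2 and 3.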

In this specific case, from the scikit-learn docs:

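KFold divides all the samples into k groups of samples, called folds (if k = n, this is equivalent to the Leave One Out strategy), of equal sizes (if possible). The prediction function is learned using k - 1 folds, and the fold left out is used for testing. Example of 2-fold cross-validation on a dataset with 4 samples: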
import numpy as np
from sklearn.cross_validation import KFold

kf = KFold(4, n_folds=2)
for train, test in kf:
    print("%s %s" % (train, test))
[2 3] [0 1]
[0 1] [2 3]

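In other words, KFold yields arrays of integer positions. The for loop iterates over kf, and each train array is passed to .iloc to select the corresponding rows (and all columns) from the titanic[predictors] DataFrame. So yes, that line picks out the training rows for the current fold.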
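The sklearn.cross_validation module used above has since been removed from scikit-learn. A rough equivalent with the current sklearn.model_selection API might look like the sketch below. The six-row titanic frame and the shortened predictors list are stand-ins made up for illustration, and random_state is left out because it only has an effect when shuffling is enabled.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-in for the tutorial's titanic DataFrame.
titanic = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 2],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 13.00],
    "Survived": [0, 1, 1, 1, 0, 0],
})
predictors = ["Pclass", "Fare"]

# n_splits replaces the old n_folds; the number of rows is no longer passed in.
kf = KFold(n_splits=3)

folds = []
# split() yields one pair of integer position arrays per fold.
for train, test in kf.split(titanic):
    # Positional row selection, keeping all predictor columns.
    train_predictors = titanic[predictors].iloc[train, :]
    test_predictors = titanic[predictors].iloc[test, :]
    folds.append((train_predictors, test_predictors))
```

Each pass through the loop leaves four rows in train_predictors and two in test_predictors, and no row appears in both.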
