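I have a pipeline that runs preprocessing followed by a Random Survival Forest from the scikit-survival package. I'm trying to use scikit-survival's `as_concordance_index_ipcw_scorer()` class, found here.
My pipeline looks like this: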
```
Pipeline(steps=[('columntransformer',
                 ColumnTransformer(transformers=[('num',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(strategy='median')),
                                                                  ('scaler',
                                                                   StandardScaler())]),
                                                  Index(['IntVar1', 'IntVar2', 'IntVar3',
                                                         'IntVar4'],
                                                        dtype='object')),
                                                 ('cat',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='constant')),
                                                                  ('onehot',
                                                                   OneHotEncoder(handle_unknown='ignore',
                                                                                 sparse=False))]),
                                                  Index(['CharVar1', 'CharVar2', 'CharVar3'], dtype='object'))])),
                ('randomsurvivalforest',
                 RandomSurvivalForest(max_features='sqrt',
                                      min_samples_leaf=0.005,
                                      min_samples_split=0.01, n_estimators=150,
                                      n_jobs=-1, oob_score=True,
                                      random_state=200))])
```
Here is the Python code that builds and fits the pipeline:
```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import as_concordance_index_ipcw_scorer
from sksurv.util import Surv

print("Importing global DF")  ## df is loaded elsewhere; its last two columns are AliveStatus and Target_Age
print("Creating X & Y set")
X = df.iloc[:, :-2].copy()
y = Surv.from_dataframe("AliveStatus", "Target_Age", df.iloc[:, -2:].copy())  ## Creates structured array for Scikit Surv
print("Defining feature categories by data type")
numerical_features = X.select_dtypes(include=['int64', 'float64']).columns
categorical_features = X.select_dtypes(include=['object']).columns
print("Splitting dataset")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)  # SkLearn splitter
print("Defining preprocessing steps using SciKitLearn pipeline...")

## Pipeline Steps
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(sparse=False, handle_unknown='ignore'))])  ## Use "sparse=False" because Random Forests cannot take sparse matrices, only dense matrices.
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)])

## Pipeline defining
print("Defining Random Survival Forest pipeline from SciKit Survival")
rsf = make_pipeline(
    preprocessor,
    RandomSurvivalForest(n_estimators=150,        ## Default 100
                         min_samples_split=0.01,  ## Default 6
                         min_samples_leaf=0.005,  ## Default 3
                         max_features="sqrt",     ## Defaults to none when not defined
                         n_jobs=-1,               ## Default None (one job); -1 uses all cores
                         oob_score=True,
                         random_state=200)        ## Fixed seed for reproducibility
)

## Fitting & Scoring
print("Fitting dataframe to RSF Pipeline")
rsf.fit(X_train, y_train)
print("Fitting completed.")
```
Once fitting completes, I try to run the following:
```python
as_concordance_index_ipcw_scorer(rsf).score(X_test, y_test)
```
and get this error:
```
AttributeError                            Traceback (most recent call last)
<ipython-input-97-9a92b22d8026> in <module>
----> 1 as_concordance_index_ipcw_scorer(rsf).score(X_test,y_test)

C:\ProgramData\Anaconda3\lib\site-packages\sksurv\metrics.py in score(self, X, y)
    788         score : float
    789         """
--> 790         estimate = self._do_predict(X)
    791         score = self._score_func(
    792             survival_train=self._train_y,

C:\ProgramData\Anaconda3\lib\site-packages\sksurv\metrics.py in _do_predict(self, X)
    768 
    769     def _do_predict(self, X):
--> 770         predict_func = getattr(self.estimator_, self._predict_func)
    771         return predict_func(X)
    772 

AttributeError: 'as_concordance_index_ipcw_scorer' object has no attribute 'estimator_'
```
One thing I tried was passing only the RSF step of the pipeline, but that didn't work either:
```python
as_concordance_index_ipcw_scorer(rsf[1]).score(X_test, y_test)
```
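Both calls fail the same way, which the traceback helps explain: `score()` reads `self.estimator_` and `self._train_y`, and the trailing underscore follows the scikit-learn convention for attributes that are only set by `fit()`. Below is a minimal, dependency-free sketch of that pattern. `ToyEstimator` and `ToyScorerWrapper` are hypothetical stand-ins written for illustration, not the scikit-survival classes, and the toy metric is not an IPCW concordance index; the assumption is only that the real wrapper stores its fitted estimator and training targets in `fit()`, as the traceback suggests.

```python
class ToyEstimator:
    """Hypothetical stand-in for a model such as the `rsf` pipeline."""

    def fit(self, X, y):
        # Learn a trivial offset between the feature mean and the target mean
        self.offset_ = sum(y) / len(y) - sum(X) / len(X)
        return self

    def predict(self, X):
        return [x + self.offset_ for x in X]


class ToyScorerWrapper:
    """Hypothetical wrapper mimicking the pattern in the traceback:
    fit() sets estimator_ and _train_y, and score() depends on both."""

    def __init__(self, estimator):
        self.estimator = estimator  # only the unfitted estimator is stored here

    def fit(self, X, y):
        self._train_y = list(y)                     # training targets, needed by score()
        self.estimator_ = self.estimator.fit(X, y)  # fitted attribute (trailing underscore)
        return self

    def _do_predict(self, X):
        # Same lookup as in the traceback: raises AttributeError if fit() never ran
        return getattr(self.estimator_, "predict")(X)

    def score(self, X, y):
        estimate = self._do_predict(X)
        # Toy metric that, like the IPCW scorer, also needs the training targets:
        # share of test points where the model beats the training-mean baseline
        baseline = sum(self._train_y) / len(self._train_y)
        wins = sum(abs(e - t) < abs(baseline - t) for e, t in zip(estimate, y))
        return wins / len(y)


X_train, y_train = [1, 2, 3, 4], [2, 3, 4, 5]
X_test, y_test = [5, 6], [6, 7]

try:
    ToyScorerWrapper(ToyEstimator()).score(X_test, y_test)  # wrapper never fitted
except AttributeError as err:
    print("AttributeError:", err)  # same failure mode as in the question

fitted = ToyScorerWrapper(ToyEstimator()).fit(X_train, y_train)
print(fitted.score(X_test, y_test))  # → 1.0
```

If the real wrapper behaves like this sketch, the call sequence would be `scorer = as_concordance_index_ipcw_scorer(rsf)`, then `scorer.fit(X_train, y_train)`, then `scorer.score(X_test, y_test)`; fitting `rsf` on its own does not populate the wrapper. Wrapping `rsf[1]` also bypasses the `ColumnTransformer`, so the forest alone would receive the raw, unencoded columns.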
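Any suggestions?
Apologies for the length or any missing information; I'm new to pipelines and scikit-survival and wanted to provide as much detail as possible.
Thanks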