I want to build a pipeline that uses GridSearchCV for hyperparameter tuning. My model is a (binary) classifier that I built with Keras Sequential().
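Because I am working with a skewed dataset (about 6/7 of the samples are labeled 0 and the remaining 1/7 are labeled 1), I added a callback that computes F1, recall and precision at the end of each epoch, and I want to use these as the metrics for validating my model.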
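To make the motivation concrete: with a 6:1 class ratio, a model that always predicts 0 already reaches about 86% accuracy while its F1 for the positive class is 0. A small NumPy sketch follows; the helper `binary_prf` and the toy labels are illustrative and not part of the original code.

```python
import numpy as np

def binary_prf(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical toy labels with the same 6:1 imbalance as the dataset described above.
y_true = np.array([0] * 60 + [1] * 10)
all_zeros = np.zeros_like(y_true)          # a "classifier" that always predicts 0
accuracy = np.mean(all_zeros == y_true)    # 60/70, roughly 0.857
print(round(float(accuracy), 3), binary_prf(y_true, all_zeros))
```

Accuracy looks good here only because of the imbalance; precision, recall and F1 expose that no positive is ever found.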
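To do this I use a Keras callback, which requires the validation set to be passed to my model's fit() call. This, in turn, makes it very hard to get at the validation set while also using GridSearchCV.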
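As a point of comparison, and assuming the metrics do not have to be tracked per epoch, scikit-learn can compute F1, precision and recall on each held-out fold by itself, so no validation set has to be handed to a callback. In this minimal sketch `LogisticRegression` stands in for the Keras model (a single Dense(1) unit with a sigmoid is a logistic regression) and the data are synthetic, generated with `make_classification`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic, imbalanced stand-in data (about 6/7 negatives, 1/7 positives).
X, y = make_classification(n_samples=700, n_features=10,
                           weights=[6 / 7, 1 / 7], random_state=0)

# The held-out part of each fold plays the role of validation_data in fit().
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=cv,
                        scoring=["f1", "precision", "recall"])

for name in ("test_f1", "test_precision", "test_recall"):
    print(name, np.round(scores[name].mean(), 3))
```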
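I managed to work around this by building a kind of DIY cross-validation procedure, but I wonder whether this could be done more efficiently with GridSearchCV. Here is my code: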
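import numpy as np

def pred_round(x, thr):
    """Set the classification threshold.
    INPUT: x, vector of predictions produced by the NN; thr, threshold used to classify each prediction as 0 or 1.
    OUTPUT: vector of 0s and 1s, the predicted message labels."""
    x = np.asarray(x).ravel()  # model.predict returns shape (n, 1); flatten to (n,)
    if not 0 < thr < 1:
        raise ValueError("thr must lie strictly between 0 and 1")
    return (x > thr).astype(int)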
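Since `pred_round` handles one threshold per call, it may be useful to note that several candidate thresholds can also be applied in a single vectorized NumPy step and compared by F1, using F1 = 2TP / (2TP + FP + FN). The probabilities and labels below are made up for illustration:

```python
import numpy as np

# Hypothetical predicted probabilities (shape (n, 1), like model.predict output) and true labels.
probs = np.array([[0.1], [0.25], [0.35], [0.6], [0.15], [0.8]])
y_true = np.array([0, 1, 0, 1, 0, 1])
thresholds = np.array([0.2, 0.3, 0.5])

# Broadcasting (n,) against (k, 1) gives a (k, n) matrix: one row of 0/1 labels per threshold.
labels = (probs.ravel() > thresholds[:, None]).astype(int)

tp = ((labels == 1) & (y_true == 1)).sum(axis=1)
fp = ((labels == 1) & (y_true == 0)).sum(axis=1)
fn = ((labels == 0) & (y_true == 1)).sum(axis=1)
# Assumes y_true contains at least one positive, so the denominator is never 0.
f1 = 2 * tp / (2 * tp + fp + fn)
print(dict(zip(thresholds.tolist(), np.round(f1, 3).tolist())))
```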
Then I create the callback for my metrics:
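from keras.callbacks import Callback
from sklearn.metrics import precision_recall_fscore_support

class Metrics(Callback):
    """Records validation precision, recall and F1 at the end of every epoch.
    Relies on self.validation_data, which older standalone Keras releases fill in
    from fit(validation_data=...) (in tf.keras 2.x it stays None).
    Reads the global `threshold` set by the loop over thresholds."""
    def on_train_begin(self, logs=None):
        self.val_f1s = []
        self.val_recalls = []
        self.val_precisions = []

    def on_epoch_end(self, epoch, logs=None):
        val_predict = pred_round(self.model.predict(self.validation_data[0]), threshold)
        val_targ = self.validation_data[1]
        _val_precision, _val_recall, _val_f1, _ = precision_recall_fscore_support(
            val_targ, val_predict, beta=1.0, average='binary')
        self.val_f1s.append(_val_f1)
        self.val_recalls.append(_val_recall)
        self.val_precisions.append(_val_precision)

metrics = Metrics()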
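For reference, when `average='binary'`, scikit-learn's `precision_recall_fscore_support` returns a 4-tuple `(precision, recall, fbeta, support)` whose last element, the fourth value discarded in the callback, is `None`. A quick check on hypothetical labels:

```python
from sklearn.metrics import precision_recall_fscore_support

y_val = [0, 0, 0, 0, 0, 0, 1, 1]   # hypothetical validation labels
y_hat = [0, 0, 0, 0, 0, 1, 1, 0]   # hypothetical thresholded predictions

precision, recall, f1, support = precision_recall_fscore_support(
    y_val, y_hat, beta=1.0, average="binary")
# precision = 1/2: one of the two predicted positives is correct
# recall    = 1/2: one of the two actual positives is found
# support is None because average is not None
print(precision, recall, f1, support)
```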
Then I create the model with:
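from keras.models import Sequential
from keras.layers import Dense, Activation

def create_model(optimizer="adadelta", dropout=0.1, init='uniform'):
    # optimizer and init come from the arguments so that a grid search can vary them.
    # dropout is currently unused: the model has no hidden layer.
    model = Sequential()
    model.add(Dense(1, input_shape=(N_FEATURES,), kernel_initializer=init))
    model.add(Activation('sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=optimizer,
                  # metrics=['accuracy', 'binary_accuracy']
                  )
    return model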
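If the model is handed to GridSearchCV, the Keras scikit-learn wrapper matches grid keys against the arguments of `build_fn` (here `optimizer`, `dropout`, `init`) and of `fit` (such as `epochs` and `batch_size`); a key only has an effect if `create_model` actually uses that argument. How such a grid expands into candidate settings can be checked with scikit-learn's `ParameterGrid`. The values below are illustrative assumptions:

```python
from sklearn.model_selection import ParameterGrid

# Illustrative grid; keys mirror create_model() and fit() argument names.
param_grid = {
    "optimizer": ["adadelta", "adam"],
    "init": ["uniform", "glorot_uniform"],
    "epochs": [50, 200],
    "batch_size": [256],
}
candidates = list(ParameterGrid(param_grid))
print(len(candidates))   # 2 * 2 * 2 * 1 = 8 settings
for params in candidates[:2]:
    print(params)
```

Each candidate is trained once per CV split, so 8 settings with 5 splits means 40 training runs; with EPOCHS = 200 that cost grows quickly.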
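from keras.optimizers import Adadelta
from keras.wrappers.scikit_learn import KerasClassifier

N_FEATURES = X_train.shape[1]
thresholds = [0.2, 0.3, 0.5]  # 0.15
EPOCHS = 200
BATCH_SIZE = 256
VERBOSE = 1
OPTIMIZER = Adadelta()
N_HIDDEN = 2000  # not used by create_model yet
cv_repetitions = 5

# build_fn is only called inside fit(), after the constants above are defined
model = KerasClassifier(build_fn=create_model, verbose=1)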
And here is the part of the code I would like to improve:
from sklearn.model_selection import train_test_split

cv_results = {}
for threshold in thresholds:  # the Metrics callback reads this global
    f1_cv_scores = np.zeros(EPOCHS)
    recall_cv_scores = np.zeros(EPOCHS)
    precision_cv_scores = np.zeros(EPOCHS)
    loss_train_scores = np.zeros(EPOCHS)
    loss_cv_scores = np.zeros(EPOCHS)
    for _ in range(cv_repetitions):
        # New stratified 80/20 split for every repetition
        X_train_NN, X_val_NN, y_train_NN, y_val_NN = train_test_split(
            X_train, y_train, test_size=0.2, stratify=y_train)
        # KerasClassifier.fit builds a fresh model via create_model on every call
        history = model.fit(x=X_train_NN, y=y_train_NN,
                            batch_size=BATCH_SIZE,
                            validation_data=(X_val_NN, y_val_NN),
                            epochs=EPOCHS,
                            verbose=VERBOSE,
                            callbacks=[metrics])
        loss_train_scores += history.history['loss']
        loss_cv_scores += history.history['val_loss']
        f1_cv_scores += metrics.val_f1s
        recall_cv_scores += metrics.val_recalls
        precision_cv_scores += metrics.val_precisions
    # Average the per-epoch curves over the repetitions for this threshold
    loss_train_scores = loss_train_scores / cv_repetitions
    loss_cv_scores = loss_cv_scores / cv_repetitions
    f1_cv_scores = f1_cv_scores / cv_repetitions
    recall_cv_scores = recall_cv_scores / cv_repetitions
    precision_cv_scores = precision_cv_scores / cv_repetitions
    # Keep the averages; otherwise the next threshold overwrites them
    cv_results[threshold] = (f1_cv_scores, recall_cv_scores, precision_cv_scores,
                             loss_train_scores, loss_cv_scores)
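The repeated `train_test_split(..., test_size=0.2, stratify=y_train)` calls amount to drawing splits from scikit-learn's `StratifiedShuffleSplit` (which `train_test_split` uses internally when `stratify` is given), and a splitter like that can also be passed to GridSearchCV as `cv`. The accumulate-then-divide averaging can likewise be written as stack-and-mean. In this sketch `fake_history` is a hypothetical stand-in for one `model.fit` call plus the `Metrics` callback, returning one value per epoch; the data are random:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

EPOCHS = 4
rng = np.random.default_rng(0)
X = rng.normal(size=(70, 3))
y = np.array([0] * 60 + [1] * 10)   # same 6:1 imbalance

def fake_history(X_tr, y_tr, X_val, y_val):
    """Hypothetical stand-in for model.fit(...) plus the Metrics callback:
    returns one value per epoch for each tracked quantity."""
    base = y_val.mean()
    return {"val_f1": [base + 0.1 * e for e in range(EPOCHS)],
            "val_loss": [1.0 / (e + 1) for e in range(EPOCHS)]}

# Five stratified 80/20 splits, like five calls to train_test_split(stratify=y).
splitter = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
runs = []
for train_idx, val_idx in splitter.split(X, y):
    runs.append(fake_history(X[train_idx], y[train_idx], X[val_idx], y[val_idx]))

# Stack the per-run curves into shape (n_splits, EPOCHS) and average over runs.
mean_curves = {k: np.mean([r[k] for r in runs], axis=0) for k in runs[0]}
print({k: np.round(v, 3).tolist() for k, v in mean_curves.items()})
```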
Is there a way to wrap these loops in GridSearchCV, and possibly include more parameters in the cross-validation?
Thanks in advance for reading and for your help.
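One possible way to fold both loops into a single GridSearchCV call, sketched under assumptions: the decision threshold is exposed as a hyperparameter of a small hypothetical wrapper estimator (`ThresholdClassifier`, not a scikit-learn class), `LogisticRegression` stands in for the one-unit sigmoid Keras model, and the data are synthetic. `StratifiedShuffleSplit` reproduces the repeated stratified 80/20 splits, and `refit="f1"` selects the best setting by F1 while still reporting precision and recall. Scikit-learn 1.5 and later also provide `FixedThresholdClassifier` and `TunedThresholdClassifierCV` for this purpose.

```python
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit


class ThresholdClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: predicts the positive class when P(y=1) > threshold."""

    def __init__(self, estimator=None, threshold=0.5):
        self.estimator = estimator
        self.threshold = threshold

    def fit(self, X, y):
        base = self.estimator if self.estimator is not None else LogisticRegression(max_iter=1000)
        self.estimator_ = clone(base).fit(X, y)   # fresh copy on every fit
        self.classes_ = self.estimator_.classes_
        return self

    def predict_proba(self, X):
        return self.estimator_.predict_proba(X)

    def predict(self, X):
        # Same rule as pred_round: positive when the probability exceeds the threshold.
        is_pos = (self.predict_proba(X)[:, 1] > self.threshold).astype(int)
        return self.classes_[is_pos]


X, y = make_classification(n_samples=700, n_features=10,
                           weights=[6 / 7, 1 / 7], random_state=0)

param_grid = {
    "threshold": [0.2, 0.3, 0.5],   # replaces the outer loop over thresholds
    "estimator__C": [0.1, 1.0],     # an example model hyperparameter
}
search = GridSearchCV(
    ThresholdClassifier(LogisticRegression(max_iter=1000)),
    param_grid,
    scoring=["f1", "precision", "recall"],
    refit="f1",
    cv=StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0),
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

With a real Keras model, a `KerasClassifier` (which exposes `predict_proba`) could in principle be passed as `estimator`. This approach only scores each setting at the end of training; it does not keep the per-epoch curves that the callback records.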