
I want to build a pipeline that uses GridSearchCV for hyperparameter tuning. My model is a binary classifier that I built with Keras Sequential().

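Because I am working with a skewed dataset (about 6/7 of the labels are 0 and the remaining 1/7 are 1), I added a callback that computes F1, recall, and precision at the end of each epoch, and I want to use these as the metrics for validating my model.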
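To illustrate why accuracy is a poor guide at this class ratio, here is a small sketch with synthetic labels. The arrays and the helper `f1_from_counts` are made up for illustration and are not taken from my data. A classifier that always predicts 0 reaches 6/7 accuracy while its F1 is 0.

```python
import numpy as np

def f1_from_counts(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Synthetic labels with the same 6:1 ratio of zeros to ones.
y_true = np.array([0] * 600 + [1] * 100)
y_always_zero = np.zeros_like(y_true)

accuracy = float(np.mean(y_true == y_always_zero))
precision, recall, f1 = f1_from_counts(y_true, y_always_zero)
```

This is the reason I track F1, recall, and precision on the validation set rather than accuracy.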

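To do this I use a Keras callback, which requires the validation dataset to be specified in my model's fit() call. That in turn makes it very hard to access the validation set when using GridSearchCV.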

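I managed to work around this by building a kind of DIY CV procedure, but I would like to know whether this can be done more efficiently in combination with GridSearchCV. Here is my code: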

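Setting different classification thresholds

INPUT: x, the vector of predictions produced by the NN; thr, the threshold used to assign each prediction to class 0 or 1. OUTPUT: a vector of 0s and 1s, the predicted labels of the messages.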

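import numpy as np

def pred_round(x, thr):
    x = np.array(x)
    if not 0 < thr < 1:
        # Fail loudly instead of silently returning None.
        raise ValueError("thr must lie strictly between 0 and 1")
    return 1 * (x > thr)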
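As a quick check of the thresholding rule, here is a minimal sketch with made-up sigmoid outputs. The `probs` values are hypothetical, and `pred_round` is restated in compact form so the snippet runs on its own.

```python
import numpy as np

def pred_round(x, thr):
    # Same rule as above: label 1 when the predicted probability exceeds thr.
    return 1 * (np.array(x) > thr)

probs = [0.10, 0.25, 0.40, 0.70]  # made-up sigmoid outputs

# One label vector per candidate threshold.
labels_by_thr = {thr: pred_round(probs, thr).tolist() for thr in (0.2, 0.3, 0.5)}
```

Lowering the threshold moves more predictions into the positive class, which is why I treat it as something to tune on a skewed dataset.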

Then I create the callback for my metrics:

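# Imports assume standalone Keras 2.x.
from keras.callbacks import Callback
from sklearn.metrics import precision_recall_fscore_support

class Metrics(Callback):
    """Record validation F1, recall and precision after every epoch."""

    def on_train_begin(self, logs=None):
        self.val_f1s = []
        self.val_recalls = []
        self.val_precisions = []

    def on_epoch_end(self, epoch, logs=None):
        # `threshold` is the global set by the loop over `thresholds` further down.
        val_predict = pred_round(self.model.predict(self.validation_data[0]), threshold)
        val_targ = self.validation_data[1]
        _val_precision, _val_recall, _val_f1, _ = precision_recall_fscore_support(
            val_targ, val_predict, beta=1.0, average='binary')
        self.val_f1s.append(_val_f1)
        self.val_recalls.append(_val_recall)
        self.val_precisions.append(_val_precision)

metrics = Metrics()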

Then I create the model as follows:

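# Imports assume standalone Keras 2.x.
from keras.layers import Activation, Dense
from keras.models import Sequential
from keras.optimizers import Adadelta
from keras.wrappers.scikit_learn import KerasClassifier

N_FEATURES = X_train.shape[1]
thresholds = [0.2, 0.3, 0.5]  # 0.15
EPOCHS = 200
BATCH_SIZE = 256
VERBOSE = 1
OPTIMIZER = Adadelta()
N_HIDDEN = 2000
cv_repetitions = 5

def create_model(optimizer="adam", dropout=0.1, init='uniform'):
    # NOTE: `optimizer` and `dropout` are not used yet; the global OPTIMIZER is compiled in.
    model = Sequential()
    model.add(Dense(1, input_shape=(N_FEATURES,), kernel_initializer=init))
    model.add(Activation('sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=OPTIMIZER,
                  # metrics=['accuracy', 'binary_accuracy']
                  )
    return model

model = KerasClassifier(build_fn=create_model, verbose=1)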

Now the part of the code I would like to optimize:

from sklearn.model_selection import train_test_split

for threshold in thresholds:
    # Per-epoch accumulators, summed over the repetitions and averaged afterwards.
    f1_cv_scores = np.zeros(EPOCHS)
    recall_cv_scores = np.zeros(EPOCHS)
    precision_cv_scores = np.zeros(EPOCHS)
    loss_train_scores = np.zeros(EPOCHS)
    loss_cv_scores = np.zeros(EPOCHS)
    for _ in range(cv_repetitions):
        # Fresh stratified 80/20 split for every repetition.
        X_train_NN, X_val_NN, y_train_NN, y_val_NN = train_test_split(
            X_train, y_train, test_size=0.2, stratify=y_train)

        history = model.fit(x=X_train_NN, y=y_train_NN,
                            batch_size=BATCH_SIZE,
                            validation_data=(X_val_NN, y_val_NN),
                            epochs=EPOCHS,
                            verbose=VERBOSE,
                            callbacks=[metrics]
                            )
        loss_train_scores += history.history['loss']
        loss_cv_scores += history.history['val_loss']
        f1_cv_scores += metrics.val_f1s
        recall_cv_scores += metrics.val_recalls
        precision_cv_scores += metrics.val_precisions

    # Average each per-epoch curve over the repetitions.
    loss_train_scores = loss_train_scores / cv_repetitions
    loss_cv_scores = loss_cv_scores / cv_repetitions
    f1_cv_scores = f1_cv_scores / cv_repetitions
    recall_cv_scores = recall_cv_scores / cv_repetitions
    precision_cv_scores = precision_cv_scores / cv_repetitions
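The accumulate-and-divide pattern in the loop is equivalent to stacking the per-repetition curves and averaging along the repetition axis, which also gives the spread across repetitions. A minimal sketch with made-up numbers, where `curves` is a hypothetical stand-in for the per-repetition `metrics.val_f1s` lists:

```python
import numpy as np

# Hypothetical per-epoch F1 curves from 3 repetitions of 4 epochs each.
curves = [
    [0.10, 0.20, 0.30, 0.40],
    [0.20, 0.30, 0.40, 0.50],
    [0.30, 0.40, 0.50, 0.60],
]

stacked = np.vstack(curves)            # shape: (repetitions, epochs)
mean_curve = stacked.mean(axis=0)      # same result as summing and dividing
std_curve = stacked.std(axis=0)        # spread across repetitions, per epoch
best_epoch = int(mean_curve.argmax())  # epoch index with the highest mean F1
```

Keeping the individual curves rather than only their running sum would also make it possible to compare thresholds epoch by epoch.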

Is there a way to wrap these loops with GridSearchCV, and possibly include more parameters in the cross-validation part?
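To make the question concrete, here is a rough sketch of the kind of setup I am after. It uses scikit-learn's LogisticRegression as a stand-in for the Keras model, synthetic data from make_classification, and an arbitrary grid over `C`; all three are assumptions for illustration only. It scores each fold on F1, precision, and recall after training, so it does not reproduce the per-epoch curves or the threshold sweep from my loop.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic data with roughly 6/7 zeros and 1/7 ones (an assumption, not my dataset).
X, y = make_classification(n_samples=700, n_features=10,
                           weights=[6 / 7, 1 / 7], random_state=0)

# Stand-in estimator; the wrapped Keras model would take its place.
estimator = LogisticRegression(max_iter=1000)

search = GridSearchCV(
    estimator,
    param_grid={"C": [0.1, 1.0, 10.0]},
    scoring={"f1": "f1", "precision": "precision", "recall": "recall"},
    refit="f1",  # choose the best parameters by mean F1 across folds
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

mean_f1 = search.cv_results_["mean_test_f1"]  # one value per parameter combination
```

What I cannot see is how to get the validation fold of each split into the Keras callback from inside such a search.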

Thanks in advance for reading and for your help.
