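I used GridSearchCV to tune the parameters of a K-Nearest Neighbors classifier (KNeighborsClassifier) on the training set, but surprisingly the tuned parameters gave worse results on the test set than the defaults. Why would this happen? Any insight into the proper use of GridSearchCV would be appreciated. I need to do this for several algorithms so that I can compare default results against tuned results.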

GridSearchCV code:

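    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsClassifier

    # Parameters we want to try.
    # Note: 'p' only applies to the 'minkowski' metric, and 'algorithm' / 'leaf_size'
    # mainly affect speed and memory rather than the predictions.
    param_grid = {'n_neighbors': [1, 2, 3, 5, 7],
                  'weights': ['uniform', 'distance'],
                  'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
                  'leaf_size': [20, 30, 40],
                  'p': [1, 2, 3],
                  'metric': ['minkowski', 'chebyshev', 'manhattan', 'euclidean']}
    # Define the grid search we want to run. Run it with four CPUs in parallel.
    gs_cv = GridSearchCV(KNeighborsClassifier(), param_grid, n_jobs=4)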

    # Run the grid search (should only be on training data!)
    gs_cv.fit(train_X, train_y)

    # Print the best parameters
    print(gs_cv.best_params_)

    #{'algorithm': 'auto', 'leaf_size': 20, 'metric': 'minkowski', 'n_neighbors': 7, 'p': 1, 'weights': 'uniform'}

Results using these parameters:

    import numpy as np
    import pandas as pd
    from sklearn.metrics import accuracy_score, log_loss

    knn = KNeighborsClassifier(n_neighbors=7,
                               weights='uniform',
                               algorithm='auto',
                               leaf_size=20,
                               p=1,
                               metric='minkowski')

    knn.fit(train_X, train_y)


    print("=" * 30)
    print('****Results****')

    # Evaluate on the held-out test set
    test_predictions = knn.predict(test_X)
    acc = accuracy_score(test_y, test_predictions)
    print("Accuracy: {:.2%}".format(acc))

    test_probabilities = knn.predict_proba(test_X)
    ll = log_loss(test_y, test_probabilities, labels=np.unique(train_y))
    print("Log Loss: {:.4}".format(ll))

    # name, log_cols and log are defined elsewhere (not shown)
    log_entry = pd.DataFrame([[name, acc*100, ll]], columns=log_cols)
    log = pd.concat([log, log_entry])  # DataFrame.append was removed in pandas 2.0

Output:

    ==============================
    ****Results****
    Accuracy: 87.50%
    Log Loss: 0.3354

Using the default KNN parameters:

    knn = KNeighborsClassifier()

    knn.fit(train_X, train_y)


    print("=" * 30)
    print('****Results****')

    # Evaluate on the held-out test set
    test_predictions = knn.predict(test_X)
    acc = accuracy_score(test_y, test_predictions)
    print("Accuracy: {:.2%}".format(acc))

    test_probabilities = knn.predict_proba(test_X)
    ll = log_loss(test_y, test_probabilities, labels=np.unique(train_y))
    print("Log Loss: {:.4}".format(ll))

    # name, log_cols and log are defined elsewhere (not shown)
    log_entry = pd.DataFrame([[name, acc*100, ll]], columns=log_cols)
    log = pd.concat([log, log_entry])  # DataFrame.append was removed in pandas 2.0

Output:

    ==============================
    ****Results****
    Accuracy: 91.67%
    Log Loss: 0.2398
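
For reference, here is a minimal, self-contained sketch of the comparison described above. It is not the original script: it assumes a synthetic dataset from `make_classification` as a stand-in for the real data, uses a reduced grid, and the names `default_cv`, `tuned_test`, and `default_test` are illustrative. It places the cross-validated score that the grid search optimizes next to the held-out test score for both the default and the tuned model, since the two can rank models differently.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption: NOT the dataset from the question)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Reduced grid for illustration; it still contains the defaults
# (n_neighbors=5, weights='uniform', p=2)
param_grid = {"n_neighbors": [1, 2, 3, 5, 7],
              "weights": ["uniform", "distance"],
              "p": [1, 2]}

# Tune on the training data only, with 5-fold cross-validation
gs_cv = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
gs_cv.fit(train_X, train_y)

# Cross-validated accuracy of the default model on the same training folds
default_cv = cross_val_score(KNeighborsClassifier(), train_X, train_y,
                             cv=5, scoring="accuracy").mean()

# Held-out test accuracy for the tuned (refit) model and the default model
tuned_test = gs_cv.score(test_X, test_y)
default_test = KNeighborsClassifier().fit(train_X, train_y).score(test_X, test_y)

print("Best params:", gs_cv.best_params_)
print("CV accuracy   tuned: {:.2%}  default: {:.2%}".format(gs_cv.best_score_, default_cv))
print("Test accuracy tuned: {:.2%}  default: {:.2%}".format(tuned_test, default_test))
```

Because the defaults are part of the grid and both calls use the same deterministic folds, `best_score_` cannot fall below the default model's cross-validated score. Any reversal between tuned and default therefore shows up only in the test-set row.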