这是我可视化树的方式:
完成所有预处理、拆分等后,首先制作模型:
# max number of trees = 100
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 100, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)
作出预测:
# Predicting the Test set results
y_pred = classifier.predict(X_test)
然后绘制重要性图。变量dataset
是原始数据框的名称。
# get importances from RF
importances = classifier.feature_importances_
# then sort them descending
indices = np.argsort(importances)
# get the features from the original data set
features = dataset.columns[0:26]
# plot them with a horizontal bar chart
plt.figure(1)
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='b', align='center')
plt.yticks(range(len(indices)), features[indices])
plt.xlabel('Relative Importance')
这会产生如下图: