matlab - How to improve the accuracy of a decision tree in MATLAB

Question

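I have a data set that I classify in MATLAB using a decision tree. I split the set into two parts: training data (85%) and test data (15%). The problem is that the accuracy is around 90% and I don't know how to improve it. I would appreciate any ideas.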

score 5 · Accepted Answer

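Decision trees can perform poorly for many reasons. One prominent reason I can think of is that, when computing splits, they do not take into account the interdependence among the variables, or between the target variable and the other variables. Before you start improving performance, make sure that what you do does not lead to overfitting and that the model can still generalize.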

To improve performance, you can do the following:

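Variable preselection: run tests such as multicollinearity checks, VIF calculation, and IV (information value) calculation on the variables, and keep only a few top variables. This can improve performance because it strictly eliminates unneeded variables.
Ensemble learning: use multiple trees (a random forest) to predict the outcome. Random forests usually perform better than a single decision tree, mainly because averaging many decorrelated trees reduces variance. They are also less prone to overfitting.
K-fold cross-validation: cross-validating on the training data itself can slightly improve the model, since it gives a more reliable error estimate for choosing settings such as tree depth or minimum leaf size.
Hybrid model: combine models, for example by applying logistic regression after the decision tree, to improve performance.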
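
As a rough illustration of the cross-validation and ensemble suggestions, here is a minimal sketch. It assumes the Statistics and Machine Learning Toolbox is installed and uses the bundled `fisheriris` data as a stand-in for the asker's data, which is not shown in the question. The 85/15 split mirrors the question; the fold count and the number of trees are arbitrary choices.

```matlab
% Assumption: Fisher's iris data stands in for the asker's data set.
load fisheriris                 % meas: 150x4 predictors, species: 150x1 labels
rng(1);                         % fix the seed so the split is reproducible

% Hold out 15% of the observations for testing, as in the question
cv     = cvpartition(species, 'HoldOut', 0.15);
Xtrain = meas(training(cv), :);   Ytrain = species(training(cv));
Xtest  = meas(test(cv), :);       Ytest  = species(test(cv));

% Single decision tree, with a 10-fold cross-validated error estimate
% computed on the training data only
tree      = fitctree(Xtrain, Ytrain);
cvTree    = crossval(tree, 'KFold', 10);
treeCvErr = kfoldLoss(cvTree);

% Bagged ensemble of trees (random-forest style); 100 trees is arbitrary
forest = fitcensemble(Xtrain, Ytrain, 'Method', 'Bag', ...
                      'NumLearningCycles', 100);

% Misclassification rate of both models on the held-out 15%
treeTestErr   = loss(tree,   Xtest, Ytest, 'LossFun', 'classiferror');
forestTestErr = loss(forest, Xtest, Ytest, 'LossFun', 'classiferror');

fprintf('Tree   CV error:   %.3f\n', treeCvErr);
fprintf('Tree   test error: %.3f\n', treeTestErr);
fprintf('Forest test error: %.3f\n', forestTestErr);
```

With a test set this small, a difference of one or two misclassified observations moves the test error noticeably, so the cross-validated figure is usually the more trustworthy one to compare.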

score 3 · Accepted Answer

I guess the more important question here is what's a good accuracy for the given domain: if you're classifying spam then 90% might be a bit low, but if you're predicting stock prices then 90% is really high!

If you're doing this on a known domain set and there are previous examples of classification accuracy which is higher than yours, then you can try several things:

score 1 · Accepted Answer

I don't think you should try to improve this; the classifier may already be overfitting the data. Try another data set, or use cross-validation, to get a more reliable estimate of the accuracy.

By the way, 90%, if the model is not overfitted, is a great result; you may not even need to improve it.

score 0 · Accepted Answer

Whether 90% is good or bad depends on the domain of the data.

However, the classes in your data may overlap, in which case you cannot really do better than 90%.

You can try to see which nodes are wrong, and check whether changing them would improve the classification.
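
One way to start that inspection is sketched below, again assuming the Statistics and Machine Learning Toolbox and using `fisheriris` as placeholder data. The confusion matrix shows which classes get mixed up, and the text dump of the tree lets you trace the misclassified observations back to the split rules that produced them.

```matlab
% Assumption: Fisher's iris data stands in for the asker's data set.
load fisheriris
rng(1);                                   % reproducible split
cv   = cvpartition(species, 'HoldOut', 0.15);
tree = fitctree(meas(training(cv), :), species(training(cv)));

% Predict the held-out observations and compare with the true labels
Ytrue = species(test(cv));
Ypred = predict(tree, meas(test(cv), :));

% Rows of C are true classes, columns are predicted classes
[C, classOrder] = confusionmat(Ytrue, Ypred);
disp(classOrder');
disp(C);

% Positions (within the test set) of the misclassified observations
wrongIdx = find(~strcmp(Ytrue, Ypred));
fprintf('%d of %d test observations misclassified\n', ...
        numel(wrongIdx), numel(Ytrue));

% Print the split rules so the errors can be traced to specific nodes
view(tree, 'Mode', 'text');
```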

You can also try a random forest.

score 0 · Accepted Answer

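You could consider pruning the leaves to improve the decision tree's ability to generalize. But as mentioned above, 90% accuracy can be considered quite good.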
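
A minimal pruning sketch follows, assuming the Statistics and Machine Learning Toolbox and the bundled `fisheriris` data as a placeholder. It lets cross-validation pick the pruning level rather than choosing one by hand; selecting the level with the lowest cross-validated error (`'TreeSize','min'`) is one option among several.

```matlab
% Assumption: Fisher's iris data stands in for the asker's data set.
load fisheriris
rng(1);                                   % reproducible cross-validation folds

% Grow a full tree first
tree = fitctree(meas, species);

% Cross-validate every pruning level and return the level with the
% lowest cross-validated error
[~, ~, ~, bestLevel] = cvloss(tree, 'SubTrees', 'all', 'TreeSize', 'min');

% Prune the tree back to that level
prunedTree = prune(tree, 'Level', bestLevel);

% Compare the number of leaves before and after pruning
leavesBefore = sum(~tree.IsBranchNode);
leavesAfter  = sum(~prunedTree.IsBranchNode);
fprintf('Leaves before pruning: %d, after: %d (level %d)\n', ...
        leavesBefore, leavesAfter, bestLevel);
```

A pruned tree will typically fit the training data slightly worse, so judge it by cross-validated or held-out error rather than training accuracy.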
