machine-learning - How to categorize continuous data?

Question

I have two dependent continuous variables and I want to use their combined values to predict the value of a third, binary variable. How do I go about discretizing/categorizing the values? I am not looking for clustering algorithms; I'm specifically interested in obtaining 'meaningful' discrete categories that I can subsequently use in a Bayesian classifier. Pointers to papers, books, and online courses are all very much appreciated!

score 0 · Accepted Answer

This is the essence of machine learning and one of its most studied problems.

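Least-squares regression, logistic regression, SVMs, and random forests are all widely used for this kind of problem, which is called binary classification.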

If your goal is to classify data in practice, several libraries are available, such as scikit-learn in Python and Weka in Java. Both have excellent documentation.
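As a rough illustration of the library route, here is a minimal sketch using scikit-learn. It bins two continuous features with `KBinsDiscretizer` and feeds the bin indices to `CategoricalNB`, a naive Bayes classifier for categorical inputs. The synthetic data, the choice of five quantile bins, and the train/test split are all assumptions made for the example, not something taken from the question.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic stand-in data (an assumption): two correlated continuous
# features and a binary target derived from them.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 0.7 * x1 + rng.normal(scale=0.5, size=500)
X = np.column_stack([x1, x2])
y = (x1 + x2 > 0).astype(int)

# Bin each feature into 5 quantile bins (arbitrary choice), encoded as
# ordinal bin indices 0..4, then fit a categorical naive Bayes model.
model = make_pipeline(
    KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile"),
    # min_categories keeps the model valid even if a bin is empty in training.
    CategoricalNB(min_categories=5),
)

# Simple hold-out split: first 400 rows for training, last 100 for testing.
model.fit(X[:400], y[:400])
accuracy = model.score(X[400:], y[400:])
print(f"hold-out accuracy: {accuracy:.2f}")
```

Note that naive Bayes treats the two binned features as conditionally independent given the class, which is only an approximation when the variables are dependent, as they are in the question.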

However, if you want to understand how machine learning works under the hood, just search (here or on Google) for machine learning resources.

score 0 · Accepted Answer

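If you want to be really nerdy, generate a bunch of different possible discretizations and train a classifier on each; then characterize each discretization by its features and run a classifier on those to see which discretization is best!?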
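A simplified version of this idea can be sketched with scikit-learn: try several candidate discretizations and compare them by cross-validated accuracy. This covers only the first half of the suggestion (it does not train a second classifier on features of the discretizations). The synthetic data, the candidate grid, and the use of `CategoricalNB` as the downstream classifier are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import CategoricalNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic stand-in data (an assumption): two correlated continuous
# features and a binary target derived from them.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 0.7 * x1 + rng.normal(scale=0.5, size=500)
X = np.column_stack([x1, x2])
y = (x1 + x2 > 0).astype(int)

# Candidate discretizations: binning strategy x number of bins.
results = {}
for strategy in ("uniform", "quantile"):
    for n_bins in (2, 4, 8):
        model = make_pipeline(
            KBinsDiscretizer(n_bins=n_bins, encode="ordinal", strategy=strategy),
            CategoricalNB(min_categories=n_bins),
        )
        # The discretizer is refit inside each fold, so bin edges never
        # see the held-out data.
        scores = cross_val_score(model, X, y, cv=5)
        results[(strategy, n_bins)] = scores.mean()

best = max(results, key=results.get)
print("best discretization:", best, "mean accuracy:", round(results[best], 3))
```

Keeping the discretizer inside the pipeline matters here: choosing bin edges on the full data set before cross-validating would leak information from the test folds into the comparison.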

In general, discretization is more of an art, and it depends on a good understanding of what the ranges of the input variables mean.
