python - Categorizing points using known distributions

Question

My problem is as follows:

I am given a number of chi-squared values for the same collection of data sets, each fitted with several different models. For example, for 5 collections of points, each fitted with either a single binomial distribution or a combination of binomial and normal distributions, I would have 10 chi-squared values.

I would like to use machine learning categorization to categorize the data sets into "models":

E.g. data sets 1, 2, 5 and 7 are best fitted using only binomial distributions, whereas sets 3, 4, 6, 8, 9 and 10 are best fitted using a normal distribution as well.

Notably, the number of degrees of freedom is likely to differ between the two chi-squared distributions and is always known, as is the number of models.

My (probably) naive guess for a solution would be as follows:

1. Randomly distribute the points (10 chi-squared values in this case) into the number of categories (2).
2. Fit each of the categories using the particular chi-squared distributions (in this case with different numbers of degrees of freedom).
3. Move outlying points from one distribution to the next.
4. Repeat steps 2 and 3 until happy with result.

However, I don't know how I would select the outlying points or, for that matter, whether an algorithm already exists that does this.
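
One possible reading of the loop described above is expectation-maximization on a mixture of chi-squared distributions whose degrees of freedom are fixed. The sketch below assumes that reading: only the mixing weights are estimated, and each value gets a "responsibility" (the estimated probability of belonging to each category) instead of being moved by hand. The function names, the example values and the degrees of freedom (3 and 20) are made up for illustration and do not come from the question.

```python
import math


def chi2_logpdf(x, k):
    """Log-density of a chi-squared distribution with k degrees of freedom, for x > 0."""
    half_k = k / 2.0
    return ((half_k - 1.0) * math.log(x) - x / 2.0
            - half_k * math.log(2.0) - math.lgamma(half_k))


def responsibilities(values, dofs, weights):
    """E-step: probability that each value belongs to each component."""
    resp = []
    for x in values:
        logs = [math.log(w) + chi2_logpdf(x, k) for w, k in zip(weights, dofs)]
        top = max(logs)  # subtract the maximum for numerical stability
        unnorm = [math.exp(v - top) for v in logs]
        total = sum(unnorm)
        resp.append([u / total for u in unnorm])
    return resp


def soft_assign(values, dofs, n_iter=200, tol=1e-10):
    """EM over the mixing weights only; the degrees of freedom stay fixed."""
    n_comp = len(dofs)
    weights = [1.0 / n_comp] * n_comp  # start from equal mixing weights
    for _ in range(n_iter):
        resp = responsibilities(values, dofs, weights)
        # M-step: a weight is the average responsibility of its component.
        # The small floor avoids log(0) if a component empties out.
        new_weights = [max(sum(r[j] for r in resp) / len(values), 1e-12)
                       for j in range(n_comp)]
        total = sum(new_weights)
        new_weights = [w / total for w in new_weights]
        shift = max(abs(a - b) for a, b in zip(weights, new_weights))
        weights = new_weights
        if shift < tol:
            break
    return weights, responsibilities(values, dofs, weights)


# Hypothetical chi-squared values and degrees of freedom, for illustration only.
values = [2.1, 3.5, 1.2, 4.0, 18.3, 22.7, 25.1, 19.8, 30.2, 21.4]
dofs = [3, 20]

weights, resp = soft_assign(values, dofs)
# Hard label: the component with the largest responsibility.
labels = [max(range(len(r)), key=r.__getitem__) for r in resp]
print(labels)
```

Under this reading, the "outlying" points are those whose largest responsibility is close to 0.5, since they are the ones the two categories explain about equally well.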

I am extremely new to machine learning and fairly new to statistics, so any relevant keywords would be appreciated too.

score 0 · Accepted Answer

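The principled way to do this is to assign probabilities to the different model types and to the different parameters within each model type. Look up "Bayesian model estimation".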
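
As a rough illustration of assigning probabilities to model types, the sketch below compares the models for a single data set with the Bayesian information criterion (BIC). It rests on several assumptions not stated in the question: the fits are least-squares fits with Gaussian errors (so the maximized likelihood is proportional to exp(-chi2 / 2)), all models have equal prior probability, and the number of points and of fitted parameters per model are known. The function name and the numbers are hypothetical.

```python
import math


def model_probabilities(chi2_values, n_params, n_points):
    """Approximate posterior probability of each model for one data set.

    Uses BIC = chi2 + p * ln(n), valid up to a constant shared by all models
    when errors are Gaussian, and equal prior probabilities for the models.
    """
    bic = [c + p * math.log(n_points) for c, p in zip(chi2_values, n_params)]
    log_post = [-0.5 * b for b in bic]
    top = max(log_post)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical data set: 50 points, model A has 1 parameter, model B has 3.
probs = model_probabilities(chi2_values=[61.0, 48.0], n_params=[1, 3], n_points=50)
best = max(range(len(probs)), key=probs.__getitem__)
print(probs, best)
```

Repeating this for every data set gives a probability per model per data set, and each data set can then be categorized by its most probable model. A full Bayesian treatment would integrate over the parameters of each model instead of relying on the BIC approximation.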
