I would like to compare different machine learning algorithms. As part of that, I need to perform a grid search for optimal hyperparameters. However, I would rather not write a separate grid search implementation for each algorithm and each fixed subset of its hyperparameters. Instead, I would like something that works like the grid search in scikit-learn, but written in MATLAB and with less functionality (I do not need multiple grids, for example).

So far I am trying to work out the logic of the yet-to-be-written grid_search.m:

function model = grid_search(algo, data, labels, varargin)
    p = inputParser;
    % The list of all possible hyperparameters for all algorithms goes here;
    % only three are shown for brevity.
    addParameter(p, 'kernel_function', {'linear'});
    addParameter(p, 'rbf_sigma', {1});
    addParameter(p, 'C', {1});

    % algo, data and labels are positional, so only the name-value pairs
    % in varargin are handed to the parser.
    parse(p, varargin{:});

    names = fieldnames(p.Results);
    values = struct2cell(p.Results); % a cell array of cell arrays

    argsize = 2 * length(names);
    args = cell(1, argsize);
    args(1 : 2 : argsize) = names;
    % Now this is the stumbling point.
end

The calls to the grid_search function should look something like this:

m = grid_search('svm', data, labels, 'kernel_function', {'rbf'}, 'C', {[0.1], [1], [10]}, 'rbf_sigma', {[1], [2], [3]})
m = grid_search('knn', data, labels, 'NumNeighbors', {[1], [10]}, 'Distance', {'euclidean', 'cosine'})

The first call would then try every combination of the rbf kernel with the given values of C and rbf_sigma:

{'rbf', 0.1, 1}
{'rbf', 0.1, 2}
{'rbf', 0.1, 3}
{'rbf', 1, 1}
{'rbf', 1, 2}
{'rbf', 1, 3}
{'rbf', 10, 1}
{'rbf', 10, 2}
{'rbf', 10, 3}

The idea behind the args variable is that it is a cell array of the form {'name1', 'value1', 'name2', 'value2', ..., 'nameN', 'valueN'}, which is later passed to the corresponding algorithm: algo(data, labels, args{:}). The {'name1', 'name2', ..., 'nameN'} part is easy. The problem is that I can't understand how to create the {'value1', 'value2', ..., 'valueN'} part at each step.

I understand that machine learning terminology is not familiar to everybody, which is why a self-contained example follows:
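To make the target layout concrete, here is a minimal sketch that fills args for a single hand-picked combination. The variable pick (one index per hyperparameter) is a hypothetical name introduced only for this illustration; producing every such index vector is the open part of the question.

```matlab
% Illustrative values only: one fixed combination, chosen by hand.
names  = {'kernel_function', 'C', 'rbf_sigma'};
values = {{'rbf'}, {0.1, 1, 10}, {1, 2, 3}};  % one cell of candidates per name
pick   = [1 2 3];                              % hypothetical: chosen candidate per name

args = cell(1, 2 * numel(names));
args(1:2:end) = names;                         % odd slots hold the names
args(2:2:end) = cellfun(@(v, i) v{i}, values, num2cell(pick), ...
                        'UniformOutput', false); % even slots hold the chosen values
```

With these inputs, args interleaves each name with the value selected by pick, and args{:} can then be expanded into the call algo(data, labels, args{:}).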

Suppose the crew of the TARDIS may consist of the following classes of beings:

tardis_crew = {{'doctor'}, {'amy', 'clara'}, {'dalek', 'cyberman', 'master'}}

Since there is always just one place for a Timelord, a Companion and a Villain, please show me how to generate the following cell arrays:

{'Timelord', 'doctor', 'Companion', 'amy', 'Villain', 'dalek'}
{'Timelord', 'doctor', 'Companion', 'amy', 'Villain', 'cyberman'}
{'Timelord', 'doctor', 'Companion', 'amy', 'Villain', 'master'}
{'Timelord', 'doctor', 'Companion', 'clara', 'Villain', 'dalek'}
{'Timelord', 'doctor', 'Companion', 'clara', 'Villain', 'cyberman'}
{'Timelord', 'doctor', 'Companion', 'clara', 'Villain', 'master'}

The solution should be general, i.e. it should still work if the number of beings in a class changes or more classes of beings are added. I would also much appreciate a step-by-step description instead of code.

PS: The non-stripped github version of the original grid_search.m might give you a better idea of what I mean.


1 Answer


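It looks like what you want is to generate the Cartesian product of an arbitrary number of sets. I think this ALLCOMB function will do that for you, but if you want the details of an (iterative) algorithm so that you can implement it yourself, check this answer.

Edit: by the way, thank you for phrasing the problem in general terms for people without ML knowledge.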
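As a rough illustration of the iterative idea (this is not the ALLCOMB implementation, just a sketch under the assumptions of the question's TARDIS example), one can keep one index per class and advance the indices like an odometer, with the last class changing fastest. The names cell array and the variables idx, sizes and combos are hypothetical names chosen for this example.

```matlab
% Sketch: enumerate the Cartesian product of the value sets and interleave
% each combination with its names.
names = {'Timelord', 'Companion', 'Villain'};
tardis_crew = {{'doctor'}, {'amy', 'clara'}, {'dalek', 'cyberman', 'master'}};

n = numel(tardis_crew);
sizes = cellfun(@numel, tardis_crew);  % number of choices per class
total = prod(sizes);                   % number of combinations

combos = cell(total, 1);
idx = ones(1, n);                      % "odometer": current choice per class
for k = 1:total
    % Build {'name1', value1, ..., 'nameN', valueN} for the current indices.
    args = cell(1, 2 * n);
    args(1:2:end) = names;
    for j = 1:n
        args{2 * j} = tardis_crew{j}{idx(j)};
    end
    combos{k} = args;

    % Advance the odometer: bump the last class, carrying to the left
    % whenever a class runs out of choices.
    j = n;
    while j >= 1
        idx(j) = idx(j) + 1;
        if idx(j) <= sizes(j)
            break;
        end
        idx(j) = 1;
        j = j - 1;
    end
end
```

Each combos{k} can then be expanded into a call such as algo(data, labels, combos{k}{:}). Because the loop depends only on numel of each class, adding classes or changing the number of beings in a class requires no changes to the code.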

answered 2013-12-10T13:07:50.487