I found a similar question here: Determining cluster members in SOM (Self Organizing Map) for time series data

and I would like to learn how to apply a self-organizing map to binarize data, or to assign more than two symbols to it.

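For example, let data = rand(100,1). I would normally use data_quantized = 2*(data>=0.5)-1 to obtain a binary-valued sequence, where the threshold 0.5 is assumed and fixed. It should also be possible to quantize the data with more than 2 symbols. Can kmeans or SOM be applied to this task? If I were to quantize the data with an SOM, what should the input and output be?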

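Let X = {x_i(t)} for i = 1:N and t = 1:T, where T is the number of time samples and N is the number of components/variables. The quantized value of any vector x_i is the value of its closest BMU (best matching unit). The quantization error is the Euclidean norm of the difference between the input vector and the best-matching model. The symbolic representation of the time series is then used to compare/match a new time series. Is the BMU a scalar value or a vector of floating-point numbers? I find it hard to picture what the SOM is doing.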
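
To make the two definitions above concrete, here is a minimal sketch for the scalar case N = 1. The codebook and sample values are made up for illustration and do not come from a trained SOM. With N = 1 each prototype is a single number; with N > 1 it would be a vector of length N.

```matlab
% Hypothetical 1-D codebook: three scalar prototypes (not from a trained SOM)
codebook = [0.2; 0.5; 0.8];
x = [0.05; 0.45; 0.90; 0.61];          % made-up scalar samples (N = 1)

% Distance from every sample to every prototype: numel(x)-by-numel(codebook)
d = abs(bsxfun(@minus, x, codebook.'));

% Per sample: smallest distance = quantization error, its index = symbol (BMU)
[qerr, symbols] = min(d, [], 2);

% Quantized value = weight of the BMU
quantized = codebook(symbols);
```

Here symbols holds the integer label of each sample, quantized holds the prototype value it is replaced by, and qerr holds the per-sample quantization error.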

Matlab implementation: https://www.mathworks.com/matlabcentral/fileexchange/39930-self-organizing-map-simple-demonstration

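I cannot understand how a time series is handled in quantization. Suppose N = 1, i.e. a one-dimensional array/vector of elements drawn from a white-noise process. How can I quantize/segment this data with a self-organizing map?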

The example at http://www.mathworks.com/help/nnet/ug/cluster-with-self-organizing-map-neural-network.html

is provided by Matlab, but it is written for N-dimensional data, whereas I have one-dimensional data with 1000 data points (t = 1,...,1000).

A toy example explaining how to quantize a time series into multiple levels would be very helpful. Let trainingData = x_i;

T = 1000;
N = 1;
x_i = rand(T,N)  ;

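How can I apply the SOM code below so that the numeric data is represented by symbols such as 1, 2, 3, i.e. clustered using 3 symbols? Each data point (a scalar value) would then be represented by the symbol 1, 2 or 3.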

function som = SOMSimple(nfeatures, ndim, nepochs, ntrainingvectors, eta0, etadecay, sgm0, sgmdecay, showMode)
%SOMSimple Simple demonstration of a Self-Organizing Map that was proposed by Kohonen.
%   som = SOMSimple(nfeatures, ndim, nepochs, ntrainingvectors, eta0, etadecay, sgm0, sgmdecay, showMode) 
%   trains a self-organizing map with the following parameters
%       nfeatures        - dimension size of the training feature vectors
%       ndim             - width of a square SOM map
%       nepochs          - number of epochs used for training
%       ntrainingvectors - number of training vectors that are randomly generated
%       eta0             - initial learning rate
%       etadecay         - exponential decay rate of the learning rate
%       sgm0             - initial variance of a Gaussian function that
%                          is used to determine the neighbours of the best 
%                          matching unit (BMU)
%       sgmdecay         - exponential decay rate of the Gaussian variance 
%       showMode         - 0: do not show output, 
%                          1: show the initially randomly generated SOM map 
%                             and the trained SOM map,
%                          2: show the trained SOM map after each update
%
%   For example: A demonstration of an SOM map that is trained on scalar (1-D) values
%           
%       som = SOMSimple(1,60,10,100,0.1,0.05,20,0.05,2);
%       % It uses:
%       %   1    : dimensions for training vectors
%       %   60x60: neurons
%       %   10   : epochs
%       %   100  : training vectors
%       %   0.1  : initial learning rate
%       %   0.05 : exponential decay rate of the learning rate
%       %   20   : initial Gaussian variance
%       %   0.05 : exponential decay rate of the Gaussian variance
%       %   2    : Display the som map after every update

nrows = ndim;
ncols = ndim;
nfeatures = 1;
som = rand(nrows,ncols,nfeatures);


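% Training data: trainingData is used as given, not generated here.
% NOTE: it is not an argument of this function, so it must be made available here.
x_i = trainingData;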

% Generate coordinate system
[x y] = meshgrid(1:ncols,1:nrows);

for t = 1:nepochs    
    % Compute the learning rate for the current epoch
    eta = eta0 * exp(-t*etadecay);        

    % Compute the variance of the Gaussian (Neighbourhood) function for the current epoch
    sgm = sgm0 * exp(-t*sgmdecay);

    % Consider the width of the Gaussian function as 3 sigma
    width = ceil(sgm*3);        

    for ntraining = 1:ntrainingvectors
        % Get current training vector
        trainingVector = trainingData(ntraining,:);

        % Compute the Euclidean distance between the training vector and
        % each neuron in the SOM map
        dist = getEuclideanDistance(trainingVector, som, nrows, ncols, nfeatures);

        % Find the best matching unit (bmu)
        [~, bmuindex] = min(dist);

        % transform the bmu index into 2D
        [bmurow bmucol] = ind2sub([nrows ncols],bmuindex);        

        % Generate a Gaussian function centered on the location of the bmu
        g = exp(-(((x - bmucol).^2) + ((y - bmurow).^2)) / (2*sgm*sgm));

        % Determine the boundary of the local neighbourhood
        fromrow = max(1,bmurow - width);
        torow   = min(bmurow + width,nrows);
        fromcol = max(1,bmucol - width);
        tocol   = min(bmucol + width,ncols);

        % Get the neighbouring neurons and determine the size of the neighbourhood
        neighbourNeurons = som(fromrow:torow,fromcol:tocol,:);
        sz = size(neighbourNeurons);

        % Transform the training vector and the Gaussian function into 
        % multi-dimensional to facilitate the computation of the neuron weights update
        T = reshape(repmat(trainingVector,sz(1)*sz(2),1),sz(1),sz(2),nfeatures);                   
        G = repmat(g(fromrow:torow,fromcol:tocol),[1 1 nfeatures]);

        % Update the weights of the neurons that are in the neighbourhood of the bmu
        neighbourNeurons = neighbourNeurons + eta .* G .* (T - neighbourNeurons);

        % Put the new weights of the BMU neighbouring neurons back to the
        % entire SOM map
        som(fromrow:torow,fromcol:tocol,:) = neighbourNeurons;


    end
end


function ed = getEuclideanDistance(trainingVector, sommap, nrows, ncols, nfeatures)

% Transform the 3D representation of neurons into 2D
neuronList = reshape(sommap,nrows*ncols,nfeatures);               

% Initialize Euclidean Distance
ed = 0;
for n = 1:size(neuronList,2)
    ed = ed + (trainingVector(n)-neuronList(:,n)).^2;
end
ed = sqrt(ed);

1 Answer

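I may be misunderstanding your question, but as far as I understand it, this can be done with either kmeans or Matlab's own selforgmap. I cannot really comment on the SOMSimple implementation you posted.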

Let's take your initial example:

rng(1337);
T = 1000;
x_i = rand(1,T); % row vector for convenience: 1 feature x T samples

Suppose you want to quantize into three symbols; a manual version could be:

nsyms = 3;
symsthresh = [1:-1/nsyms:1/nsyms];
x_i_q = zeros(size(x_i));

for i=1:nsyms
    x_i_q(x_i<=symsthresh(i)) = i;
end
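
For reference, the loop above can also be written without a loop. The following is only a sketch that reproduces the same labelling (3 for the lowest third, 1 for the highest) on a few made-up values in [0, 1]. For samples lying exactly on a threshold the result may differ from the loop because of floating-point rounding.

```matlab
nsyms = 3;
x = [0.05 0.40 0.70 0.99 0.30];   % made-up samples in [0, 1]

% ceil(x*nsyms) is the bin index counted from the bottom (1..nsyms);
% max(..., 1) puts x == 0 into the lowest bin
binFromBottom = max(ceil(x * nsyms), 1);

% Flip the numbering so the highest bin gets symbol 1, as in the loop
x_q = nsyms + 1 - binFromBottom;
```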

A similar result can be achieved with Matlab's own selforgmap:

net = selforgmap(nsyms);
net.trainParam.showWindow = false;
net = train(net,x_i);
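y = net(x_i);            % one column per sample, with a 1 in the row of the winning neuron
classes = vec2ind(y);    % convert those columns to symbol indices 1..nsyms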

Finally, the same can be done directly with kmeans:

clusters = kmeans(x_i',nsyms)';
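
Note that the cluster indices returned by kmeans depend on the initialization, so cluster 1 is not necessarily the cluster with the smallest values. If ordered symbols are wanted, the labels can be renumbered by cluster mean. A minimal sketch, using made-up data and hand-written labels in place of actual kmeans or selforgmap output:

```matlab
% Made-up data and hypothetical cluster labels (stand-ins for clustering output)
x      = [0.10 0.90 0.50 0.15 0.85 0.55];
labels = [2    1    3    2    1    3   ];

% Mean of the samples assigned to each cluster
centers = accumarray(labels(:), x(:), [], @mean);

% order(k) is the cluster with the k-th smallest mean
[~, order] = sort(centers);

% newSymbol(c) is the rank of cluster c, i.e. its new ordered symbol
newSymbol = zeros(size(order));
newSymbol(order) = 1:numel(order);

% Apply the renumbering, keeping the shape of the original label vector
symbols = reshape(newSymbol(labels), size(labels));
```

After this step symbol 1 always denotes the cluster with the smallest mean, which makes symbol sequences from different runs comparable.
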
answered 2016-12-08T12:02:15.647