
When I apply this method:

%% When an outlier is considered to be more than three standard deviations away from the mean, use the following syntax to determine the number of outliers in each column of the count matrix:

mu = mean(data)
sigma = std(data)
[n,p] = size(data);
% Create a matrix of mean values by replicating the mu vector for n rows
MeanMat = repmat(mu,n,1);
% Create a matrix of standard deviation values by replicating the sigma vector for n rows
SigmaMat = repmat(sigma,n,1);
% Create a matrix of zeros and ones, where ones indicate the location of outliers
outliers = abs(data - MeanMat) > 3*SigmaMat;
% Calculate the number of outliers in each column
nout = sum(outliers) 
% To remove an entire row of data containing the outlier
data(any(outliers,2),:) = []; %% this line
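For reference, the same three-sigma filter can be written without the repmat copies. This is a minimal sketch. It assumes data is a numeric matrix with observations in rows and no NaN values. The small data matrix is hypothetical, chosen so that row 21 is the only outlier. bsxfun expands the 1-by-p mu and sigma vectors across the rows, and it also works in releases older than implicit expansion (R2016b).

```matlab
% Hypothetical example data: column 1 holds 20 zeros plus one value of 100,
% column 2 is evenly spaced and contains no outliers.
data = [[zeros(20,1); 100], (1:21)'];

mu = mean(data);      % 1-by-p column means
sigma = std(data);    % 1-by-p column standard deviations

% Logical n-by-p mask: true where an entry is more than 3 sigma from its column mean
outliers = bsxfun(@gt, abs(bsxfun(@minus, data, mu)), 3*sigma);

nout = sum(outliers);             % number of outliers per column
data(any(outliers, 2), :) = [];   % drop every row containing at least one outlier

% The row count has changed, so anything sized with a fixed constant is now stale
fprintf('%d rows remain\n', size(data, 1));
```

Here nout is [1 0] and 20 of the 21 rows survive. This is exactly the situation in which a hard-coded row count goes out of date.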

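The last line removes a number of observations (rows) from my dataset. However, this causes a problem later in my program, because I manually declared the number of observations (rows) as 1000: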

%% generate sample data
K = 6;
numObservarations = 1000;
dimensions = 3;

If I change numObservarations to data, I get a scalar output error. If I leave it unchanged, I get this error because the number of rows no longer matches:

??? Error using ==> minus
Matrix dimensions must agree.

Error in ==> datamining at 106
    D(:,k) = sum( ((data -
    repmat(clusters(k,:),numObservarations,1)).^2), 2);

Is there a way to set numObservarations so that it automatically detects the number of rows in data and returns it as a number?


1 Answer


I must be misunderstanding something. As far as I can tell, this should be enough:

numObservations = size(data, 1);
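The following sketch shows where the row count is used. It rewrites the distance step from the error message around size(data, 1). It assumes, hypothetically, that clusters is a K-by-dimensions matrix of centroids and that D(:,k) holds squared Euclidean distances to centroid k, as the failing line suggests. The data and clusters values are made up for illustration.

```matlab
% Hypothetical inputs: 4 observations in 3 dimensions and 2 centroids
data = [0 0 0; 1 1 1; 2 2 2; 10 10 10];
clusters = [0 0 0; 10 10 10];

K = size(clusters, 1);
numObservations = size(data, 1);   % follows data, even after rows are removed

D = zeros(numObservations, K);
for k = 1:K
    % squared Euclidean distance from every observation to centroid k
    D(:,k) = sum((data - repmat(clusters(k,:), numObservations, 1)).^2, 2);
end

[~, idx] = min(D, [], 2);   % index of the nearest centroid for each row
disp(idx')
```

Because numObservations is recomputed from data, removing outlier rows beforehand no longer triggers the "Matrix dimensions must agree" error in this step.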
answered 2012-07-12T15:52:34.717