matlab - Remove all 2-D data points outside an ellipse

Question

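I have an n x 2 array of flow cytometry data representing the forward scatter and side scatter of cells (there are n cells). The values represent physical properties of the cells, and I want to filter the cells. Plotted as a scatter plot, the data show a strong elliptical cloud plus a number of more dispersed cells. I want to "gate" the data so that the dominant cloud is kept and everything else is filtered out (in the image below, I want to keep the points inside the gray elliptical boundary).

[image: scatter plot with the gray elliptical boundary]

What I want is a binary n x 1 array whose value at index i is 1 if that cell is in the cloud and 0 if it is not.

I don't actually know how to filter out the data outside the ellipse. I tried K-means with 4 clusters, but it did not isolate the dominant cluster as a single group (see the image below).

[image: K-means result with 4 clusters]

I need to be able to detect the dominant cluster programmatically. I would appreciate any help. Sample data is here: FS_SS.txt (hosted on AnonFiles.com)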

score 3 · Accepted Answer

If you have the Statistics Toolbox, try the following:

a = dlmread('~\downloads\-data-anonfiles-1383150325725.txt'); % read data
p = mvnpdf(a,mean(a),cov(a)); % multivariate PDF of your data
p_sample = numel(p)*p/sum(p); % normalize pdf to number of samples
thresh = 0.5; % set an arbitrary threshold to filter
idx_thresh = p_sample > thresh; % logical indices of samples that meet the threshold
a_filtered = a(idx_thresh,:);

Then repeat this using the filtered data.

p = mvnpdf(a,mean(a_filtered),cov(a_filtered)); % PDF of ALL points under the refined fit
p_sample = numel(p)*p/sum(p); % normalize pdf to number of samples
thresh = 0.1; % set an arbitrary threshold to filter
idx_thresh = p_sample > thresh; % n x 1 logical indices of samples that meet the threshold
a_filtered = a(idx_thresh,:); % index the original array: idx_thresh has one entry per row of a

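I was able to extract most of the main distribution in 2 iterations. But I think you will want to repeat until mean(a_filtered) and cov(a_filtered) reach steady-state values. Plot them as a function of iteration; when they approach a flat line, you have found the right values.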
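The repeat-until-steady idea can be sketched as a loop. This is only an illustration under stated assumptions: the synthetic data stands in for the real file, and `tol`, `maxIter`, and the variable names `inCloud`, `muNew`, `SNew` are hypothetical choices that do not come from the answer. It still relies on `mvnpdf` from the Statistics Toolbox.

```matlab
% Synthetic stand-in for the n x 2 scatter data (assumption, not the real file)
rng(0);                                                    % reproducible example
core  = randn(2000,2) * chol([4 3; 3 9]) + repmat([50 80], 2000, 1); % dense elliptical cloud
stray = 300 * rand(400,2);                                 % diffuse background points
a = [core; stray];

thresh  = 0.1;    % arbitrary cutoff, same role as in the answer
tol     = 1e-3;   % hypothetical relative tolerance for "steady state"
maxIter = 50;     % hypothetical safety cap on the number of passes

mu = mean(a);     % start from the fit to all points
S  = cov(a);
for iter = 1:maxIter
    p        = mvnpdf(a, mu, S);          % density of every original point
    p_sample = numel(p) * p / sum(p);     % normalize as in the answer
    inCloud  = p_sample > thresh;         % n x 1 logical gate
    muNew    = mean(a(inCloud,:));        % refit on the gated points
    SNew     = cov(a(inCloud,:));
    % relative change of the mean and covariance between passes
    delta = norm(muNew - mu) / norm(mu) + norm(SNew - S, 'fro') / norm(S, 'fro');
    mu = muNew;
    S  = SNew;
    if delta < tol
        break;                            % parameters have flattened out
    end
end
a_filtered = a(inCloud,:);                % points kept by the final gate
```

Because the gate is always evaluated on the full array `a`, `inCloud` is directly the binary n x 1 array the question asks for. The threshold still has to be tuned by eye for real data.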

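This is equivalent to filtering with a rotated ellipse, but IMO it is easier and more useful, because you now actually have the 5 mvnpdf parameters needed to reproduce the distribution (mu_x, mu_y, sigma_xx, sigma_yy, sigma_xy). If you instead modeled the contour (p(x,y) = thresh) as a rotated ellipse, you would have to manipulate the minor and major axes (a,b), the translation coordinates (h,k), and the rotation (theta) to get the mvnpdf parameters.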
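To see why a density threshold acts as a rotated ellipse, the contour can be worked out from the bivariate normal density. The symbols $t$, $c$, and $\lambda_i$ below are introduced only for this illustration; $t$ is a cutoff on the raw density p, which relates to the normalized cutoff in the code by t = thresh * sum(p) / numel(p).

```latex
p(\mathbf{x}) = \frac{1}{2\pi\sqrt{|\Sigma|}}
  \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right),
\qquad
\boldsymbol{\mu} = \begin{pmatrix}\mu_x\\ \mu_y\end{pmatrix},\quad
\Sigma = \begin{pmatrix}\sigma_{xx} & \sigma_{xy}\\ \sigma_{xy} & \sigma_{yy}\end{pmatrix}

p(\mathbf{x}) > t
\;\Longleftrightarrow\;
(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) < c,
\qquad
c = -2\ln\!\left(2\pi\sqrt{|\Sigma|}\;t\right)
```

For any $t$ below the peak density, $c > 0$ and the boundary is an ellipse centered at $\boldsymbol{\mu}$, with axes along the eigenvectors of $\Sigma$ and semi-axis lengths $\sqrt{c\,\lambda_i}$, where $\lambda_i$ are the eigenvalues of $\Sigma$.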

Then, after extracting the first distribution, you can repeat the process to find the second distribution.
