matlab - 5 个独立分布的拉丁超立方抽样

Question

我有 5 个变量A、V、h、l和b，它们都来自不同的分布。我想通过拉丁超立方体采样的方法从每个分布中制作 1,000 个均匀分布的样本。这是一个现实的要求，即它真的比简单的随机抽样更好吗？您对我如何在 matlab 中执行此操作有任何参考吗？这个页面建议我需要以某种方式转换样本......

score 5 · Accepted Answer

更新#2：使用统计工具箱内置功能的解决方案

基本问题是您是否希望您的样本在常规网格上。如果没有，您可以使用内置函数lhsdesign：

p = 1000   % Number of points
N = 5   % Number of dimensions
lb = [1 1 1 1 1]; % lower bounds for A,V,h,l and b
ub = [10 10 10 10 10]; % upper bounds for A,V,h,l and b
X = lhsdesign(p,N,'criterion','correlation');
D = bsxfun(@plus,lb,bsxfun(@times,X,(ub-lb)));

'criterion','correlation'会给你想要的“平均分配”。

D然后包含参数的不规则坐标分布。

首先，我以为您正在寻找常规网格上的样本，这似乎确实是一项艰巨的任务。我试图修改上面的方法D = round(bsxfun...)，但它不会给你满意的结果。所以对于这种情况，我仍然在这里提供我的初步想法：

以下解决方案远非快速和优雅，但它至少是一个解决方案。

% For at least 1000 samples M=6 divisions are necessary
M = 6;
N = 5;
% the perfect LHC distribution would have 1296 samples for M=6 divisions
% and 5 dimensions
counter_max = M^(N-1); %=1296

% pre-allocation
D = zeros(6,6,6,6,6);
counter = 0;

while counter < 1000
    c = randi(6,1,5);
    if ( sum( D( c(1) , c(2) , c(3) , c(4) ,  :    )) < 1 && ...
         sum( D( c(1) , c(2) , c(3) ,   :  , c(5)  )) < 1 && ...   
         sum( D( c(1) , c(2) ,   :  , c(4) , c(5)  )) < 1 && ...   
         sum( D( c(1) ,   :  , c(3) , c(4) , c(5)  )) < 1 && ...   
         sum( D(   :  , c(2) , c(3) , c(4) , c(5)  )) < 1  )
     
     D(c(1),c(2),c(3),c(4),c(5)) = 1;
              
     X(counter,:) = c;
     counter = counter+1;

    end
end

X最终包含所有样本的坐标。

如您所见，我使用了带有底层 if 条件的 while 循环。您希望 1000 个样本，这是一个现实的数字，可以在合理的时间内完成。实际上，我建议您使用尽可能接近最大值 1296 的样本数。这可能会花费您相当长的时间。但是，当您只创建一次生成的矩阵并一次又一次地使用它时，请不要犹豫 24 小时运行它。您还可以按照此处所述实现中断代码：在 MatLab 中，是否可以终止脚本，但将其所有内部变量保存到工作区？看看在那之前你得到了多少样本。（我在测试时在 20 分钟内获得了 900 个样本）

更新：显示方法限制的示例：

以下示例将说明提问者可能愿意做什么以及结果实际上应该是什么样子。因为我也对一个好的解决方案非常感兴趣，所以我的解决方案有限，无法提供“100% 的结果”。

想象一个有分割的立方体 ( N=3) 。M=10

M = 10;
N = 3;
counter_max = M^(N-1);   %=100 maximum number of placeable samples

% pre-allocations
D = zeros(10,10,10);
counter = 0;

while counter < counter_max
    c = randi(10,1,3);
    % if condition checks if there is already a sample in the same row,
    % coloumn or z-coordinate, 
    if ( sum( D( c(1) , c(2) ,   :  )) < 1 && ...   
         sum( D( c(1) ,   :  , c(3) )) < 1 && ...   
         sum( D(   :  , c(2) , c(3) )) < 1  )
       
       %if not a new sample is generated
       D(c(1),c(2),c(3)) = 1;
       counter = counter+1;
       X(counter,:) = c;

    end
end

在大约 10000 次迭代后，100 个可能放置的样本中有 85 个得到以下分布：在此处输入图像描述其中颜色表示到最近邻居的归一化距离。对于大多数点来说都很好 (1)，但由于有 15 个缺失样本，一些点与其他点的距离更远。

问题是：我怀疑是否有可能在合理的时间内获得所有 100 个样本。当将生成的样本绘制在迭代次数上时，您会得到：在此处输入图像描述

...所以想要的结果似乎很难获得。

请将此答案视为鼓励而不是解决方案。

score 1 · Accepted Answer

通过组合一维拉丁超立方体样本（LHS），您可以为高阶维度的规则网格制作一整套 LHS。例如，想象 3X3 LHS（即 2-D 和 3 个分区）。首先，您只需为常规网格制作一维 LHS。(1,0,0), (0, 1, 0), (0, 0, 1) 对于一维。然后，将 1-D LHS 组合成 2-D LHS。

1, 0, 0

0, 1, 0

0, 0, 1

或者

0, 1, 0

1, 0, 0

0, 0, 1

... ETC。

也可以使用相同的方法（通过组合 2-D LHS）创建 3-D 的 LHS。

3X3 有 12 种可能的 LHS。通常，可能的 LHS 的数量是 N x ((M-1)!)^(M-1)。（N=部门，M=维度）

以下代码显示了 3-D 和 10 格的 LHS。

此代码仅生成一个 LHS。

结果是随机的。

100% 结果需要 0.001288 秒

clear;
clc;

M = 3; % dimension
N = 10; % division

Sel2 = ':,';
stop = 0;

P_matrix = repmat([num2str(N),','],1,M);
P_matrix = P_matrix(1:end-1);
eval(['P = zeros(', P_matrix, ');']);

P(1,1) = 1;
tic

while stop == 0
for i = 1 : M-1
    for j = 2:N
        if i == 1
            P(end    , j, 1) = P(1    , j-1, 1);
            P(1:end-1, j, 1) = P(2:end, j-1, 1);
        else
            Sel_2 = repmat(Sel2,1,i-1);
            Sel_2 = Sel_2(1:end-1);
            eval(['P(', Sel_2, ',end    , j, 1)     = P(', Sel_2 , ', 1    , j-1, 1);']);
            eval(['P(', Sel_2, ',1:end-1    , j, 1) = P(', Sel_2 , ', 2:end, j-1, 1);']);
        end
    end
    if i == 1
        P(:,:,1) = P(randperm(N),:,1);
    elseif i <M-1
        Sel_2 = repmat(Sel2,1,i);
        Sel_2 = Sel_2(1:end-1);
        eval(['P(',Sel_2,',:,1) = P(',Sel_2,',randperm(N),1);']);
    else
        Sel_2 = repmat(Sel2,1,i);
        Sel_2 = Sel_2(1:end-1);
        eval(['P(',Sel_2,',:) = P(',Sel_2,',randperm(N));']);
    end
end

% you can add stop condition
stop = 1;
end
toc

[x, y, z] = ind2sub(size(P),find(P == 1));
scatter3(x,y,z);
xlabel('X');
ylabel('Y');
zlabel('Z');

结果

score 0 · Accepted Answer

此代码给出的结果与本讨论中接受的代码相同。看看这个！

n=1000; p=5; 
distr=lhsdesign(n,p); %creates LHS of n samples on each of your p dimensions 
% Then, you can choose any inverted distribution. in this case it is the Discrete Uniform Distribution
Parameters=[unidinv(distr(:,1),UB1) unidinv(distr(:,2),UB2) ...  
            unidinv(distr(:,3),UB3) unidinv(distr(:,4),UB4) ...
            unidinv(distr(:,5),UB5) ];
%At the end, you'll do a simple work of indexing.

matlab - 5 个独立分布的拉丁超立方抽样

3 回答 3

Related

Reference