matlab - randperm, or shuffling two data sets together

Question

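I asked a previous question here, and I think I have run into a problem because I can't shuffle the sample data and the indices (idx) of the sample data at the same time.

I have a data set called fulldata with 49,000 rows x 6 columns, and a second data set (Book2) holding the class labels for fulldata. It contains many class labels, each corresponding to the exact row in fulldata.

I only want to select two class labels from fulldata (normal. and smurf.), and I want 750 normal rows and 250 smurf rows.

Then I want to randomly shuffle the new sample data (1000x6).

Up to this point I have managed. But then I got stuck... Dan helped solve the previous question, but I then noticed that K1 outputs some other class labels from Book2, such as neptune, which should not happen... K1 should contain only the smurf and normal class labels.

The reason I ask is that I want to use a Bayesian classifier in MATLAB, and to use it I need: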

Test_Data (unseen data)
Trainning_data (the sample data I'm trying to create above)
Target_class (the class labels that exactly match each row of the sample data)

score 2 · Accepted Answer

I think you want

idx = [smurfIdx(a);normIdx(p)];

Then make sure to use @Dan's suggestion from the previous question, i.e.

shuffle = randperm(1000);
sample = sample(shuffle,:);
K1 = Book2(idx(shuffle), :);
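
To illustrate why reusing one permutation keeps rows and labels paired, here is a minimal sketch. The toy `fulldata`, `Book2`, and `idx` values below are made up for illustration and are not from the question; only the pattern (one `shuffle` vector applied to both arrays) follows the answer.

```matlab
% Hypothetical toy data: 5 rows standing in for fulldata, labels in Book2
fulldata = [10 11; 20 21; 30 31; 40 41; 50 51];
Book2 = {'normal.'; 'neptune.'; 'smurf.'; 'normal.'; 'smurf.'};

idx = [1; 3; 4; 5];               % assumed earlier selection (normal. and smurf. rows only)
sample = fulldata(idx, :);        % unshuffled sample

shuffle = randperm(numel(idx));   % one permutation, reused for both arrays
sample = sample(shuffle, :);
K1 = Book2(idx(shuffle));

% Every shuffled row still lines up with the row its label came from
assert(isequal(sample, fulldata(idx(shuffle), :)));
assert(all(ismember(K1, {'normal.', 'smurf.'})));
```

If `K1` were built from `Book2(shuffle)` instead of `Book2(idx(shuffle))`, it would pick up labels from unrelated rows, which is one way labels such as neptune could end up in the output.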

score 1 · Accepted Answer

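The wording of this question is somewhat vague, so it is not clear exactly where you are stuck. However, I took the liberty of looking at your previous questions (this and this), so here is my attempt at solving your problem:

For the purposes of this answer, let's first generate a random data set similar to yours: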

classes = {'normal.', 'smurf.', 'neptune.', 'eject.', 'portsweep.'};
fulldata = ceil(1e3 * rand(49000, 6));
Book2 = {classes{ceil(numel(classes) * rand(size(fulldata, 1), 1))}}';

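Let's randomly select 750 rows with the 'normal.' label and 250 rows with the 'smurf.' label. However, rather than applying randperm to the data itself and picking the first N values (as you did in your previous question), create a random index vector and use it to index both the fulldata and Book2 arrays, like so: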

idxnormal = find(strcmp(Book2, 'normal.'));        % Find normals (exact match; strmatch is no longer recommended)
idxnormal = idxnormal(randperm(numel(idxnormal))); % Random shuffle of normals
idxsmurf = find(strcmp(Book2, 'smurf.'));          % Find smurfs
idxsmurf = idxsmurf(randperm(numel(idxsmurf)));    % Random shuffle of smurfs
idx = [idxnormal(1:750); idxsmurf(1:250)];         % 750 normals and 250 smurfs
idx = idx(randperm(numel(idx)));                   % Random shuffle of the combined indices

idx now holds random indices into fulldata and Book2 that correspond only to the 'normal.' or 'smurf.' labels. Now let's retrieve the data subset together with its labels:

subsetdata = fulldata(idx, :);
K1 = Book2(idx);
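
As a sanity check on the steps above, the following sketch runs the same selection on a smaller stand-in data set and asserts the expected class counts. The 3000-row size and the deterministic label layout are assumptions chosen so that each class has enough rows; `find(strcmp(...))` is used here for exact label matching.

```matlab
% Assumed stand-in data, smaller than the 49,000-row original
classes = {'normal.', 'smurf.', 'neptune.'};
n = 3000;
fulldata = ceil(1e3 * rand(n, 6));
Book2 = classes(mod(0:n-1, 3) + 1)';   % 1000 labels of each class, as a column

idxnormal = find(strcmp(Book2, 'normal.'));
idxsmurf = find(strcmp(Book2, 'smurf.'));
idxnormal = idxnormal(randperm(numel(idxnormal)));
idxsmurf = idxsmurf(randperm(numel(idxsmurf)));
idx = [idxnormal(1:750); idxsmurf(1:250)];
idx = idx(randperm(numel(idx)));

subsetdata = fulldata(idx, :);
K1 = Book2(idx);

% Checks: right shape, right class mix, no row drawn twice
assert(isequal(size(subsetdata), [1000 6]));
assert(sum(strcmp(K1, 'normal.')) == 750);
assert(sum(strcmp(K1, 'smurf.')) == 250);
assert(numel(unique(idx)) == numel(idx));
```

Because `subsetdata` and `K1` are both indexed by the same `idx`, row i of `subsetdata` always carries label `K1{i}`.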

score 0 · Accepted Answer

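OK, after reading the previous questions I hope I have understood the problem. If I am right, you simply forgot to filter the data first and extract only smurf and normal.

In that case you should look at logical indexing: http://www.mathworks.nl/company/newsletters/articles/Matrix-Indexing-in-MATLAB/matrix.html;jsessionid=97fa707e5059807b7ecae8969810

Use it to extract the correct subset before drawing your data points, and you should be fine.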
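
A minimal sketch of that filtering step, assuming a small made-up `fulldata` and `Book2` (the values are illustrative only): a logical mask built with `ismember` selects the rows and their labels in one go.

```matlab
% Hypothetical toy data standing in for fulldata and Book2
fulldata = [1 2; 3 4; 5 6; 7 8];
Book2 = {'normal.'; 'neptune.'; 'smurf.'; 'normal.'};

keep = ismember(Book2, {'normal.', 'smurf.'});   % logical mask, one entry per row
filtered = fulldata(keep, :);                    % rows labelled normal. or smurf.
labels = Book2(keep);                            % matching labels, in the same order
```

Sampling and shuffling can then be done on `filtered` and `labels` with a shared index vector, so no other class label can appear in the result.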
