I have a 312x2 matrix representing number of questions answered (column 2) by individual subjects (column 1) e.g.

x =

    40    56
    41    56
    42   176
    43   176
    44   116
    45    56
    46    56
    47   116
    48    56
    49    56
    50   116

Some participants answered the questionnaire more than once, and I would like to divide their data into additional columns.

E.g. participant 43 completed it 3 times: 56 questions once and 60 questions twice (176 in total).

Therefore, I'd like to split their data into 3 columns, to end up with:

x = 
    40    56
    41    56
    42    56    60    60
    43    56    60    60
    44    56    60
    ...etc

I'll then fill in the gaps with NaN so I can work out the mean number of questions answered per questionnaire.

4 Answers

This isn't the most elegant solution, but it gets the job done fairly simply:

x = [40 56;41 56;42 176;43 176;44 116;45 56;46 56;47 116;48 56;49 56;50 116];
a=x(:,2);
newData=[];
for i=1:size(a,1)
    if a(i)==56
        newData=vertcat(newData,[56 NaN NaN]);
    elseif a(i)==116
        newData=vertcat(newData,[56 60 NaN]);
    elseif a(i)==176
        newData=vertcat(newData,[56 60 60]);
    end
end
Data = horzcat(x(:,1),newData)

Command window:

Data =

40    56   NaN   NaN
41    56   NaN   NaN
42    56    60    60
43    56    60    60
44    56    60   NaN
45    56   NaN   NaN
46    56   NaN   NaN
47    56    60   NaN
48    56   NaN   NaN
49    56   NaN   NaN
50    56    60   NaN
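
The three hard-coded branches above could also be derived from the totals. This is only a sketch, and it assumes (as in the question's example) that the first sitting always has 56 questions and every repeat has 60; `firstLen`, `repLen` and `nRep` are names made up for this illustration.

```matlab
x = [40 56; 41 56; 42 176; 43 176; 44 116];   % sample data from the question
firstLen = 56;                                 % assumed length of the first questionnaire
repLen   = 60;                                 % assumed length of each repeat
nRep   = (x(:,2) - firstLen) / repLen;         % number of repeats per participant
maxRep = max(nRep);
newData = NaN(size(x,1), 1 + maxRep);          % start with NaN everywhere
newData(:,1) = firstLen;                       % everyone has a first sitting
for k = 1:maxRep
    newData(nRep >= k, 1 + k) = repLen;        % fill the k-th repeat where it exists
end
Data = [x(:,1) newData];
```

If a total does not fit the 56 + 60*k pattern, `nRep` will not be an integer and the result should not be trusted, so it may be worth checking that first.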
answered 2013-04-17T12:04:05.380

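It's best to use a cell array to store the final output, since each row may contain a different number of elements.

Here is a short solution that gives you the desired result: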

C = mat2cell(x, ones(1, size(x, 1)), 2);
C(ismember(x(:, 1), cellfun(@(z)z(1), y))) = y;

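where x is the original data array described in the question, and y is a cell array containing the new rows (the "split" rows, as you call them) that replace the matching rows in x. Note that the rows in y must appear in the same order as their participants appear in x, because the logical-index assignment fills the matches from top to bottom.

Example

Here is a short example similar to the one given in the question: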

%// Generate sample data
x = [40 56; 41 56; 42 176; 43 176; 44 116; 45 56];
y = {[43 56 60 60]; [44 56 60]};

%// Replace rows in x with matching rows in y
C = mat2cell(x, ones(1, size(x, 1)), 2);
C(ismember(x(:, 1), cellfun(@(z)z(1), y))) = y;

The result is a new cell array:

C =
    [40    56]
    [41    56]
    [42   176]
    [43    56    60    60]
    [44    56    60]
    [45    56]

To compute the mean number of questions answered per questionnaire for each participant, use cellfun to iterate over the cells:

m = cellfun(@(x)mean(x(2:end)), C)

For this example this yields:

m =
   56.0000
   56.0000
  176.0000
   58.6667
   58.0000
   56.0000
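
If a NaN-padded matrix is still wanted at the end (as the question mentions), the cell array can be copied into one row by row. The following is a rough sketch rather than part of the solution above; `M`, `n` and `colMeans` are illustrative names.

```matlab
C = {[40 56]; [41 56]; [43 56 60 60]; [44 56 60]};   % sample cell array
n = cellfun(@numel, C);                 % number of elements in each row
M = NaN(numel(C), max(n));              % NaN-filled target matrix
for k = 1:numel(C)
    M(k, 1:n(k)) = C{k};                % copy the row, leaving NaN as padding
end
% mean of each questionnaire column (columns 2..end), skipping NaN entries
colMeans = arrayfun(@(c) mean(M(~isnan(M(:,c)), c)), 2:size(M,2));
```

Here `colMeans` holds one value per questionnaire sitting, computed only over the participants who actually completed that sitting.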
answered 2013-04-17T12:02:05.137

When you assign to an element outside the current bounds of a matrix, MATLAB automatically expands the matrix for you:

>> a = [
    40    56
    41    56
    42   176
    43   176
    44   116
    45    56
    46    56
    47   116
    48    56
    49    56
    50   116];     
>> a(3,3) = 60
a =
    40    56     0
    41    56     0
    42   176    60
    43   176     0
    44   116     0
    45    56     0
    46    56     0
    47   116     0
    48    56     0
    49    56     0
    50   116     0

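The default behaviour is to pad with zeros (rather than NaNs). If you can reasonably expect that no participant will ever have answered 0 questions, you can do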

>> a(a==0) = NaN;
>> M = nanmean(a(:,2:end),2);

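to get the means.

If you can't count on that (or you don't have, or don't want to depend on, the Statistics Toolbox for nanmean), you can write a couple of small functions to do what you want: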

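function new_a = addQsAnswered(old_a, QsAnswered)
    % Each row of QsAnswered is [participantID, newValue1, newValue2, ...].
    % The new values go into the first free (NaN) slots of that participant's row.

    new_a = old_a;

    for ii = 1:size(QsAnswered,1)

        % row of this participant; skip IDs that are not in the matrix
        row = find(new_a(:,1) == QsAnswered(ii,1), 1);
        if isempty(row), continue; end

        vals = QsAnswered(ii,2:end);
        free = find(isnan(new_a(row,:)));

        % append as many NaN columns as are needed to hold all new values
        nMissing = numel(vals) - numel(free);
        if nMissing > 0
            new_a = [new_a NaN(size(new_a,1),nMissing)]; %#ok
            free = find(isnan(new_a(row,:)));
        end

        new_a(row,free(1:numel(vals))) = vals;

    end

end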

function M = getParticipantMeans(a)   
    M = zeros(size(a,1),1);    
    for ii = 1:size(a,1)
        as = a(ii,2:end);
        as = as(~isnan(as));
        M(ii) = sum(as)/numel(as);
    end    
end
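
As an aside, the same per-participant means can be computed without a loop by masking out the NaN entries. This is just a sketch of that idea on a small made-up matrix; `vals` and `valid` are illustrative names.

```matlab
a = [42 176 60 NaN; 43 176 NaN NaN; 46 56 78 90];   % small sample, NaN = no data
vals  = a(:,2:end);              % drop the participant ID column
valid = ~isnan(vals);            % true where a value is present
vals(~valid) = 0;                % zero out the gaps so they do not affect the sum
M = sum(vals,2) ./ sum(valid,2); % row sum divided by the number of real entries
```

A row with no values at all gives 0/0, i.e. NaN, which matches what the loop version returns.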

Example:

>> new_a = addQsAnswered(a, [46 78 90; 49 60 78])
new_a =
        40    56   NaN   NaN
        41    56   NaN   NaN
        42   176    60   NaN
        43   176   NaN   NaN
        44   116   NaN   NaN
        45    56   NaN   NaN
        46    56    78    90
        47   116   NaN   NaN
        48    56   NaN   NaN
        49    56    60    78
        50   116   NaN   NaN

>> getParticipantMeans(new_a)
ans =
        5.600000000000000e+001
        5.600000000000000e+001
        1.180000000000000e+002
        1.760000000000000e+002
        1.160000000000000e+002
        5.600000000000000e+001
        7.466666666666667e+001
        1.160000000000000e+002
        5.600000000000000e+001
        6.466666666666667e+001
        1.160000000000000e+002
answered 2013-04-17T12:12:13.200

The following will do what you need:

B = unique(A(:,1));
m = max(hist(A(:,1),B));
B = [B, nan(numel(B),m)];
for ii=1:size(B,1)
    jj = (A==B(ii,1));
    B(ii,2:end) = [A(jj(:,1),2)', nan(1,m-sum(sum(jj(:,1))))];
end

For the input:

A =

    53    83
    84    76
    52    99
    53    83
    76    90
    54    73
    91    72
    91    92
    86    54
    57    56

The result will be:

B =

    52    99   NaN
    53    83    83
    54    73   NaN
    57    56   NaN
    76    90   NaN
    84    76   NaN
    86    54   NaN
    91    72    92
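
The same grouping can probably also be written without hist, using the index output of unique together with accumarray to count the sittings. This is a sketch of an alternative, not the code above; `ids`, `idx`, `counts` and `nextCol` are names chosen for this illustration.

```matlab
A = [53 83; 84 76; 52 99; 53 83; 76 90; 54 73; 91 72; 91 92; 86 54; 57 56];
[ids, ~, idx] = unique(A(:,1));          % idx maps each row of A to its participant
counts = accumarray(idx, 1);             % number of sittings per participant
B = [ids, NaN(numel(ids), max(counts))]; % NaN-padded result, one row per participant
nextCol = 2 * ones(numel(ids), 1);       % next free column for each participant
for r = 1:size(A,1)
    p = idx(r);
    B(p, nextCol(p)) = A(r,2);           % append this sitting to the participant's row
    nextCol(p) = nextCol(p) + 1;
end
```

Because the rows of A are visited in order, the sittings of each participant stay in their original sequence.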
answered 2013-04-17T12:03:21.640