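I have a dataset containing heterogeneous data. I want to count the groups defined by certain columns, without performing any additional operations such as sum or mean.
However, the grpstats function requires my data fields to be numeric: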
>> grpstats(ds, {'Field1', 'Field2'}, {'numel'})
Error using dsgrpstats (line 256)
Data variables must numeric or logical.
Error in grpstats (line 135)
[varargout{1:nargout}] = dsgrpstats(x,group,whichstats,varargin{:});
How can I work around this?
UPDATE
Strangely, I can't create an SSCCE yet!
A small example works:
>> A={'Name', 'Gender'; 'Ann', 'female'; 'John', 'male'; 'Peter', 'male'}
B=cell2dataset(A,'ReadVarNames',true,'ReadObsNames',true)
grpstats(B,{'Gender'},{'numel'})
A =
    'Name'     'Gender'
    'Ann'      'female'
    'John'     'male'
    'Peter'    'male'
B =
             Gender
    Ann      'female'
    John     'male'
    Peter    'male'
ans =
              Gender      GroupCount
    female    'female'    1
    male      'male'      2
This is what I want. But with my real data, I get
Error using dsgrpstats (line 256)
Data variables must numeric or logical.
and I have to resort to the following trick:
>> B.Dummy=ones(size(B,1),1)
B =
             Gender      Dummy
    Ann      'female'    1
    John     'male'      1
    Peter    'male'      1
>> grpstats(B,{'Gender'},{'numel'},'DataVars',{'Dummy'})
ans =
              Gender      GroupCount    numel_Dummy
    female    'female'    1             1
    male      'male'      2             2
UPDATE 2 (SSCCE)
I found it. The error occurs when the dataset contains nested cell arrays:
A={'Name', 'Gender', 'Measurements'
'Chanel Iman Robinson', 'female', {32, 23, 33}
'Wilhelmina Cooper', 'female', {38, 24, 36}
'Arnold Schwarzenegger', 'male', {57, 33, 29}};
B=cell2dataset(A,'ReadVarNames',true,'ReadObsNames',true)
grpstats(B, {'Gender'}, {'numel'})
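The dummy-column trick from the first update should carry over to this example as well. Here is a minimal sketch, assuming that grpstats only validates the variables listed in 'DataVars' (which is what the trick relies on) and that the Statistics Toolbox dataset class is available. The names Dummy and counts are introduced here purely for illustration.

```matlab
% Same data as in the example above: Measurements holds nested cell arrays.
A = {'Name', 'Gender', 'Measurements'
     'Chanel Iman Robinson', 'female', {32, 23, 33}
     'Wilhelmina Cooper', 'female', {38, 24, 36}
     'Arnold Schwarzenegger', 'male', {57, 33, 29}};
B = cell2dataset(A, 'ReadVarNames', true, 'ReadObsNames', true);

% Assumption: a numeric placeholder column gives grpstats something to
% compute on, so the non-numeric Measurements column is never inspected.
B.Dummy = ones(size(B, 1), 1);

% Restrict the statistics to Dummy; GroupCount holds the per-group counts.
counts = grpstats(B, {'Gender'}, {'numel'}, 'DataVars', {'Dummy'});
disp(counts(:, {'Gender', 'GroupCount'}))
```

If the placeholder column is unwanted, passing only the grouping column, for example grpstats(B(:, {'Gender'}), {'Gender'}, {'numel'}), would presumably behave like the small example from the first update, since no non-numeric data variable is left in the dataset.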