I am trying to "group by column" data from a matrix.

The data is extracted from a database, and the matrix looks like this:

'2012-04-26'    'USD'    'BRL'    [    1.8894]
'2012-04-26'    'USD'    'IDR'    [      9185]
'2012-04-26'    'USD'    'INR'    [   52.5350]
'2012-04-26'    'USD'    'MXN'    [   13.2337]
'2012-04-26'    'USD'    'PEN'    [    2.6505]
'2012-04-26'    'USD'    'SGD'    [    1.2412]
'2012-04-26'    'USD'    'TRY'    [    1.7643]
'2012-04-27'    'USD'    'BRL'    [    1.8846]
'2012-04-27'    'USD'    'IDR'    [      9189]
'2012-04-27'    'USD'    'INR'    [   52.5600]
'2012-04-27'    'USD'    'MXN'    [   13.0147]
'2012-04-27'    'USD'    'PEN'    [    2.6395]
'2012-04-27'    'USD'    'SGD'    [    1.2385]
'2012-04-27'    'USD'    'TRY'    [    1.7600]

(It is a cell array.)

What I want to do is group all the data by date (the first column) and then have one column for each value, like this:

'2012-04-26'    [    1.8894]    [      9185]    [   52.5350]    [   13.2337]    [    2.6505]    [    1.2412]    [    1.7643]    
'2012-04-27'    [    1.8846]    [      9189]    [   52.5600]    [   13.0147]    [    2.6395]    [    1.2385]    [    1.7600]

where each column after the date represents one currency pair (USD/BRL, USD/IDR, USD/INR, ...).

Note that for each date, the extracted data contains exactly the same number of rows (currency pairs).

Is there an elegant (and fast) way to do this in Matlab?

Thanks,

1 Answer

Given that you emphasize speed in your question, I propose the following solution:

%# Build an example cell array 
D = cell(6, 4);
for t = 1:3; D{t, 1} = '2012-04-26'; D{t, 2} = 'A'; D{t, 3} = 'A'; D{t, 4} = t; end;
for t = 4:6; D{t, 1} = '2012-04-27'; D{t, 2} = 'A'; D{t, 3} = 'A'; D{t, 4} = t; end;

%# My Solution
X = [datenum(D(:, 1), 'yyyy-mm-dd'), cell2mat(D(:, 4))];
[UniqueDate, ~, Index] = unique(X(:, 1));
NumObsPerDay = sum(Index == 1);
NumDay = length(UniqueDate);
Soln = [UniqueDate, reshape(X(:, 2), NumObsPerDay, NumDay)'];

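In the first line, I extract the important data into a numeric array. Numeric arrays are much faster to work with than cell arrays, because individual elements take up far less memory. To handle the date strings, I convert them to Matlab's numeric date format (serial date numbers) in this first step. If you plan to use Matlab a lot, I recommend getting familiar with the numeric date format, since it is more flexible than working with strings; for example, you can perform any kind of arithmetic on numeric dates.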
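As a small illustration of the arithmetic point above, here is a minimal sketch using two dates from the question. The variable names d1, d2, DaysApart and NextWeek are made up for this example.

```matlab
% Serial date numbers count days, so date arithmetic is ordinary arithmetic.
d1 = datenum('2012-04-26', 'yyyy-mm-dd');
d2 = datenum('2012-04-27', 'yyyy-mm-dd');

DaysApart = d2 - d1;                        % difference in days between the two dates
NextWeek  = datestr(d1 + 7, 'yyyy-mm-dd');  % shift forward by 7 days, back to a string
```

The same subtraction on the original strings would first require parsing them, which is what the conversion in the first line does once, up front.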

In the second line, I get a list of the unique dates and an index that maps each row to its unique date.
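To make the role of the index concrete, here is a short sketch on made-up serial date numbers. X1 is a hypothetical stand-in for the first column of X.

```matlab
% Made-up serial date numbers: three rows for one day, three for the next.
X1 = [735000; 735000; 735000; 735001; 735001; 735001];

[UniqueDate, ~, Index] = unique(X1);

% UniqueDate holds each distinct date once, in sorted order.
% Index has one entry per row of X1 and points into UniqueDate,
% so UniqueDate(Index) reproduces the original column.
Rebuilt = UniqueDate(Index);
```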

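In the third and fourth lines, I use the index and the unique date list to get the number of observations per day and the number of days for which you have data. NOTE: the line NumObsPerDay = sum(Index == 1); implicitly assumes that you have the same number of observations (i.e. the same set of currencies) on every day. However, you stated in the question that this is the case, so I'll take your word for it :-)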
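If you would rather verify the equal-count assumption than rely on it, one possible check (a sketch, not part of the solution above) counts the rows per date with accumarray. Index here is a made-up example of the third output of unique.

```matlab
% Example index as returned by unique: three rows for each of two dates.
Index = [1; 1; 1; 2; 2; 2];

% Count how many rows fall on each unique date.
Counts = accumarray(Index, 1);

% Stop early if the days do not all have the same number of observations.
assert(all(Counts == Counts(1)), 'Dates have differing numbers of observations.');
NumObsPerDay = Counts(1);
```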

In the fifth line, I create a numerical matrix in the format you want. The first column is the unique date vector obtained in line 2, and the remaining columns come from reshaping the data in X. CAUTION: This line implicitly assumes that the rows are sorted by date and that the ordering of the currencies in your cell array is identical for each day. Again, I've made this assumption because it is true in your sample data and you stated you wanted a fast solution.
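The transpose in that line is there because reshape fills its output column by column. A tiny sketch with made-up values (v and M are names chosen for this example) shows the effect:

```matlab
% Six values: the first three belong to day 1, the last three to day 2.
v = (1:6)';

% reshape fills column-wise, giving one column per day (3-by-2);
% transposing turns that into one row per day (2-by-3).
M = reshape(v, 3, 2)';
```

Here M is [1 2 3; 4 5 6], i.e. each row holds one day's observations in their original order.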

FINAL CAUTION: If either of the assumptions made above is violated, then this code will fail or your data will get mixed up. In other words, if you're certain that all your data conforms to the sample you provided, then this solution should serve, and should also be fast. But if you're not certain, then this is not a good solution for you.
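For data that may not satisfy those assumptions, one possible alternative (a sketch, not the solution above, and likely somewhat slower) is to index by both date and currency and leave missing pairs as NaN. The example data, and the names DateIdx, CcyIdx, UniqueCcy and Values, are made up for illustration. The sketch assumes each (date, currency) combination appears at most once; a duplicate would silently overwrite the earlier value.

```matlab
% Example cell array: currencies in mixed order, and MXN missing on the first day.
D = {'2012-04-26', 'USD', 'BRL', 1.8894; ...
     '2012-04-26', 'USD', 'INR', 52.5350; ...
     '2012-04-27', 'USD', 'INR', 52.5600; ...
     '2012-04-27', 'USD', 'BRL', 1.8846; ...
     '2012-04-27', 'USD', 'MXN', 13.0147};

% Row index from the dates, column index from the currency codes.
[UniqueDate, ~, DateIdx] = unique(datenum(D(:, 1), 'yyyy-mm-dd'));
[UniqueCcy,  ~, CcyIdx]  = unique(D(:, 3));   % sorted currency codes

% Start from NaN so that missing (date, currency) pairs stay visible.
Values = NaN(numel(UniqueDate), numel(UniqueCcy));
Values(sub2ind(size(Values), DateIdx, CcyIdx)) = cell2mat(D(:, 4));

Soln = [UniqueDate, Values];
```

UniqueCcy then serves as the column header for Values, so the result does not depend on the row order of the input.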

P.S. If you want to see the dates in string format again, just use datestr(Soln(:, 1), 'yyyy-mm-dd');
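If you want the result back as a cell array shaped like the one in the question, a minimal sketch is below. The Soln matrix here is a made-up example, and DateStr and Out are names chosen for illustration.

```matlab
% Made-up numeric result: serial date in column 1, values in the remaining columns.
Soln = [datenum('2012-04-26', 'yyyy-mm-dd'), 1, 2, 3; ...
        datenum('2012-04-27', 'yyyy-mm-dd'), 4, 5, 6];

% datestr returns a char matrix; cellstr turns it into one string per cell.
DateStr = cellstr(datestr(Soln(:, 1), 'yyyy-mm-dd'));

% Put the date strings next to the values, each value in its own cell.
Out = [DateStr, num2cell(Soln(:, 2:end))];
```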

answered 2012-10-31T01:28:43.873