
I have a CSV file 'XPQ12.csv' of futures tick data in the following form:

20090312    30:14.0 717.25  1   E
20090312    30:15.0 718.47  1   E
20090312    30:17.0 717.25  1   E
20090312    30:32.0 718.42  1   E
20090312    30:49.0 715.32  1   E
20090312    30:58.0 717.57  1   E
20090312    31:06.0 716.65  3   E
20090312    31:12.0 718.35  2   E
20090312    31:45.0 721.14  1   E
20090312    31:52.0 719.24  1   E
20090312    32:11.0 717.02  6   E
20090312    32:29.0 717.14  1   E
20090312    32:35.0 717.34  1   E
20090312    32:55.0 717.26  1   E

(The first column is the date as yyyymmdd, the second is the time as minute:second.tenth-of-second, the third is the price, the fourth is the number of contracts traded, and the fifth indicates whether the trade was electronic or in a pit.) In my actual data set, I may have thousands of price quotes within any given minute.

I read the file using the following code:

fid = fopen('C:\Program Files\MATLAB\R2013a\XPQ12.csv','r');
c = fscanf(fid, '%d,%d:%d.%d,%f,%d,%c')   % 7 numbers per row: date, minute, second, tenth, price, contracts, type
fclose(fid);

Which outputs:

20090312
      30
      14
       0
  717.25
       1
      69
20090312
      30
      15
       0
  718.47
       1
      69
       .
       .
       .

(The 69s are the character 'E' stored as its ASCII code, since fscanf returns everything as numbers.)
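Since fscanf returns one long column with 7 numbers per trade, it may be easier to reshape c into a matrix with one trade per row before doing anything else. A minimal sketch, using the first two example rows typed in by hand as c and assuming the file has no truncated last row (reshape needs numel(c) to be a multiple of 7); the names M and TradeType are only illustrative:

```matlab
% The first two rows of the example data, laid out the way fscanf returns
% them: 7 numbers per row, with 'E' stored as its character code 69.
c = [20090312; 30; 14; 0; 717.25; 1; 69; ...
     20090312; 30; 15; 0; 718.47; 1; 69];

% Reshape into a 7-by-N matrix and transpose it, so that each row of M is
% one trade: [date minute second tenth price contracts type]
M = reshape(c, 7, []).';

Minutes   = M(:, 2);         % minute field of every trade
Prices    = M(:, 5);         % price field of every trade
TradeType = char(M(:, 7));   % converts the 69s back to 'E'
```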

Now I want to cut this up into one-minute OHLC (open, high, low, close) bars, so that for each minute I record the first, highest, lowest, and last price within that minute. What is the best way to go about this?

My original idea was to store the sequence of minutes in a vector d and, while working through the data, watch for the minute to change. Each time it changed, I would record the current price as the open of a new bar and the previous price as the close of the last bar, and then find the highest and lowest prices between each open and close.
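The change-detection idea above can be written without building d element by element: diff finds where the minute changes between consecutive trades, and each run between two changes is one bar. A minimal sketch, assuming Minutes and Prices have already been extracted (for example with c(2:7:end) and c(5:7:end)) and are in time order; here they are typed in from the example data, and Starts, Ends, Open, High, Low and Close are illustrative names:

```matlab
% Minutes and prices of the example data, one entry per trade, in time order
Minutes = [30 30 30 30 30 30 31 31 31 31 32 32 32 32];
Prices  = [717.25 718.47 717.25 718.42 715.32 717.57 ...
           716.65 718.35 721.14 719.24 ...
           717.02 717.14 717.34 717.26];

% A new bar starts at the first trade and wherever the minute changes
Starts = [1, find(diff(Minutes) ~= 0) + 1];
% Each bar ends just before the next one starts; the last ends at the last trade
Ends = [Starts(2:end) - 1, numel(Prices)];

nBars = numel(Starts);
Open = zeros(1, nBars);  High = zeros(1, nBars);
Low  = zeros(1, nBars);  Close = zeros(1, nBars);
for k = 1:nBars
    Segment  = Prices(Starts(k):Ends(k));   % all trades in this minute
    Open(k)  = Segment(1);
    High(k)  = max(Segment);
    Low(k)   = min(Segment);
    Close(k) = Segment(end);
end
```

Because this only splits where consecutive trades differ in minute, it needs no hour-wrap correction, unless two consecutive trades happen to fall on the same minute value in different hours.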

c(2) is the first minute, so I said:

d(1)=c(2);

Then, noting that the next minute is always 7 elements further along in c (each row produces 7 values), I said:

for i=3:numel(c)             % visit every element of c after c(2), which is already in d(1)
    if mod(i-2,7) == 0       % elements 2, 9, 16, ... hold the minutes (7 values per row)
        d(end+1)=c(i);
    end
end

which should fill up d with all the minutes:

30
30
30
30
30
30
31
31
31
31
32
32
32
32

in the case of the example data. I'm a bit lost about what to do from here, or whether I'm on the right track.


1 Answer


Starting from where you are:

Minutes = c(2:7:end);   % minute field of each row (7 values per row)
Prices = c(5:7:end);    % price field of each row
% Guard against a truncated final row
if (length(Prices)>length(Minutes))
    Prices=Prices(1:length(Minutes));
elseif (length(Prices)<length(Minutes))
    Minutes=Minutes(1:length(Prices));
end

% The minute counter wraps back toward 0 at each new hour; add 60 to
% everything after each wrap so that the minutes keep increasing.
OverflowValues=1+find(Minutes(2:end)<Minutes(1:end-1));
for v=length(OverflowValues):-1:1
    Minutes(OverflowValues(v):end)=Minutes(OverflowValues(v):end)+60;
end

% Find the distinct minutes only after the wrap correction
MinuteValues=unique(Minutes);

Highs=zeros(1,length(MinuteValues));
Lows=zeros(1,length(MinuteValues));
First=zeros(1,length(MinuteValues));   % open
Last=zeros(1,length(MinuteValues));    % close
for v=1:length(MinuteValues)
    InMinute = (Minutes==MinuteValues(v));   % trades that belong to this minute
    Highs(v) = max(Prices(InMinute));
    Lows(v) = min(Prices(InMinute));
    First(v) = Prices(find(InMinute,1,'first'));
    Last(v) = Prices(find(InMinute,1,'last'));
end

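As mentioned earlier, reading the file with textread (or textscan, which MathWorks now recommends instead) would make this easier.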
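A sketch of reading the data with textscan, assuming the file really is comma-separated as the fscanf format implies (the listing in the question shows the columns separated by spaces). To keep it self-contained it parses a character vector holding the first two example rows; with the real file you would pass the fid from fopen instead. The time column is read as text and the minute is pulled out with sscanf; all variable names are illustrative:

```matlab
% Two rows of the example file as text (comma-separated, as assumed above)
Text = sprintf('20090312,30:14.0,717.25,1,E\n20090312,30:15.0,718.47,1,E');

% One cell per column: date, time string, price, contracts, trade type
C = textscan(Text, '%f %s %f %f %s', 'Delimiter', ',');

Dates     = C{1};
TimeStr   = C{2};   % e.g. '30:14.0' (minute:second.tenth)
Prices    = C{3};
Contracts = C{4};
TradeType = C{5};   % cell array of trade-type codes, e.g. 'E' for electronic

% The minute is the integer before the ':' in each time string
Minutes = cellfun(@(s) sscanf(s, '%d', 1), TimeStr);
```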

(If you're lost at this stage, I wouldn't say the accumarray approach mentioned in the comments is the best place to start!)
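For reference, a sketch of the accumarray approach: unique maps each trade to a bar index, and accumarray applies max or min to the values of each bar. One caveat from the accumarray documentation: when the subscripts are not sorted, the order of the values passed to the function is not guaranteed, so the open and close here come from the index of the earliest and latest trade in each bar rather than from the first and last value of each group. It assumes Minutes has already been wrap-corrected and uses the example data as input; Bar, FirstIdx and LastIdx are illustrative names:

```matlab
% Wrap-corrected minutes and prices, one entry per trade (column vectors)
Minutes = [30 30 30 30 30 30 31 31 31 31 32 32 32 32]';
Prices  = [717.25 718.47 717.25 718.42 715.32 717.57 ...
           716.65 718.35 721.14 719.24 ...
           717.02 717.14 717.34 717.26]';

% Map every trade to a bar index 1..nBars (one bar per distinct minute)
[MinuteValues, ~, Bar] = unique(Minutes);
Bar = Bar(:);   % accumarray expects the subscripts as a column

High = accumarray(Bar, Prices, [], @max);
Low  = accumarray(Bar, Prices, [], @min);

% Open/close via trade indices, so the result does not depend on the order
% in which accumarray hands values to the function
TradeIdx = (1:numel(Prices))';
FirstIdx = accumarray(Bar, TradeIdx, [], @min);   % earliest trade in each minute
LastIdx  = accumarray(Bar, TradeIdx, [], @max);   % latest trade in each minute
Open  = Prices(FirstIdx);
Close = Prices(LastIdx);
```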

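By the way, this assumes that the minutes roll over past 60 (back toward 0 at each new hour) and that you don't have the hour recorded anywhere else; otherwise this won't work at all. It also cannot detect a gap of an hour or more with no trades.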
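A loop-free way to unwrap the minutes, treating any decrease in the minute field from one trade to the next as the start of a new hour. Like any approach that only sees minutes, it cannot detect a gap of an hour or more with no trades. The minute values here are hypothetical, chosen to cross one hour boundary, and the code assumes Minutes is a row vector:

```matlab
% Hypothetical minute values for trades that cross the top of an hour
Minutes = [58 59 59 0 0 1];

% Mark each trade whose minute is lower than the previous trade's (a wrap);
% cumsum counts the wraps so far, and each wrap adds another 60 minutes.
Wraps = [0, diff(Minutes) < 0];
Minutes = Minutes + 60 * cumsum(Wraps);
```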

Answered 2013-06-14T16:34:05.587