
I am new to SAS and have this basic problem. I have a list of NYSE trading dates in table A as follows -

trading_date
1st March 2012
2nd March 2012
3rd March 2012
4th March 2012
5th March 2012
6th March 2012

I have another table B that has share price information as -

Date          ID    Ret Price
1st March 2012  1   …   …
3rd March 2012  1   …   …
4th March 2012  1   …   …
5th March 2012  1   …   …
6th March 2012  1   …   …
1st March 2012  2   …   …
3rd March 2012  2   …   …
4th March 2012  2   …   …

Here … stands for numeric data related to price and returns.

Now I need to join the NYSE Data table to the above table to get the following table -

Date         ID    Ret  Price
1st March 2012  1   …   …
2nd March 2012  1   0   0
3rd March 2012  1   …   …
4th March 2012  1   …   …
5th March 2012  1   …   …
6th March 2012  1   …   …
1st March 2012  2   …   …
2nd March 2012  2   0   0
3rd March 2012  2   …   …
4th March 2012  2   …   …

i.e. a simple left join. The zeros will actually be . in SAS to indicate missing values, but you get the idea. But if I use the following command -

proc sql;
create table joined as
select table_a.trading_date, table_b.* from table_a LEFT OUTER join table_b on table_a.trading_date=table_b.date;
quit;

The join happens only for the first ID (i.e. ID=1) while for the rest of the IDs, the same data is maintained. But I need to insert the trade dates for all IDs.

How can I get the final data without running a do while loop over all IDs? I have 1000 IDs, and looping and joining 1000 times is not an option due to limited memory.


3 Answers

5

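Joe is right that you also need to take ID into account, but with his solution you won't get 2nd March 2012, because nobody traded on that day. You can do everything in a single sql step (which will take a little longer):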

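proc sql;
   create table final as
   select d.trading_date, d.ID, t.Price, t.Ret
   from
   (
      /* cross join: every trading date paired with every distinct ID */
      select trading_date, ID 
      from table_a, (select distinct ID from table_b) 
   ) d
   left join
   (
      select *
      from table_b
   ) t
   /* match on both date and ID so each ID gets its own full set of dates */
   on t.Date=d.trading_date and t.ID=d.ID
   order by d.id, d.trading_date;
quit;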
answered 2014-05-23T17:45:02.583
1

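Your left join doesn't work because it doesn't take ID into account. SAS (or rather SQL) doesn't know that it should repeat the dates for each ID.

The easiest way to get the full set of combinations is PROC FREQ with SPARSE, assuming someone has a trade on every valid trading day.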

proc freq data=table_b noprint;
   /* table_b's date column is named date; SPARSE also outputs id*date combinations with zero count */
   tables id*date / sparse out=table_all(keep=id date);
run;

Then join that back to the original table_b by id and date.
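
The join-back step could look like the following minimal sketch. It assumes table_all holds one row per id and date combination, with the date column named date as in table_b. The output name table_full is hypothetical.

```sas
/* Sketch: left join the full id x date grid back to the price data.  */
/* Assumes table_all has columns id and date; table_full is a         */
/* hypothetical output name.                                          */
proc sql;
   create table table_full as
   select g.id, g.date, b.ret, b.price
   from table_all as g
   left join table_b as b
      on g.id = b.id and g.date = b.date
   order by g.id, g.date;
quit;
```

Rows in the grid that have no match in table_b come out with missing ret and price.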

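Alternatively, you can use PROC MEANS, which can pick up your numeric variables (it can't pick up character variables this way, unless you can use them as class values).

Using the table_b Anton created (with the ret and price variables):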

proc means data=table_b noprint completetypes nway;
   class id date;   /* table_b's date column is named date */
   var ret price;
   output out=table_allmeans sum=;
run;

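This outputs missing values for the rows that were absent and the actual values for the rows that were present, and the output has a _FREQ_ variable that lets you tell whether a row really existed in the trading dataset.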
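
One way to turn _FREQ_ into an explicit indicator is sketched below. It assumes table_allmeans was produced by the PROC MEANS step above. The names table_flagged and was_traded are hypothetical.

```sas
/* Sketch: flag rows that were filled in by COMPLETETYPES.            */
/* table_flagged and was_traded are hypothetical names.               */
data table_flagged;
   set table_allmeans;
   was_traded = (_FREQ_ > 0);   /* 1 if the id/date pair existed in table_b, else 0 */
   drop _TYPE_ _FREQ_;          /* bookkeeping columns added by PROC MEANS */
run;
```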

answered 2014-05-14T16:12:25.170
-1

I think there must be something wrong with the data, because your query looks fine and works the way you describe on the test data I generated:

data table_a;
    format trading_date date9.;
    do trading_date= "01MAR2012"d to "06MAR2012"d;
        output;
    end;
run;

data table_b;
    format date date9.;
    ret = 0;
    price = 0;
    do date= "01MAR2012"d to "06MAR2012"d;
        do ID = 1 to 4;
            if ranuni(123) < 0.3 then
                output;
        end;
    end;
run;

Here is what I got after running your query, copied verbatim:

trading_date date ret price ID 
01MAR2012 01MAR2012 0 0 3 
02MAR2012 02MAR2012 0 0 2 
03MAR2012 03MAR2012 0 0 1 
03MAR2012 03MAR2012 0 0 2 
04MAR2012 04MAR2012 0 0 2 
05MAR2012 05MAR2012 0 0 3 
06MAR2012 . . . . 

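It's worth checking the format of the dates: are they numeric? If they are character, are they in the same format in both tables? If they are numeric, are they dates or datetimes with some odd format applied?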
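
A quick way to run those checks is sketched below. It assumes the table and column names from the question. The names table_b_fixed and date_fixed are hypothetical, and the conversion step applies only if date turns out to hold datetime values.

```sas
/* Sketch: inspect the type and format of the join keys in both tables. */
proc contents data=table_a; run;
proc contents data=table_b; run;

/* Only if table_b.date is a numeric datetime: DATEPART() extracts the  */
/* date portion so it can match table_a.trading_date.                   */
/* table_b_fixed and date_fixed are hypothetical names.                 */
data table_b_fixed;
   set table_b;
   date_fixed = datepart(date);
   format date_fixed date9.;
run;
```

If the column is character instead, INPUT() with an informat that matches the stored text would be the analogous conversion.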

answered 2014-05-14T11:51:23.977