
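The first table contains key values and times, as shown below.

Table: Time_Stamp

The second table contains a start and end datetime for each ID.

Table: Time_Table

I want to find the ID for each row in Time_Stamp.

Expected result

There is a fixed number of categories, but there are many IDs.

Could you help me write an SQL query? (Any SQL dialect is fine; I can convert it. SAS-compatible PROC SQL would be better.)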


2 Answers


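If you are doing this in SAS, you are best off using a format. Formats have the advantage of taking start/end ranges, and they are very fast: roughly O(1) lookup time, if I recall correctly. This approach does not require sorting the larger dataset (and can even avoid sorting the smaller dataset, if that is a concern), which most SQL solutions will probably do unless they can hold the smaller dataset in memory (as a hash table).

The first two data steps just create your data from above; the format_two data step is the first one that does anything new.

If there are more categories, this still works fine as long as they are letters rather than numbers; the only thing you would change is that the 2 in if _n_ le 2 should equal the total number of categories.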

data time_Stamp;   *Making up the test dataset;
  category='A';
  do value=1 to 6;
    time = intnx('HOUR','01NOV2014:00:00:00'dt,value-1);
    output;
  end;
  category='B';
  do value = 7 to 12;
    time = intnx('HOUR','01NOV2014:00:00:00'dt,value-4);
    output;
  end;
run;

data time_table;    *Making up the ID dataset;
  informat start_time end_time datetime18.;
  input id category $ start_time end_time;
  datalines;
  1 A 01NOV2014:00:00:00 01NOV2014:03:00:00
  1 B 01NOV2014:00:03:00 01NOV2014:06:00:00
  2 A 01NOV2014:03:00:00 01NOV2014:06:00:00
  2 B 01NOV2014:06:00:00 01NOV2014:09:00:00
  ;
run;


*This restructures time_table into the needed structure for a format lookup dataset;
data format_two;
  set time_table;
  fmtname=cats('KEYFMT',category);   *This is how we handle A/B - different formats.  If it were numeric would need to end with 'F'.;
  start=start_time;
  end=end_time;
  label=id;
  eexcl='Y';         *This makes it exclusive of the end value, so 03:00 goes with the latter ID and not the former.;
  hlo=' ';
  output;
  if _n_ le 2 then do;  *This allows it to return missing if the ID is not found. ;
                        *le 2 is because we want one for each category - if more categories, needs to be higher;
                        *Note this relies on the first rows of time_table covering each category once, as they do here;
    hlo='o';
    label=' ';
    call missing(of start end);
    output;
  end;
run;


*Have to sort to group formats together, but at least this is the small dataset;
*If even this is a time concern, this could be done differently (make 2 different datasets above);
proc sort data=format_two;
  by fmtname;
run;

*Import the format lookups;
proc format cntlin=format_two;
quit;

*Apply using PUTN which allows specifying a format at runtime;
data table_one_ids;
  set time_stamp;
  id = putn(time,cats('KEYFMT',category));
run;
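
To make it clearer what the CNTLIN dataset builds, here is a minimal hand-written sketch of roughly the same lookup for category A only. The format name keydemo and the test value are made up for illustration; the ranges are copied from the time_table rows above, and the labels are typed as plain '1' and '2', which may differ slightly in padding from what the CNTLIN import produces.

```sas
*Hand-written range format, roughly what KEYFMTA looks like (illustration only);
proc format;
  value keydemo
    '01NOV2014:00:00:00'dt -< '01NOV2014:03:00:00'dt = '1'   /* -< excludes the end value, like eexcl='Y' */
    '01NOV2014:03:00:00'dt -< '01NOV2014:06:00:00'dt = '2'
    other = ' ';                                             /* like the hlo='o' row: blank when no range matches */
run;

*Look up a boundary value to see which range it lands in;
data _null_;
  t = '01NOV2014:03:00:00'dt;
  id = put(t, keydemo.);
  put t= datetime18. id=;
run;
```

Because the end of each range is excluded, 03:00:00 falls into the second range rather than the first, which is the behaviour the eexcl='Y' comment describes.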
answered 2014-11-30T14:18:34.793
SELECT Time_stamp.Category, Time_stamp.Time, Time_stamp.Value, Time_Table.ID
FROM Time_stamp
INNER JOIN Time_Table
  ON Time_stamp.Category = Time_Table.Category
  AND Time_stamp.Time BETWEEN Time_Table.Start_time AND DATEADD(SS,-1,Time_Table.End_time)
ORDER BY Time_stamp.Category, Time_stamp.Time
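
Since the question asks for SAS-compatible PROC SQL, here is a sketch of how the same join might look there. DATEADD is a T-SQL function and is not available in PROC SQL, so this version swaps the BETWEEN test for a half-open comparison (start inclusive, end exclusive). It assumes datasets named time_stamp (category, value, time) and time_table (id, category, start_time, end_time); the output name table_one_ids_sql and the aliases ts and tt are made up for illustration.

```sas
proc sql;
  create table table_one_ids_sql as
  select ts.category, ts.time, ts.value, tt.id
  from time_stamp as ts
       inner join time_table as tt
         on  ts.category = tt.category
         and ts.time >= tt.start_time   /* start of the range is included */
         and ts.time <  tt.end_time     /* end of the range is excluded   */
  order by ts.category, ts.time;
quit;
```

As with the query above, the inner join drops any Time_Stamp row that matches no ID range; changing it to a left join would keep those rows with a missing id.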
answered 2014-11-30T07:01:39.587