I inherited a poorly designed table in which the data is stored like this:

Period |  Identifier |   Value
----------------------------------
1      | AB1         | some number
1      | AB2         | some number
1      | AB3         | some number
1      | AB4         | some number
1      | AB5         | some number
1      | A1          | some number
1      | A2          | some number
1      | A3          | some number
1      | A4          | some number
1      | A5          | some number
2      | AB1         | some number
2      | AB2         | some number
2      | AB3         | some number
2      | AB4         | some number
2      | AB5         | some number
2      | A1          | some number
2      | A2          | some number
2      | A3          | some number
2      | A4          | some number
2      | A5          | some number

I'm trying to use a SELECT statement to reshape the data into this format:

Row # | First value | Second value
1     | A1's number | AB1's number     // The next 5 rows are data from period 1
2     | A2's number | AB2's number
3     | A3's number | AB3's number
4     | A4's number | AB4's number
5     | A5's number | AB5's number
6     | A1's number | AB1's number     // These 5 rows are from period 2
7     | A2's number | AB2's number
8     | A3's number | AB3's number
9     | A4's number | AB4's number
10    | A5's number | AB5's number

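AB% and A% are the two separate ID patterns, which I think somewhat defeats a WHERE ... LIKE clause, since A% also matches every AB identifier. I'm not entirely sure the data can be coerced into the desired format, but my supervisor has asked me to investigate.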

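I have no SQL code to show. My initial attempt was to look at the row number itself and use it, but as I said, I'm not sure how to proceed down that route.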

The data currently lives in SQL Server, but it will be queried using proc sql. I believe its dialect largely matches SQL Server's, even though DECLARE is not supported.

No, I don't know whose idea it was to store the data this way...

3 Answers

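If you are using SAS, I would just use PROC TRANSPOSE. Prepare the data so that it includes a label variable that determines which output variable each value moves into: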

data datatable;
infile datalines dlm='|';
input
Period Identifier $ Value $;
datalines;
1      | AB1         | some number
1      | AB2         | some number
1      | AB3         | some number
1      | AB4         | some number
1      | AB5         | some number
1      | A1          | some number
1      | A2          | some number
1      | A3          | some number
1      | A4          | some number
1      | A5          | some number
2      | AB1         | some number
2      | AB2         | some number
2      | AB3         | some number
2      | AB4         | some number
2      | AB5         | some number
2      | A1          | some number
2      | A2          | some number
2      | A3          | some number
2      | A4          | some number
2      | A5          | some number
;;;
run;

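data have;
set datatable;
/* Strip the digits: 'AB1' -> 'AB', 'A1' -> 'A'. Names the output variable. */
idlabel = compress(identifier, ,'d');
/* Keep only the digits: 'AB1' -> '1'. Pairs each A row with its AB row. */
byval = compress(identifier,,'kd');
run;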

proc sort data=have;
by period byval;
run;
proc transpose data=have out=want;
by period byval;
id idlabel;
var value;
run;

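If for some reason you must do this in SQL, it is best done as a self-join. You want to join the AB row and the A row that share the same period and the same compress(identifier,,'kd') value (for example, period=1 and digit 1), so you can do this: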

proc sql;
  create table want as 
    select A.period, AB.value as AB, A.value as A
    from (select * from have where compress(identifier,,'d')='AB') AB, 
         (select * from have where compress(identifier,,'d')='A') A
    where AB.period=A.period
    and compress(AB.identifier,,'kd') = compress(A.identifier,,'kd');
quit;

But I think the PROC TRANSPOSE option is probably more efficient than the self-join (and more flexible if your data is not as tidy as what you show).
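
For readers without SAS at hand, the reshaping idea can be sketched in plain Python. This is only an illustration of the logic, not a reproduction of PROC TRANSPOSE: the function name `transpose`, the sample values, and the choice to treat the digit key as an integer are all assumptions made for the example (in the SAS code above, byval is a character value).

```python
import re
from collections import defaultdict

# Hypothetical sample rows: (period, identifier, value). The values are
# made up for illustration; the original post only says "some number".
rows = [
    (1, "AB1", 110), (1, "AB2", 120), (1, "A1", 10), (1, "A2", 20),
    (2, "AB1", 210), (2, "AB2", 220), (2, "A1", 30), (2, "A2", 40),
]

def transpose(rows):
    """Pivot long rows into one dict per (period, digit key)."""
    wide = defaultdict(dict)
    for period, identifier, value in rows:
        # Letters become the column label and digits the pairing key,
        # mirroring compress(identifier,,'d') and compress(identifier,,'kd').
        label = re.sub(r"\d", "", identifier)
        key = int(re.sub(r"\D", "", identifier))  # assumption: numeric key
        wide[(period, key)][label] = value
    # Sort by period, then key, like the BY statement's ordering.
    return [
        {"period": p, "byval": k, **cols}
        for (p, k), cols in sorted(wide.items())
    ]

want = transpose(rows)
for row in want:
    print(row)
```

Each output dict holds one period and digit key together with an `A` and an `AB` entry, which is the paired layout the question asks for.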

answered 2012-10-25T21:45:01.493

If the "B" in the identifier is what distinguishes type A from type AB identifiers, you can simply strip that letter and join on the result:

SELECT ROW_NUMBER() OVER(ORDER BY AData.Period, AData.[Identifier]) AS [Row #]
    , AData.[Value] AS [First Value]
    , ABData.[Value] AS [Second Value]
FROM YourTable AData
-- Change to a LEFT JOIN if not all A's have AB's.
JOIN YourTable ABData
    -- NOTE: Assumes that 'B' is the only differentiator between
    -- AData and ABData's Identifier column and that it is
    -- not repeated as part of the common identifier.
    ON AData.[Identifier] = REPLACE(ABData.[Identifier], 'B', '')
    -- Pair rows within the same period only.
    AND AData.Period = ABData.Period
    -- Keep only AB rows on this side; otherwise each A row also matches itself.
    AND ABData.[Identifier] LIKE 'AB%'

You're absolutely right that it's not a very good schema, and this will probably require a full table scan.
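
As a rough check of this join, here is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server. That substitution is an assumption (SQLite's REPLACE and LIKE happen to behave the same way for this data), and the values are made up. It counts the rows produced when joining on the REPLACE condition alone, then again with two further conditions: the same period on both sides, and only AB identifiers on the ABData side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (Period INTEGER, Identifier TEXT, Value INTEGER)"
)

# Hypothetical values, encoded so each row is recognisable:
# A rows are period*1000 + n, AB rows are period*1000 + 100 + n.
sample = []
for period in (1, 2):
    for n in range(1, 6):
        sample.append((period, f"AB{n}", period * 1000 + 100 + n))
        sample.append((period, f"A{n}", period * 1000 + n))
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", sample)

base_join = """
    FROM YourTable AData
    JOIN YourTable ABData
      ON AData.Identifier = REPLACE(ABData.Identifier, 'B', '')
"""

# REPLACE condition alone: every A row matches itself and its AB
# partner, in both periods.
loose_count = conn.execute("SELECT COUNT(*) " + base_join).fetchone()[0]

# With the period match and the AB-only filter added.
strict_rows = conn.execute(
    "SELECT AData.Period, AData.Value, ABData.Value "
    + base_join
    + " AND AData.Period = ABData.Period"
    " AND ABData.Identifier LIKE 'AB%'"
    " ORDER BY AData.Period, AData.Identifier"
).fetchall()

print(loose_count, len(strict_rows))
```

On this sample the stricter join yields one row per A/AB pair per period, while the REPLACE condition alone yields several times that, so the extra conditions are worth keeping when adapting the query.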

answered 2012-10-25T21:42:06.257

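Setting aside for a moment the trick of associating A with AB within a given period: if the data can be correlated somehow, I would produce the format you are looking for with an inner join of the table to itself, like this: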

SELECT row_number() OVER(ORDER BY a.Period, a.Identifier, b.Identifier), 
       a.Value, 
       b.Value 
FROM TableName a 
  INNER JOIN TableName b ON join_mechanism 
ORDER BY a.Period, a.Identifier, b.Identifier

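Now, to fill in the join mechanism, the obvious part is a.Period = b.Period. The tricky part is matching the identifiers. If this text is static, you could try a string replacement: REPLACE(a.Identifier, 'A', 'AB') = b.Identifier.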

So, all together, you would have:

SELECT row_number() OVER(ORDER BY a.Period, a.Identifier, b.Identifier), 
       a.Value, 
       b.Value 
FROM TableName a 
  INNER JOIN TableName b ON a.Period = b.Period AND REPLACE(a.Identifier, 'A', 'AB') = b.Identifier 
ORDER BY a.Period, a.Identifier, b.Identifier

Note: the SELECT statement has not been tested, and I am assuming you are using a relatively recent version of MSSQL that supports row_number.
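
One way to exercise the statement without a SQL Server instance is to run it against an in-memory SQLite database from Python. This is a sketch under stated assumptions: SQLite 3.25 or later (needed for window functions) stands in for MSSQL, the values are invented, and agreement on this small sample does not guarantee identical behaviour on the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableName (Period INTEGER, Identifier TEXT, Value INTEGER)"
)

# Hypothetical values: A rows are period*10 + n, AB rows are period*100 + n.
for period in (1, 2):
    for n in range(1, 6):
        conn.execute(
            "INSERT INTO TableName VALUES (?, ?, ?)",
            (period, f"A{n}", period * 10 + n),
        )
        conn.execute(
            "INSERT INTO TableName VALUES (?, ?, ?)",
            (period, f"AB{n}", period * 100 + n),
        )

# The query from the answer, reformatted only in whitespace.
query = """
SELECT row_number() OVER (ORDER BY a.Period, a.Identifier, b.Identifier),
       a.Value,
       b.Value
FROM TableName a
  INNER JOIN TableName b
    ON a.Period = b.Period
   AND REPLACE(a.Identifier, 'A', 'AB') = b.Identifier
ORDER BY a.Period, a.Identifier, b.Identifier
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)
```

Because REPLACE turns 'AB1' into 'ABB1', rows from the AB side never match as `a`, so only the A rows drive the output and no extra filter is needed here.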

answered 2012-10-25T21:55:07.933