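You can do this if you install the "STATS CARTPROD" extension bundle. With this extension, you can create a Cartesian product as an intermediate step toward an outer join.
Since SPSS 22, you can download it directly from the program menu Extra->Extension Bundles->Install and Download extension bundles. You can also download and install it manually from here: https://www.ibm.com/developerworks/community/files/app?lang=en#/file/d0afcd4e-6d5d-4779-84ef-2b68bc81b861
Note that you must have "Python Essentials for SPSS" installed for it to work.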
*** create the example data.
DATA LIST FREE / classnbr1 (F1) fact1 (A1).
BEGIN DATA
1 A
1 D
2 A
3 B
END DATA.
DATASET NAME data1.
DATA LIST FREE / classnbr2 (F1) fact2 (A2).
BEGIN DATA
1 XX
1 XY
3 ZZ
END DATA.
DATASET NAME data2.
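When using the "STATS CARTPROD" extension, I ran into problems with uppercase letters in variable names. It is also important that "classnbr" has a different variable name in each of the two datasets.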
*** create cartesian product using the STATS CARTPROD extension.
DATASET ACTIVATE data1.
STATS CARTPROD INPUT2=data2
VAR1=classnbr1 fact1 VAR2=classnbr2 fact2
/SAVE OUTFILE="C:\MY FOLDER\cardprod.sav" DSNAME = cart.
EXECUTE.
*** create an equi join.
SELECT IF classnbr1 = classnbr2.
EXECUTE.
DELETE VARIABLES classnbr2.
Now include the cases from data1 that have no match in data2.
*** create left outer join.
* assuming both data sets are ordered by classnbr1 and fact1.
ADD FILES
/FILE = cart
/FILE = data1
/BY classnbr1 fact1.
EXECUTE.
DATASET NAME outer_join.
DATASET ACTIVATE outer_join.
COMPUTE select=1.
IF (length(fact2)=0 AND classnbr1=LAG(classnbr1) AND fact1=LAG(fact1)) select=0.
EXECUTE.
SELECT IF select = 1.
EXECUTE.
DELETE VARIABLES select.
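To make the logic of the steps above easier to follow outside SPSS, here is a minimal Python sketch of the same idea: Cartesian product, then equi join, then adding back the unmatched cases of data1. This is only an illustration using the example data; it is not how STATS CARTPROD works internally, and the function name `left_outer_join_via_product` is made up for this example.

```python
from itertools import product

# Same example data as in the SPSS syntax above: (classnbr, fact).
data1 = [(1, "A"), (1, "D"), (2, "A"), (3, "B")]
data2 = [(1, "XX"), (1, "XY"), (3, "ZZ")]


def left_outer_join_via_product(left, right):
    """Left outer join on the first tuple element, built from a Cartesian product."""
    # Step 1: Cartesian product, every left row paired with every right row.
    cart = [(k1, f1, k2, f2) for (k1, f1), (k2, f2) in product(left, right)]
    # Step 2: equi join, keep only pairs whose keys agree and drop the second key.
    joined = [(k1, f1, f2) for k1, f1, k2, f2 in cart if k1 == k2]
    # Step 3: add back left rows that found no partner, with an empty fact2.
    matched = {(k1, f1) for k1, f1, _ in joined}
    unmatched = [(k1, f1, "") for k1, f1 in left if (k1, f1) not in matched]
    return sorted(joined + unmatched)


result = left_outer_join_via_product(data1, data2)
for row in result:
    print(row)
```

The case (2, "A") has no partner in data2, so it appears once with an empty fact2, which is what the ADD FILES and LAG steps achieve in the SPSS version.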
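However, you may run into trouble with very large datasets, because the Cartesian product contains one row for every pair of cases and can become huge.
To mitigate this somewhat, you can remove from each dataset all cases that have no match in the other dataset before generating the Cartesian product.
Here is how to do that: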
*** create the example data.
*** (I added an additional case to the second data set, which will be deleted
in the result, since it has no match in the first data set).
DATA LIST FREE / classnbr1 (F1) fact1 (A1).
BEGIN DATA
1 A
1 D
2 A
3 B
END DATA.
DATASET NAME data1.
DATA LIST FREE / classnbr2 (F1) fact2 (A2).
BEGIN DATA
1 XX
1 XY
3 ZZ
4 XY
END DATA.
DATASET NAME data2.
*** select cases who (don't) have a matching correspondent in the other dataset
** Create a list of unique key values of data set data2
** (In this Example the key Value is classnbr2).
DATASET ACTIVATE data2.
DATASET COPY data2_keylist.
DATASET ACTIVATE data2_keylist.
* Assuming the data set is already sorted by the key value.
* Mark the first occurrence of every key value in the data set.
COMPUTE list = 1.
IF classnbr2 = LAG(classnbr2) list = 0.
SELECT IF list=1.
EXECUTE.
* Delete all variables except the (now unique) key value.
MATCH FILES
/FILE *
/KEEP classnbr2.
EXECUTE.
** Match the list of data2 key values to data1 in order to mark
** which cases of data1 have at least one correspondent case in data 2.
DATASET ACTIVATE data1.
MATCH FILES
/FILE *
/TABLE data2_keylist
/RENAME classnbr2=classnbr1
/IN data2
/BY classnbr1.
EXECUTE.
** Remove cases from data1 who don't have a correspondent in data2
** and store them in another dataset, because we need to add them later.
DATASET COPY date1_nomatch.
SELECT IF data2=1.
EXECUTE.
DATASET ACTIVATE date1_nomatch.
SELECT IF data2=0.
EXECUTE.
** Now doing the same for the other data set.
** Create a list of unique key values of data set data1
** (In this Example the key Value is classnbr1).
DATASET ACTIVATE data1.
DATASET COPY data1_keylist.
DATASET ACTIVATE data1_keylist.
* Assuming the data set is already sorted by the key value.
* Mark the first occurrence of every key value in the data set.
COMPUTE list = 1.
IF classnbr1 = LAG(classnbr1) list = 0.
SELECT IF list=1.
EXECUTE.
* Delete all variables except the (now unique) key value.
MATCH FILES
/FILE *
/KEEP classnbr1.
EXECUTE.
** Match the list of data1 key values to data2 in order to mark
** which cases of data2 have at least one correspondent case in data1.
DATASET ACTIVATE data2.
MATCH FILES
/FILE *
/TABLE data1_keylist
/RENAME classnbr1=classnbr2
/IN data1
/BY classnbr2.
EXECUTE.
** Remove cases from data2 that don't have a correspondent in data1.
SELECT IF data1=1.
EXECUTE.
*** create a cartesian product of the two reduced datasets.
DATASET ACTIVATE data1.
STATS CARTPROD INPUT2=data2
VAR1=classnbr1 fact1 VAR2=classnbr2 fact2
/SAVE OUTFILE="C:\MY FOLDER\cardprod.sav" DSNAME = outer_join.
EXECUTE.
*** create an equi join.
SELECT IF classnbr1 = classnbr2.
EXECUTE.
DELETE VARIABLES classnbr2.
*** create left outer join by adding the cases from date1_nomatch.
DATASET ACTIVATE outer_join.
ADD FILES
/FILE = *
/FILE = date1_nomatch
/BY classnbr1 fact1
/DROP data2.
EXECUTE.
* Some cleaning up.
DATASET CLOSE data1_keylist.
DATASET CLOSE date1_nomatch.
DATASET CLOSE data2_keylist.
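For comparison, the effect of the pre-filtering can be sketched in a few lines of Python. Again, this is only an illustration on the example data and not SPSS syntax; the variable names `data1_match`, `data2_match`, `full_size` and `reduced_size` are made up for this sketch.

```python
# Example data from the second syntax block: (classnbr, fact).
data1 = [(1, "A"), (1, "D"), (2, "A"), (3, "B")]
data2 = [(1, "XX"), (1, "XY"), (3, "ZZ"), (4, "XY")]

# Unique key values of each dataset (the role of the *_keylist datasets).
keys1 = {k for k, _ in data1}
keys2 = {k for k, _ in data2}

# Split data1 into matched and unmatched cases; keep only matched cases of data2.
data1_match = [row for row in data1 if row[0] in keys2]
data1_nomatch = [row for row in data1 if row[0] not in keys2]
data2_match = [row for row in data2 if row[0] in keys1]

# Size of the Cartesian product without and with pre-filtering.
full_size = len(data1) * len(data2)
reduced_size = len(data1_match) * len(data2_match)
print(full_size, reduced_size)

# Equi join on the reduced data, then add the unmatched data1 cases back.
joined = [
    (k1, f1, f2)
    for k1, f1 in data1_match
    for k2, f2 in data2_match
    if k1 == k2
]
outer_join = sorted(joined + [(k, f, "") for k, f in data1_nomatch])
for row in outer_join:
    print(row)
```

On the example data the product shrinks from 16 to 9 rows. The saving depends entirely on how many cases lack a match, so for datasets where almost every key matches, the product will still be large.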