假设您正在处理以下数据,这是您从在线表单下载的 CSV 文件。复制并粘贴下面的文本并将其保存到名为“CourseInterestSurvey.CSV”的文本文件中。
Timestamp,Which courses are you interested in?,What software do you use?
12/28/2012 11:57:56,"Research Methods, Data Visualization","Gnumeric, SPSS, R"
12/28/2012 11:58:09,Data Visualization,"SPSS, Stata, R"
12/28/2012 11:59:09,"Research Dissemination, Graphic Design",Adobe InDesign
12/28/2012 11:59:27,"Data Analysis, Data Visualization, Graphic Design","Excel, OpenOffice.org/Libre Office, Stata"
12/28/2012 11:59:44,Data Visualization,"R, Adobe Illustrator"
使用以下语法将其读入 SPSS:
GET DATA
/TYPE=TXT
/FILE="path\to\CourseInterestSurvey.CSV"
/DELCASE=LINE
/DELIMITERS=","
/QUALIFIER='"'
/ARRANGEMENT=DELIMITED
/FIRSTCASE=2
/IMPORTCASE=ALL
/VARIABLES=
Timestamp A19
CourseInterest A49
Software A41.
CACHE.
EXECUTE.
DATASET NAME DataSet2 WINDOW=FRONT.
LIST.
它目前看起来像下图——三列(一列时间戳,两列包含我们想要的数据):
使用此处的一些语法,我们可以将单元格拆分如下:
* We know the string does not excede 50 characters.
* We got that information while we were reading our data in.
STRING #temp(a50).
* We're going to work on the "CourseInterest" variable.
COMPUTE #temp=CourseInterest.
* We're going to create 3 new variables with the prefix "CourseInterest".
* You should modify this according to the actual number of options your data has
* and the maximum length of one of the strings in your data.
VECTOR CourseInterest(3, a25).
* Here's where the actual variable creation takes place.
LOOP #i = 1 TO 3.
. COMPUTE #index=index(#temp,",").
. DO IF #index GT 0.
. COMPUTE CourseInterest(#i)=LTRIM(substr(#temp,1, #index-1)).
. COMPUTE #temp=substr(#temp, #index+1).
. ELSE.
. COMPUTE CourseInterest(#i)=LTRIM(#temp).
. COMPUTE #temp=''.
. END IF.
END LOOP IF #index EQ 0.
LIST.
结果:
这一次只处理一列,我还不够熟悉,无法修改它以处理多列。但是,如果你要切换到 R,我已经有了一些现成的 函数来帮助处理这些情况。