I have the following data:

ID        AGE  SEX  RACE   COUNTRY  VISITNUM  VSDTC       VSTESTCD  VSORRES
32320058  58   M    WHITE  UKRAINE  2         2016-04-28  DIABP     74
32320058  58   M    WHITE  UKRAINE  1         2016-04-21  HEIGHT    183
32320058  58   M    WHITE  UKRAINE  1         2016-04-21  SYSBP     116
32320058  58   M    WHITE  UKRAINE  2         2016-04-28  SYSBP     116
32320058  58   M    WHITE  UKRAINE  1         2016-04-21  WEIGHT    109
22080090  75   M    WHITE  MEXICO   1         2016-05-17  DIABP     81
22080090  75   M    WHITE  MEXICO   1         2016-05-17  HEIGHT    176
22080090  75   M    WHITE  MEXICO   1         2016-05-17  SYSBP     151

I want to reshape the data with tidyr::spread to get the following output:

ID        AGE  SEX  RACE   COUNTRY  VISITNUM  VSDTC       DIABP  SYSBP  WEIGHT  HEIGHT
32320058  58   M    WHITE  UKRAINE  2         2016-04-28  74     116    NA      NA
32320058  58   M    WHITE  UKRAINE  1         2016-04-21  NA     116    109     183
22080090  75   M    WHITE  MEXICO   1         2016-05-17  81     151    NA      176

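I get a duplicate-identifiers error, even though there are no duplicates in my data!

df1 <- spread(df, VSTESTCD, VSORRES)

Error: Duplicate identifiers for rows (36282, 36283), (59176, 59177), (59179, 59180)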

1 Answer

Assuming I have understood your question correctly:
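The error message suggests that some rows share every column other than VSORRES, even if no two rows are fully identical. A quick way to check, sketched here on a hypothetical toy data frame `vs` (made up for illustration, not the asker's data), is to count each key combination and keep those that occur more than once:

```r
library(dplyr)

# Hypothetical toy data; rows 3 and 4 share ID, VISITNUM and VSTESTCD
# but differ in VSORRES, so they are not identical rows.
vs <- data.frame(
  ID       = c(32320058, 32320058, 32320058, 32320058),
  VISITNUM = c(1, 1, 2, 2),
  VSTESTCD = c("SYSBP", "WEIGHT", "SYSBP", "SYSBP"),
  VSORRES  = c(116, 109, 116, 118),
  stringsAsFactors = FALSE
)

# Count every combination of the non-value columns and keep the repeats.
dups <- vs %>%
  count(ID, VISITNUM, VSTESTCD) %>%
  filter(n > 1)

print(dups)
```

On real data, the call to count() would need to list every column except the value column (VSORRES in the question).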

# Since many rows are identical, we should create a unique identifier column

# Let's take the iris dataset as an example

# install the tidyverse if you don't have it

# install.packages("tidyverse")

# load the library
library(tidyverse)

# check the dataset (iris)
head(iris)

# assume that we gather all columns of the iris dataset except the Species variable

# create a unique identifier column and transform the wide data to long data as follows

iris_gather <- iris %>%
  dplyr::mutate(ID = dplyr::row_number()) %>%
  tidyr::gather(key = Type, value = my_value, 1:4)

# check the first six rows

head(iris_gather)

# use spread() to spread the data back out

iris_spread <- iris_gather %>%
  tidyr::spread(key = Type, value = my_value) %>%
  dplyr::select(-ID)

# check the first six rows of iris_spread

head(iris_spread)
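
The same idea can be applied to data shaped like the question's. The following is only a minimal sketch: `vs` is a hypothetical toy data frame with one deliberately repeated measurement, and `REP` is a made-up helper column that numbers repeats within each ID/visit/test combination so that spread() sees unique identifiers.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; rows 3 and 4 share ID, VISITNUM and VSTESTCD.
vs <- data.frame(
  ID       = c(32320058, 32320058, 32320058, 32320058),
  VISITNUM = c(1, 1, 2, 2),
  VSTESTCD = c("SYSBP", "WEIGHT", "SYSBP", "SYSBP"),
  VSORRES  = c(116, 109, 116, 118),
  stringsAsFactors = FALSE
)

vs_wide <- vs %>%
  group_by(ID, VISITNUM, VSTESTCD) %>%
  mutate(REP = row_number()) %>%  # 1, 2, ... within each repeated key
  ungroup() %>%
  spread(key = VSTESTCD, value = VSORRES)

print(vs_wide)
```

Each repeated measurement ends up on its own row, distinguished by REP. Whether such repeats should be kept, averaged, or dropped depends on what they mean in the data.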
Answered 2017-06-04T22:47:58.227