I have a data set of cancer patients with different outcomes:

TypeofOutcome        DateStageIV

NA                   01.04.2014
Died from melanoma   01.06.2011
Died from melanoma   01.11.2013

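I want a new column named "Outcome" in which all patients who are still alive are coded as 1 and all deceased patients are coded as 0. Based on an earlier exercise, I wrote this code: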

mergedData$Outcome <- 1* (mergedData$TypeofOutcome = c ("Alive with stable disease", "Alive with progressive disease", "Alive with complete response"))

I already suspected this would not work, and I get the error message:

Error in 1 * (mergedData$TypeofOutcome = c("Alive with stable disease",  :
  non-numeric argument to binary operator

I am sure there is a simple solution to my problem.

1 Answer

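If I understand correctly, you want to create a dichotomous variable based on the values of a string variable: if TypeOfOutcome matches any of "Alive with stable disease", "Alive with progressive disease", or "Alive with complete response", Outcome should be 1, otherwise 0. I assume your data set looks similar to this: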

mergedData <- data.frame(
  TypeOfOutcome = c("Alive with stable disease", "Alive with progressive disease", "Alive with complete response", NA, "Died from melanoma"), 
  DateStageIV = sample(seq(as.Date('2011/01/01'), as.Date('2015/01/01'), by="day"), 5))


#                    TypeOfOutcome DateStageIV
# 1      Alive with stable disease  2013-05-09
# 2 Alive with progressive disease  2014-08-08
# 3   Alive with complete response  2013-02-10
# 4                           <NA>  2014-05-23
# 5             Died from melanoma  2012-08-08

The function ifelse is well suited to this kind of recoding. The basic syntax is:

ifelse(test, yes, no)
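
As a toy illustration of that signature, here is a minimal sketch; the vector x and the labels "high" and "low" are made up for this example and are not part of the question's data:

```r
# Hypothetical toy vector, not part of the question's data
x <- c(3, 8, 5, 10)

# The test is evaluated element-wise: values above 5 get "high", the rest "low"
label <- ifelse(x > 5, "high", "low")
print(label)
```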

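If the statement in test is true, ifelse returns the value of yes; otherwise it returns the value of no. Here, test covers all cases where the patient is still alive, indicated by TypeOfOutcome being "Alive with stable disease", "Alive with progressive disease", or "Alive with complete response". The test for this would be: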

test <- mergedData$TypeOfOutcome %in% c("Alive with stable disease", "Alive with progressive disease", "Alive with complete response")
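
For contrast, the following sketch suggests why the attempt in the question raises an error. It uses a small made-up vector, status, as a stand-in for the real column. Inside parentheses a single = is an assignment, so the vector is overwritten with the three strings and the result is then multiplied by 1, which fails for character data. The %in% operator performs the intended set-membership test instead.

```r
# Hypothetical stand-in for mergedData$TypeofOutcome
status <- c("Alive with stable disease", "Died from melanoma", NA)
alive  <- c("Alive with stable disease", "Alive with progressive disease",
            "Alive with complete response")

# A single `=` inside parentheses assigns, so this overwrites `status`
# and then fails when the character vector is multiplied by 1
res <- try(1 * (status = alive), silent = TRUE)
inherits(res, "try-error")   # was an error raised?
identical(status, alive)     # was `status` overwritten?

# Restore the toy data and use %in% for a set-membership test instead
status <- c("Alive with stable disease", "Died from melanoma", NA)
is_alive <- status %in% alive
1 * is_alive
```

If the original line was run against the real data, it may be worth checking that the TypeofOutcome column still holds its original values.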

test is TRUE wherever the value in TypeOfOutcome matches any of the strings after the %in% operator. yes is then 1 and no is 0. Creating the new variable:

mergedData$Outcome <- ifelse(test, 1, 0)

mergedData

#                    TypeOfOutcome DateStageIV Outcome
# 1      Alive with stable disease  2013-05-09       1
# 2 Alive with progressive disease  2014-08-08       1
# 3   Alive with complete response  2013-02-10       1
# 4                           <NA>  2014-05-23       0
# 5             Died from melanoma  2012-08-08       0
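
Note that row 4, where TypeOfOutcome is missing, is coded 0, because %in% returns FALSE rather than NA for missing values. If missing outcomes should stay missing instead of being counted as deceased (an assumption about the intent; the question does not say), a possible variant is sketched below. The dates are left out because they do not affect the recoding, and the helper vector alive is introduced only for this example.

```r
# Rebuild a small example without the date column
mergedData <- data.frame(
  TypeOfOutcome = c("Alive with stable disease", "Alive with progressive disease",
                    "Alive with complete response", NA, "Died from melanoma"),
  stringsAsFactors = FALSE
)

# Hypothetical helper vector holding the "alive" categories
alive <- c("Alive with stable disease", "Alive with progressive disease",
           "Alive with complete response")

# %in% never returns NA, so missing values are handled explicitly first
mergedData$Outcome <- ifelse(is.na(mergedData$TypeOfOutcome), NA_real_,
                             ifelse(mergedData$TypeOfOutcome %in% alive, 1, 0))
mergedData$Outcome
```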
answered 2016-02-14T19:53:48.437