r - Remove records that begin with a character (Vxxx) to enable the floor() command in R

Question

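I am working with emergency room ICD-9 code data (health diagnoses). The codes are three digits followed by up to 2 decimal places (e.g. 499, 499.1, 499.51). Some special codes have the letter "V" in place of the first digit, e.g. "V10.46".

Each emergency room visit (row) can have up to 11 diagnosis codes (columns), so I used reshape() to convert the dataset to long format. Now I want to use floor() to drop the decimal parts. But R cannot take the floor of something that contains a character! I get this error: Error in Math.factor(dtl$diag) : 'floor' not meaningful for factors

This post is somewhat related, but I wonder if there is a better way? R: removing character observations in a variable

Any ideas?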

score 5 · Accepted Answer

You can use a regular expression to remove the dot and everything after it.

x <- c("499", "499.1", "499.51", "V10.46")
gsub("\\..*", "", x)
# Output:
# [1] "499" "499" "499" "V10"
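The same pattern can be applied to a factor column like the one in the question. The following is only a minimal sketch: the data frame `dtl` and its `diag` column are a hypothetical stand-in for the long-format data, and `diag3` is a made-up name for the result. Since only the first dot matters, `sub()` is enough here.

```r
# Hypothetical stand-in for the long-format data in the question
dtl <- data.frame(
  diag = factor(c("499", "499.1", "499.51", "V10.46"))
)

# Convert the factor to character, then drop the dot and everything after it
dtl$diag3 <- sub("\\..*", "", as.character(dtl$diag))

dtl$diag3
# [1] "499" "499" "499" "V10"
```

This keeps the "V" prefix, so V codes stay distinguishable from numeric codes.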

score 3 · Accepted Answer

Building on @Vincent Zoonekynd's excellent answer: if the goal is to use floor on the data, you can strip the "V" and call floor on the rest:

x <- c("499", "499.1", "499.51", "V10.46")
# replace all occurrences of "V" with nothing ("") in x:
x.stripped <- gsub("V", "", x)
# convert to numeric so we can use floor():
x.floor <- floor(as.numeric(x.stripped))

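Judging by your error message, "not meaningful for factors", your data column was read in as strings (because of the "V" in some rows), and R then converted the string column to a factor (a categorical type). That conversion was R's default behavior before version 4.0.0.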
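One way to avoid the factor conversion at import time is to ask for character columns explicitly. A minimal sketch, using an inline CSV as a stand-in for the real file; the column name `diag` and the object name `dtl` are assumptions taken from the error message, not the actual import code:

```r
# Inline CSV text standing in for the real data file (hypothetical contents)
csv_text <- "diag\n499\n499.1\n499.51\nV10.46"

# stringsAsFactors = FALSE keeps text columns as character vectors
dtl <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(dtl$diag)
# [1] "character"
```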

If you get an error about gsub not handling factors, you need to convert the column to strings first:

mydf$columnname <- as.character(mydf$columnname)

Then you can proceed as before.
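Put together, the steps might look like the sketch below. The data frame and the new column names (`is_v`, `code_floor`) are hypothetical. Because stripping the "V" turns "V10.46" into 10, which can no longer be told apart from a numeric code, this version also records which rows were V codes; that extra flag is an addition for illustration, not part of the answer above.

```r
# Hypothetical data frame; the names are for illustration only
mydf <- data.frame(
  columnname = factor(c("499", "499.1", "499.51", "V10.46"))
)

# Factor -> character, so string functions see the actual codes
codes <- as.character(mydf$columnname)

# Remember which codes started with "V" before the prefix is removed
mydf$is_v <- startsWith(codes, "V")

# Strip a leading "V", convert to numeric, and drop the decimals
mydf$code_floor <- floor(as.numeric(sub("^V", "", codes)))

mydf$code_floor
# [1] 499 499 499  10
```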

score 1 · Accepted Answer

To keep only the first three characters, you can use the substr function.

icd9 <- factor(c("499", "499.1", "499.51", "V10.46"))
substr(as.character(icd9),1,3)# as.character is used 
                              # because icd9 is factor in your data

Output

[1] "499" "499" "499" "V10"
