
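In R, nearly every is.* function I can think of has a corresponding as.*. There is is.na but no as.na. Why not, and if such a function makes sense, how would you implement it?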

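I have a vector x that can be logical, character, integer, numeric or complex, and I want to convert it to a vector of the same class and length, but filled with the appropriate NA: NA, NA_character_, NA_integer_, NA_real_ or NA_complex_.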

My current version:

as.na <- function(x) {x[] <- NA; x}
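A quick way to check this version, using only base R: assigning the logical NA into x[] coerces it to the storage type of x, so the result keeps its type and length. The example vectors below are made up for illustration.

```r
as.na <- function(x) {x[] <- NA; x}

# One made-up example vector for each atomic type mentioned in the question
examples <- list(
  logical   = c(TRUE, FALSE),
  integer   = 1:3,
  numeric   = c(1.5, 2.5),
  character = c("a", "b"),
  complex   = c(1+2i, 3-1i)
)

converted <- lapply(examples, as.na)

# typeof() of each result is unchanged, and every element is NA
print(sapply(converted, typeof))
print(sapply(converted, function(v) all(is.na(v))))
```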

4 Answers


Why not use is.na<-, as documented in ?is.na?

> l <- list(integer(10), numeric(10), character(10), logical(10), complex(10))
> str(lapply(l, function(x) {is.na(x) <- seq_along(x); x}))
List of 5
 $ : int [1:10] NA NA NA NA NA NA NA NA NA NA
 $ : num [1:10] NA NA NA NA NA NA NA NA NA NA
 $ : chr [1:10] NA NA NA NA ...
 $ : logi [1:10] NA NA NA NA NA NA ...
 $ : cplx [1:10] NA NA NA ...
answered 2012-12-18T16:57:34.320

This seems to be consistently faster than your function:

as.na <- function(x) {
    rep(c(x[0], NA), length(x))
}

(Thanks to Joshua Ulrich for pointing out that my earlier version didn't preserve class attributes.)
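Why this works, as a sketch: x[0] is a zero-length vector with the same type and class as x, so c() coerces the logical NA to that type (dispatching to a method such as c.Date for classed vectors), and rep() then recycles the single NA to length(x). The vectors below are arbitrary examples.

```r
as.na <- function(x) {
    rep(c(x[0], NA), length(x))
}

# c() with a zero-length typed vector first coerces the logical NA to that type
typed_na <- list(
  int = c(integer(0), NA),
  dbl = c(numeric(0), NA),
  chr = c(character(0), NA)
)
str(typed_na)

# Classed vectors keep their class too: c() dispatches on the Date in x[0],
# and rep() has a Date method that keeps the class
d <- as.Date(c("2012-12-18", "2012-12-19"))
str(as.na(d))
```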


For the record, here are some relative timings:

library(rbenchmark)

## The functions
flodel <- function(x) {x[] <- NA; x}
joshU <- function(x) {is.na(x) <- seq_along(x); x}
joshO <- function(x) rep(c(x[0], NA), length(x))

## Some vectors to test them on
int  <- 1:1e6
char <- rep(letters[1:10], 1e5)
bool <- rep(c(TRUE, FALSE), 5e5)

benchmark(replications=100, order="relative",
    flodel_bool = flodel(bool),
    flodel_int  = flodel(int),
    flodel_char = flodel(char),
    joshU_bool = joshU(bool),
    joshU_int  = joshU(int),
    joshU_char = joshU(char),
    joshO_bool = joshO(bool),
    joshO_int  = joshO(int),
    joshO_char = joshO(char))[1:6]        
#          test replications elapsed relative user.self sys.self
# 7  joshO_bool          100    0.46    1.000      0.33     0.14
# 8   joshO_int          100    0.49    1.065      0.31     0.18
# 9  joshO_char          100    1.13    2.457      0.97     0.16
# 1 flodel_bool          100    2.31    5.022      2.01     0.30
# 2  flodel_int          100    2.31    5.022      2.00     0.31
# 3 flodel_char          100    2.64    5.739      2.36     0.28
# 4  joshU_bool          100    3.78    8.217      3.13     0.66
# 5   joshU_int          100    3.95    8.587      3.30     0.64
# 6  joshU_char          100    4.22    9.174      3.70     0.51
answered 2012-12-18T16:55:56.293

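The function doesn't exist because it isn't a type conversion. A type conversion changes 1L into 1.0, or "1" into 1L. NA is not something you convert to from another type, except when that type is text. Given that only one type can be converted to NA, and that there are many ways to assign NA (as the other answers show), such a function isn't needed.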
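To illustrate the text exception, assuming it refers to strings that cannot be parsed as numbers: the ordinary as.* functions already return an NA of the target type for them (with a warning, suppressed here). The input strings are made up.

```r
txt <- c("1", "2.5", "abc")

# Unparseable strings become NA of the target type; R warns
# "NAs introduced by coercion", which suppressWarnings() silences here
nums <- suppressWarnings(as.numeric(txt))
ints <- suppressWarnings(as.integer(c("1", "abc")))

str(nums)
str(ints)
```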

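Every answer you got assigns NA to everything passed in, but you probably only want to do that conditionally. Whether you make the assignment conditionally yourself or call a small wrapper, it comes to the same thing.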
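A sketch of the conditional case in base R; the vector and the condition x < 0 are made-up examples. All three forms only overwrite the selected elements, and the result keeps the type of x.

```r
x <- c(3L, -1L, 7L, -5L)

# Option 1: the is.na<- replacement function with a logical index
y1 <- x
is.na(y1) <- y1 < 0

# Option 2: plain subassignment
y2 <- x
y2[y2 < 0] <- NA

# Option 3: replace(), which returns a modified copy and leaves x untouched
y3 <- replace(x, x < 0, NA)

print(y1)
```

The logical NA assigned in each case is coerced to integer, so no typed constant such as NA_integer_ is needed.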

answered 2012-12-18T17:38:45.040

Old question, but how about this:

as.na <- function(obj){
  if(is.factor(obj)){
    # Special case for factors - any others that need to be handled?
    factor(rep(NA, length(obj)), levels = levels(obj))
  } else{
    objClass <- class(obj)
    x <- rep(NA, length(obj))
    class(x) <- objClass 
    x
  }
}

For a data frame:

DF <- data.frame(
  int = seq(1, 10),
  real = seq(1, 10) + 0.1,
  char = letters[1:10],
  logi = rep(c(TRUE, FALSE), 5),
  Date = seq.Date(as.Date("2019-09-03"), by = 1, length.out = 10),
  posix = seq.POSIXt(as.POSIXct("2019-09-03 12:00:00"), by = 360, length.out = 10),
  stringsAsFactors = FALSE
) 
DF$factr <- as.factor(LETTERS[1:10])
str(DF)
'data.frame':   10 obs. of  7 variables:
 $ int  : int  1 2 3 4 5 6 7 8 9 10
 $ real : num  1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1 10.1
 $ char : chr  "a" "b" "c" "d" ...
 $ logi : logi  TRUE FALSE TRUE FALSE TRUE FALSE ...
 $ Date : Date, format: "2019-09-03" "2019-09-04" "2019-09-05" ...
 $ posix: POSIXct, format: "2019-09-03 12:00:00" "2019-09-03 12:06:00" "2019-09-03 12:12:00" ...
 $ factr: Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10

DF_na <- DF
for(i in colnames(DF_na)){
  DF_na[,i] <- as.na(DF_na[,i])
}
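The column loop can also be written as a single assignment, a common base-R idiom: DF_na[] <- lapply(...) replaces every column while keeping the data frame's names and row names. The sketch repeats the answer's as.na() so it runs on its own, with a smaller made-up data frame.

```r
as.na <- function(obj){
  if(is.factor(obj)){
    # Special case for factors: keep the original levels
    factor(rep(NA, length(obj)), levels = levels(obj))
  } else{
    objClass <- class(obj)
    x <- rep(NA, length(obj))
    # class<- with a basic type name ("integer", "character", ...) coerces x
    class(x) <- objClass
    x
  }
}

DF <- data.frame(
  int  = 1:3,
  char = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
DF$factr <- factor(c("A", "B", "C"))

# Assigning into DF_na[] keeps the data.frame structure,
# while lapply() applies as.na() to every column
DF_na <- DF
DF_na[] <- lapply(DF_na, as.na)
str(DF_na)
```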

str(DF_na)
'data.frame':   10 obs. of  7 variables:
 $ int  : int  NA NA NA NA NA NA NA NA NA NA
 $ real : num  NA NA NA NA NA NA NA NA NA NA
 $ char : chr  NA NA NA NA ...
 $ logi : logi  NA NA NA NA NA NA ...
 $ Date : Date, format: NA NA NA ...
 $ posix: POSIXct, format: NA NA NA ...
 $ factr: Factor w/ 10 levels "A","B","C","D",..: NA NA NA NA NA NA NA NA NA NA

For a data.table:

library(data.table)

DT <- data.table::data.table(
  int = seq(1, 10),
  real = seq(1, 10) + 0.1,
  char = letters[1:10],
  logi = rep(c(TRUE, FALSE), 5),
  Date = seq.Date(as.Date("2019-09-03"), by = 1, length.out = 10),
  posix = seq.POSIXt(as.POSIXct("2019-09-03 12:00:00"), by = 360, length.out = 10),
  factr = as.factor(LETTERS[1:10])
)
str(DT)
Classes ‘data.table’ and 'data.frame':  10 obs. of  7 variables:
 $ int  : int  1 2 3 4 5 6 7 8 9 10
 $ real : num  1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1 10.1
 $ char : chr  "a" "b" "c" "d" ...
 $ logi : logi  TRUE FALSE TRUE FALSE TRUE FALSE ...
 $ Date : Date, format: "2019-09-03" "2019-09-04" "2019-09-05" ...
 $ posix: POSIXct, format: "2019-09-03 12:00:00" "2019-09-03 12:06:00" "2019-09-03 12:12:00" ...
 $ factr: Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
 - attr(*, ".internal.selfref")=<externalptr> 

DT_na <-
  copy(DT)[, lapply(.SD, as.na)]
str(DT_na)
Classes ‘data.table’ and 'data.frame':  10 obs. of  7 variables:
 $ int  : int  NA NA NA NA NA NA NA NA NA NA
 $ real : num  NA NA NA NA NA NA NA NA NA NA
 $ char : chr  NA NA NA NA ...
 $ logi : logi  NA NA NA NA NA NA ...
 $ Date : Date, format: NA NA NA ...
 $ posix: POSIXct, format: NA NA NA ...
 $ factr: Factor w/ 10 levels "A","B","C","D",..: NA NA NA NA NA NA NA NA NA NA
 - attr(*, ".internal.selfref")=<externalptr> 
answered 2019-09-03T22:19:50.810