I'm trying to convert strings to numbers, but I've run into some unexpected behavior with str_replace. Here is a minimal working example:

library(stringr)
x <- c("0", "NULL", "0")

# This works, i.e. 0 NA 0
as.numeric(str_replace(x, "NULL", ""))

# This doesn't, i.e. NA NA NA
as.numeric(str_replace(x, "NULL", NA))

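To my mind, the second example should work, since it should only replace the second entry of the vector with NA (which is a valid value in a character vector). But it doesn't: the inner str_replace converts all three entries to NA.

What is going on here? I looked through the documentation for str_replace and stri_replace_all but didn't see an obvious explanation.

Edit: to clarify, this is with stringr_1.0.0 and stringi_1.0-1 on R 3.1.3, Windows 7.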

4 Answers

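This was a bug in the stringi package, and it has now been fixed (recall that stringr is built on stringi, so the former was affected too).

With the latest development version, we get: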

library(stringi)
stri_replace_all_fixed(c("1", "NULL"), "NULL", NA)
## [1] "1" NA
answered 2016-01-30T16:26:00.033

Looking at the source code of str_replace:

function (string, pattern, replacement) 
{
    replacement <- fix_replacement(replacement)
    switch(type(pattern), empty = , bound = stop("Not implemented", 
        call. = FALSE), fixed = stri_replace_first_fixed(string, 
        pattern, replacement, opts_fixed = attr(pattern, "options")), 
        coll = stri_replace_first_coll(string, pattern, replacement, 
            opts_collator = attr(pattern, "options")), regex = stri_replace_first_regex(string, 
            pattern, replacement, opts_regex = attr(pattern, 
                "options")), )
}
<environment: namespace:stringr>

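This leads to fix_replacement, which can be found on GitHub and which I've also included below. If you run it in your main environment, you'll find that fix_replacement(NA) returns NA. You can see that it relies on stri_replace_all_regex, which comes from the stringi package.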

fix_replacement <- function(x) {
    stri_replace_all_regex(
        stri_replace_all_fixed(x, "$", "\\$"),
        "(?<!\\\\)\\\\(\\d)",
        "\\$$1")
}

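Interestingly, stri_replace_first_fixed and stri_replace_first_regex both return c(NA,NA,NA) when run with your parameters (your string, pattern, and replacement). The problem is that stri_replace_first_fixed and stri_replace_first_regex are C++ code, so figuring out what is happening gets a little trickier.

stri_replace_first_fixed can be found here.

stri_replace_first_regex can be found here.

From what I can tell, with limited time and relatively rusty C++ knowledge, the function stri__replace_allfirstlast_fixed checks the replacement argument using stri_prepare_arg_string. According to its documentation, it throws an error if it encounters an NA. I didn't have time to trace it fully beyond that, but I suspect this error may be what causes the odd all-NA return value.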

answered 2015-12-17T16:47:46.813

Here is a solution using dplyr's across() and the stringr package.

library(dplyr)
library(stringr)

df <- data.frame(x = c("a", "b", "null", "e"),
                 y = c("g", "null", "h", "k"))

# Replace "null" with NA in every column
df2 <- df %>%
  mutate(across(everything(), ~ str_replace(.x, "null", NA_character_)))
answered 2021-07-15T01:06:20.457

There is another way to answer this question, as shown here, using NA_character_.

Short answer to the question:

library(stringr)
x <- c("0", "NULL", "0")
y <- as.numeric(str_replace(x, "NULL", NA_character_))

Which produces:

> y
[1]  0 NA  0
> typeof(y)
[1] "double"
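
The short answer passes NA_character_ rather than a bare NA. As a minimal sketch of the difference between the two (base R only, and not a claim about how stringr or stringi treat either value internally), the following compares their types. The vector name v is introduced here purely for illustration.

```r
# Base R only: compare the two kinds of missing value.
na_logical_type <- typeof(NA)              # bare NA is a logical value
na_char_type <- typeof(NA_character_)      # NA_character_ is a character value
print(c(na_logical_type, na_char_type))

# Inside a character vector, c() coerces a bare NA to character,
# so the vector stays character and converts cleanly to numeric.
v <- c("0", NA, "0")
print(typeof(v))
print(as.numeric(v))
```

A bare NA is logical, so a function that expects a character replacement has to coerce it first, whereas NA_character_ already has the right type.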

Going further

library(dplyr)
library(stringr)
# create a dummy dataset
ex <- starwars %>% select(name, hair_color, homeworld) %>% head(6)
print(ex)
# let's say you want to replace all "Tatooine" with NA
# this produces the expected output
ex %>% mutate(homeworld = str_replace_all(homeworld, pattern = "Tatooine", NA_character_))

# HOWEVER,
# From Hadley's comment: "str_replace() has to replace parts of a string and replacing part of a string with NA doesn't make sense."
# so be careful using this method, see the example below:
ex %>% mutate(hair_color = str_replace_all(hair_color, pattern = "brown", NA_character_))
# all hair colors containing "brown", including "brown, grey" (Owen Lars, row 6), are now NA

Output

> print(ex)
# A tibble: 6 x 3
   name               hair_color    homeworld
   <chr>              <chr>         <chr>    
 1 Luke Skywalker     blond         Tatooine 
 2 C-3PO              NA            Tatooine 
 3 R2-D2              NA            Naboo    
 4 Darth Vader        none          Tatooine 
 5 Leia Organa        brown         Alderaan 
 6 Owen Lars          brown, grey   Tatooine  

> ex %>% mutate(homeworld = str_replace_all(homeworld, pattern = "Tatooine", NA_character_))
# A tibble: 6 x 3
   name               hair_color    homeworld
   <chr>              <chr>         <chr>    
 1 Luke Skywalker     blond         NA       
 2 C-3PO              NA            NA       
 3 R2-D2              NA            Naboo    
 4 Darth Vader        none          NA       
 5 Leia Organa        brown         Alderaan 
 6 Owen Lars          brown, grey   NA         

 > ex %>% mutate(hair_color = str_replace_all(hair_color, pattern = "brown", NA_character_))
# A tibble: 6 x 3
   name               hair_color    homeworld
   <chr>              <chr>         <chr>    
 1 Luke Skywalker     blond         Tatooine 
 2 C-3PO              NA            Tatooine 
 3 R2-D2              NA            Naboo    
 4 Darth Vader        none          Tatooine 
 5 Leia Organa        NA            Alderaan 
 6 Owen Lars          NA            Tatooine 
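
If the goal is to blank out only values that are exactly "brown", one option is to skip pattern replacement and compare whole elements instead. The following is a minimal base R sketch under that assumption. The vector hair is a hypothetical stand-in for the hair_color column above, and the names exact and y are introduced only for this example.

```r
# Hypothetical stand-in for the hair_color column shown above.
hair <- c("blond", NA, NA, "none", "brown", "brown, grey")

# %in% compares whole elements and returns FALSE for NA inputs,
# so only the element that is exactly "brown" is selected.
exact <- hair
exact[exact %in% "brown"] <- NA_character_
print(exact)

# The same idea applied to the vector from the question.
x <- c("0", "NULL", "0")
x[x %in% "NULL"] <- NA_character_
y <- as.numeric(x)
print(y)
```

Here "brown, grey" is left untouched because it is not equal to "brown" as a whole string, and the question's vector converts to 0 NA 0.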
answered 2020-08-14T08:30:36.073