string - Alternative to sapply over strsplit on a character vector

Question

Imagine the following dataset (a single column vector):

df <- data.frame(a=c("AB3474","AB3482","AB3458","AB3487","AB3471","AB3452"))
df
       a
1 AB3474
2 AB3482
3 AB3458
4 AB3487
5 AB3471
6 AB3452

Now I want to create a new vector holding the character at the fifth position of each value in "a". The resulting df should look like this:

df_new
       a new
1 AB3474   7
2 AB3482   8
3 AB3458   5
4 AB3487   8
5 AB3471   7
6 AB3452   5

At the moment I "apply" over the split strings (using sapply and strsplit), but I suspect there is a simpler, and hopefully faster, way to solve this.

Any suggestions?

score 6 · Accepted Answer

Use this:

df_new <- within(df, new <- substr(a, 5, 5))

Result:

       a new
1 AB3474   7
2 AB3482   8
3 AB3458   5
4 AB3487   8
5 AB3471   7
6 AB3452   5

Edit: in response to the comment below:

within(df, new <- paste0(substr(a, 5, 5), ifelse(as.numeric(substr(a, 6, 6))>5, "b", "a")))

Result:

       a new
1 AB3474  7a
2 AB3482  8a
3 AB3458  5b
4 AB3487  8b
5 AB3471  7a
6 AB3452  5a

Note that as.numeric is there to avoid a lexical (string) comparison.
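The same pitfall can be shown outside R. Below is a small C++ illustration of the general point; the helper names lexical_greater and numeric_greater are hypothetical. Comparing digit strings as text orders them character by character, so "10" comes before "5", while parsing them first gives the numeric order. For the single-digit comparison in the snippet above the two orderings happen to agree in typical locales, so as.numeric mainly guards against longer numbers.

```cpp
#include <string>

// Hypothetical helper: compares two strings as text, character by
// character, so "10" > "5" is false because '1' < '5'.
bool lexical_greater(const std::string& a, const std::string& b) {
    return a > b;
}

// Hypothetical helper: parses both strings as integers first
// (std::stoi throws std::invalid_argument on non-numeric input),
// then compares the numbers.
bool numeric_greater(const std::string& a, const std::string& b) {
    return std::stoi(a) > std::stoi(b);
}
```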

score 4 · Accepted Answer

If you have a lot of rows, you may benefit from some compiled Rcpp code. Paste the following into a file with the .cpp extension:

#include <Rcpp.h>
#include <string>
#include <vector>

using namespace Rcpp;

// Extract the character at 1-based position y from every element of x.
// [[Rcpp::export]]
std::vector<std::string> extrC(CharacterVector x, int y) {
    int n = x.size();
    std::vector<std::string> out(n);

    for (int i = 0; i < n; ++i) {
        std::string tmp = Rcpp::as<std::string>(x[i]);
        // R indexes from 1, C++ from 0, hence y - 1.
        // No bounds check: y must lie within 1..tmp.size().
        out[i] = tmp[y - 1];
    }
    return out;
}

Then, in R:

require(Rcpp)
sourceCpp("C:/path/to/file.cpp")
df$new <- extrC( df$a , 5 )
df
       a new
1 AB3474   7
2 AB3482   8
3 AB3458   5
4 AB3487   8
5 AB3471   7
6 AB3452   5

The idea

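Take a character vector as input, together with an integer giving the position in each string of the character to extract. Return a std::vector< std::string > and let Rcpp take care of wrapping the standard-library object into an R object. A std::string can be indexed directly to pull out the desired character: the code uses operator[], and std::string also has an at method that does the same thing with bounds checking.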

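Note that this may not be very safe, because we do no checks to see whether the requested index actually falls within each string. Out-of-range access through operator[] is generally undefined behavior in C++, so a bad position may return garbage or crash the R session.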
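A minimal sketch of a bounds-checked variant, written as plain C++ without Rcpp so it compiles on its own. The function name extract_char_checked and the choice to return an empty string for out-of-range positions are assumptions for illustration; in an actual Rcpp version you would more likely return a CharacterVector and store NA_STRING for those entries.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: a bounds-checked take on extrC without the Rcpp
// dependency. pos is 1-based, as in R. Strings too short for pos, or
// pos < 1, yield an empty string instead of reading out of bounds.
std::vector<std::string> extract_char_checked(const std::vector<std::string>& x,
                                              int pos) {
    std::vector<std::string> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        // Convert R's 1-based position to C++'s 0-based index only
        // after confirming it is in range for this string.
        if (pos >= 1 && static_cast<std::size_t>(pos) <= x[i].size()) {
            out[i] = std::string(1, x[i][pos - 1]);
        }
        // Otherwise out[i] stays "" as a placeholder for a missing value.
    }
    return out;
}
```

Called with the six IDs from the question and pos = 5, it returns "7", "8", "5", "8", "7", "5"; a string shorter than pos, or pos < 1, gives "" rather than reading past the end of the string.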
