Is it possible to subset a data frame by giving only the first few characters of the wanted entries?

The filter criteria are stored in a factor vector, but it holds only the first three digits of each ID. It should select every row of the data frame whose ID starts with one of them.

Example:

 # Input dataframe
 data <- read.table(header=T, text='
             ID sex size
        0120010   M    7
        0120020   F    6
        0121031   F    9
        0130010   M   11
        0130020   M   11
        0130030   F   14
        0130040   M   11
        0150030   F   11
        0150110   F   12
        0180030   F    9
        1150110   F   12
        9180030   F    9
        ', colClasses = c("character", "factor", "integer"))

 # Input vector/factor with the ID chunk, containing only the first three digits
 # of the targeted entries in data$ID
 IDfilter <- c("012", "015", "115")

 # My try/idea which sadly is not working - PLEASE HELP HERE
 subset <- data[ID %in% paste(IDfilter, "?.", sep=""),]

 # Expected subset
 > subset
           ID sex size
 1    0120010   M    7
 2    0120020   F    6
 3    0121031   F    9
 4    0150030   F   11
 5    0150110   F   12
 6    1150110   F   12

Thank you! :)

2 Answers

Something like this?

data <- read.table(header=T, text='
             ID sex size
        0120010   M    7
        0120020   F    6
        0121031   F    9
        0130010   M   11
        0130020   M   11
        0130030   F   14
        0130040   M   11
        0150030   F   11
        0150110   F   12
        0180030   F    9
        1150110   F   12
        9180030   F    9
        ', colClasses =c("character", "factor", "integer"))

 IDfilter <- c("012", "015", "115") # filter must be character vector



   data[substr(data[,"ID"], 1,3) %in% IDfilter, ]
#        ID sex size
#1  0120010   M    7
#2  0120020   F    6
#3  0121031   F    9
#8  0150030   F   11
#9  0150110   F   12
#11 1150110   F   12

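Note the colClasses argument. Here ID is read as character so that a leading 0 is kept, as in 0120010; otherwise (if it were numeric or integer) this number would become 120010.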
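A small sketch of what happens to the leading zero may help here. The names `as_int` and `restored` are made up for illustration, and the width of 7 digits is an assumption taken from the sample data:

```r
# Converting an ID to integer drops the leading zero
as_int <- as.integer("0120010")
as_int                               # 120010

# The prefix of the converted value is no longer "012"
substr(as.character(as_int), 1, 3)   # "120"

# The zero can only be padded back if the width is known (7 digits assumed)
restored <- sprintf("%07d", as_int)
restored                             # "0120010"
```

Reading the column as character from the start avoids having to guess the width later.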

Another option is

data[substr(data[,"ID"], 1,nchar(IDfilter)[1]) %in% IDfilter, ]

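where the third argument of substr is set automatically to the number of characters in the first element of IDfilter. This assumes that every element of IDfilter has the same number of characters.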
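If the prefixes do not all have the same length, one possible variation is to test each prefix separately and combine the results. This is only a sketch: the vectors `ids` and `prefixes` are made-up example data, and `startsWith()` needs R 3.3.0 or later.

```r
# Hypothetical example data: prefixes of different lengths
ids <- c("0120010", "0130010", "0150030", "1150110", "9180030")
prefixes <- c("01200", "015", "9")

# One logical vector per prefix, combined with element-wise OR
keep <- Reduce(`|`, lapply(prefixes, function(p) startsWith(ids, p)))

ids[keep]
```

With the data frame from the question, the same logical vector built from `data$ID` could be used as `data[keep, ]`.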

answered 2013-08-29T23:32:02.557

A regular-expression approach:

subset(data, grepl(paste0("^",IDfilter,collapse="|"), ID))

        ID sex size
1  0120010   M    7
2  0120020   F    6
3  0121031   F    9
8  0150030   F   11
9  0150110   F   12
11 1150110   F   12

Note: "^" matches the beginning of the string. I am assuming your filter contains only digits.
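To make the digits-only assumption concrete, here is a short sketch of the pattern that gets built and of what could go wrong with a regex metacharacter. The prefix "1.5" and the ID "1250" are invented for illustration, and `startsWith()` (R 3.3.0 or later) is shown only as a literal comparison for contrast:

```r
IDfilter <- c("012", "015", "115")

# The pattern grepl() receives: one anchored alternative per prefix
pattern <- paste0("^", IDfilter, collapse = "|")
pattern                        # "^012|^015|^115"

# In a regex, "." matches any character, so this prefix matches "1250"
grepl("^1.5", "1250")          # TRUE

# A literal comparison does not treat "." specially
startsWith("1250", "1.5")      # FALSE
```

As long as the filter holds only digits, as in the question, the regex and the literal comparison should select the same rows.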

answered 2013-08-30T00:26:39.663