I want to add a column to my data frame using the transform function. One of my columns contains character strings, and I want to derive the new column from the strings found there.

UNIT.NO. USAGE..kWh.month.
     A1               863
     A1              1339
     D3              1058
     D1               782
     L1              1339
     L7              1058
     L1               782

I want to add a column that classifies each row by the letter in its unit number, giving the following result:

UNIT.NO. USAGE..kWh.month.   Category
     A1               863       A
     A1              1339       A
     D3              1058       D
     D1               782       D
     L1              1339       L
     L7              1058       L
     L1               782       L

I used the following code but it doesn't work.

dataset.1<-transform(
  dataset.1,
  Category=
    if(grepl("A",dataset.1$UNIT.NO.)==T){
      "A"
    } else 
      if(grepl("D",dataset.1$UNIT.NO.)==T){
        "D"
      } else 
        if(grepl("L",dataset.1$UNIT.NO.)==T){
          "L"
        }else{
              "Other"
            }
)

Warning in R : In if (grepl("A", dataset.1$UNIT.NO.) == T) { : the condition has length > 1 and only the first element will be used

As a result, every Category value is now A, rather than the letter that matches each Unit No. What is the best way to add such a column?

I need these categories to perform a nonparametric analysis. Thanks in advance.

3 Answers

One option is simply:

indx <- gsub("[0-9]", "", df1$UNIT.NO.)
keep <- indx %in% c("A", "D", "L")
df1$Category <- "Other"
# Assign only the matching elements so both sides have the same length
df1$Category[keep] <- indx[keep]

Another (more efficient) option:

library(data.table)
keep <- indx %in% c("A", "D", "L")
setDT(df1)[, Category := "Other"]
df1[keep, Category := indx[keep]]
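The base R option above can be tried end to end with a small sketch. The data frame below is only modeled on the question's sample; the `Z9` row and its usage value are hypothetical, added so that the "Other" branch is exercised, and `keep` is just an illustrative helper name.

```r
# Hypothetical sample modeled on the question's data; the "Z9" row is
# made up purely to exercise the "Other" branch.
df1 <- data.frame(
  UNIT.NO. = c("A1", "A1", "D3", "D1", "L1", "L7", "L1", "Z9"),
  USAGE..kWh.month. = c(863, 1339, 1058, 782, 1339, 1058, 782, 500),
  stringsAsFactors = FALSE
)

# Strip the digits so only the letter prefix remains
indx <- gsub("[0-9]", "", df1$UNIT.NO.)

# Default every row to "Other", then overwrite the recognised prefixes
keep <- indx %in% c("A", "D", "L")
df1$Category <- "Other"
df1$Category[keep] <- indx[keep]

df1$Category
# "A" "A" "D" "D" "L" "L" "L" "Other"
```

Subsetting both sides with the same logical vector keeps the replacement the same length as the rows being replaced, which matters as soon as some unit falls outside A, D and L.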
answered 2014-08-21T14:22:54.980

Use substr to get the first letter:

dataset.1$Category <- ifelse(substr(dataset.1$UNIT.NO., 1, 1) %in% c("A", "D", "L"),
                             substr(dataset.1$UNIT.NO., 1, 1),
                             "Other")

If you don't need "Other", just use:

dataset.1$Category <- substr(dataset.1$"UNIT.NO.",1,1)
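For context on why `ifelse` works where the question's `if` did not: `if` expects a single TRUE or FALSE, while `grepl` and `%in%` return one value per row, so only the first row was ever tested (and newer versions of R treat a condition of length greater than one as an error rather than a warning). A minimal sketch, using a made-up vector `units` in which `X2` is a hypothetical non-matching unit:

```r
units <- c("A1", "D3", "L7", "X2")  # "X2" is a hypothetical non-matching unit

# First character of every unit number
first <- substr(units, 1, 1)

# ifelse() evaluates the test for every element and returns a vector
# of the same length, picking from `first` or "Other" element by element
category <- ifelse(first %in% c("A", "D", "L"), first, "Other")

category
# "A" "D" "L" "Other"
```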
answered 2014-08-21T14:23:15.133

There are many ways:

#dummy data
dataset.1 <- read.table(text="
UNIT.NO. USAGE..kWh.month.
A1               863
A1              1339
D3              1058
D1               782
L1              1339
L7              1058
L1               782
XX1               782", header=TRUE)

#using your approach - nested ifelse (grepl already returns TRUE/FALSE, so ==T is not needed)
dataset.1$CategoryIfElse <-
  ifelse(grepl("A", dataset.1$UNIT.NO.), "A",
         ifelse(grepl("D", dataset.1$UNIT.NO.), "D",
                ifelse(grepl("L", dataset.1$UNIT.NO.), "L", "Other")))

#using substr
dataset.1$CategorySusbstr <-
  substr(dataset.1$"UNIT.NO.",1,1)
dataset.1$CategorySusbstr <- 
  factor(dataset.1$CategorySusbstr,levels=c("A","D","L","Other"))
dataset.1$CategorySusbstr[ is.na(dataset.1$CategorySusbstr)] <- "Other"

#result
dataset.1

# UNIT.NO. USAGE..kWh.month. CategoryIfElse CategorySusbstr
# 1       A1               863              A               A
# 2       A1              1339              A               A
# 3       D3              1058              D               D
# 4       D1               782              D               D
# 5       L1              1339              L               L
# 6       L7              1058              L               L
# 7       L1               782              L               L
# 8      XX1               782          Other           Other
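The two columns agree on this data, but the approaches are not strictly equivalent: `grepl("A", ...)` matches an "A" anywhere in the string, whereas `substr(..., 1, 1)` looks only at the first character. A small sketch of the difference, assuming a hypothetical unit number `BA2` that does not appear in the question's data:

```r
units <- c("A1", "D3", "BA2")  # "BA2" is a hypothetical unit number

# Unanchored pattern: TRUE whenever "A" occurs anywhere in the string
anywhere <- grepl("A", units)
anywhere
# TRUE FALSE TRUE

# Anchored pattern: TRUE only when the string starts with "A"
leading <- grepl("^A", units)
leading
# TRUE FALSE FALSE

# First character, as used by the substr approach
first <- substr(units, 1, 1)
first
# "A" "D" "B"
```

If unit numbers can contain these letters after the first position, anchoring the pattern with `^` is one way to make the `grepl` version behave like the `substr` version.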
answered 2014-08-21T14:42:31.267