What is the most effective (ie efficient / appropriate) way to clean up a factor containing multiple levels that need to be collapsed? That is, how to combine two or more factor levels into one.
Here's an example where the two levels "Yes" and "Y" should be collapsed to "Yes", and "No" and "N" collapsed to "No":
## Given:
x <- c("Y", "Y", "Yes", "N", "No", "H") # The 'H' should be treated as NA
## expectedOutput
[1] Yes Yes Yes No No <NA>
Levels: Yes No # <~~ NOTICE ONLY **TWO** LEVELS
One option is of course to clean the strings before hand using sub
and friends.
Another method, is to allow duplicate label, then drop them
## Duplicate levels ==> "Warning: deprecated"
x.f <- factor(x, levels=c("Y", "Yes", "No", "N"), labels=c("Yes", "Yes", "No", "No"))
## the above line can be wrapped in either of the next two lines
factor(x.f)
droplevels(x.f)
However, is there a more effective way?
While I know that the levels
and labels
arguments should be vectors, I experimented with lists and named lists and named vectors to see what happens
Needless to say, none of the following got me any closer to my goal.
factor(x, levels=list(c("Yes", "Y"), c("No", "N")), labels=c("Yes", "No"))
factor(x, levels=c("Yes", "No"), labels=list(c("Yes", "Y"), c("No", "N")))
factor(x, levels=c("Y", "Yes", "No", "N"), labels=c(Y="Yes", Yes="Yes", No="No", N="No"))
factor(x, levels=c("Y", "Yes", "No", "N"), labels=c(Yes="Y", Yes="Yes", No="No", No="N"))
factor(x, levels=c("Yes", "No"), labels=c(Y="Yes", Yes="Yes", No="No", N="No"))