I'm struggling with something that seems simple.

I have a lookup table of codes and their recoded labels.

> head(codesTv)

  X5000 TV.Diary.Event
1  5001           Play
2  5002   Drama Series
3  5003    Other Drama
4  5004           Film
5  5005      Pop Music
6  5006         Comedy

I also have a vector that needs recoding, named ttest.

> head(as.data.frame(ttest))
                ttest
1        SPITTING IMA
2                5999
3        KRAMERVSKRAM
4                NEWS
5           BROOKSIDE
6             NOTHING

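What I need is simply to replace the values of ttest that appear as codes in codesTv with their labels, leaving every other value unchanged.

The only way I have found to do this is the following cumbersome code: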

ttest [ ttest %in% codesTv$X5000 ] = codesTv$TV.Diary.Event [ match(ttest [ttest %in% codesTv$X5000], codesTv$X5000) ] 

Does anyone have a simpler idea?

Data

ttest = c("SPITTING IMA", "5999", "KRAMERVSKRAM", "NEWS", "BROOKSIDE", 
"NOTHING", "NOTHING", "BROOKSIDE", "5004", "5004", "5999", "YANKS", 
"5999", "5999", "5999", "5999", "\"V\"", "GET FRESH", "5999", 
"5999", "HEIDI", "FAME", "SAT  SHOW", "5021", "BLUE PETER", "V", 
"EASTENDERS", "WORLD  CUP", "GRANDSTAND", "SPORT", "WORLD CUP", 
"BLUE PETER", "WORLD CUP", "HORIZON", "REGGIEPERRIN", "5999", 
"BROOKSIDE", "HNKYTNK MAN", "5999", "5999")

 codesTv = structure(list(X5000 = c("5001", "5002", "5003", "5004", "5005", 
"5006", "5007", "5008", "5009", "5010", "5011", "5012", "5013", 
"5014", "5015", "5016", "5017", "5019", "5020", "5021", "5022", 
"5023", "5888", "5999"), TV.Diary.Event = c("Play", "Drama Series", 
"Other Drama", "Film", "Pop Music", "Comedy", "Chat Show", "Quiz/Panel Game", 
"Cartoon", "Special L/E Event", "Classical Music", "Contemporary Music", 
"Arts", "News", "Politics", "Consumer Affairs", "Spec Current Affairs", 
"Documentary", "Religious Affairs", "Sport", "Childrens TV", 
"Party Political", "Continuation Event", "Non-event (Missing)"
)), .Names = c("X5000", "TV.Diary.Event"), row.names = c(NA, 
-24L), class = "data.frame")
1 Answer

The OP's solution should work fine. Here is another way:

library(data.table)

# confirm that there is overlap
intersect(ttest, codesTv$X5000) # "5999" "5004" "5021"  

# replace values in ttest
setDT(list(X5000=ttest))[codesTv, X5000 := i.TV.Diary.Event, on="X5000"]

# confirm that the values were overwritten
intersect(ttest, codesTv$X5000) # character(0)

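I borrowed this idea from @eddi. It should be memory-efficient, because ttest is modified by reference rather than copied: the join updates the matching values in place, which is why the second intersect() call returns character(0).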
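If changing ttest in place is not wanted, a base R variant may be easier to read. The sketch below calls match() once instead of combining %in% and match() as in the question. The shortened data and the name recoded are assumptions made only so the example runs on its own; they are not part of the original post.

```r
# Small subset of the question's data (illustrative only)
ttest <- c("SPITTING IMA", "5999", "NEWS", "5004", "5021", "V")
codesTv <- data.frame(
  X5000 = c("5004", "5021", "5999"),
  TV.Diary.Event = c("Film", "Sport", "Non-event (Missing)"),
  stringsAsFactors = FALSE
)

# Position of each ttest value in the code column; NA where there is no code
idx <- match(ttest, codesTv$X5000)

# Use the label where a code matched, otherwise keep the original value
recoded <- ifelse(is.na(idx), ttest, codesTv$TV.Diary.Event[idx])
print(recoded)
```

Unlike the data.table version, this builds a new vector (recoded, a hypothetical name) and leaves ttest untouched, so it trades some memory for the absence of side effects.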

answered 2015-10-07T16:54:06.603