The catalog is a character vector with different values. It has the following structure:
value  name     location  companybrand
1111   ikea     boston    nike
1234   7/11     new york  marlboro
1456   walmart  new york  marlboro
The listing contains all US cities (Chicago, Boston, New York, Los Angeles), and another column holds the full brand names:
Location              Brand
New York, 5th Avenue  Coca Cola LTD
New York, 51 Str      Nike Corporation
New York, Broadway    Marlboro Incorporated
# location and companybrand are vectors, so every pattern passed to grepl() below has length > 1
if (sum(grepl(paste("\\b", as.character(location), "\\b", sep = ""), catalog$value[i], fixed = FALSE)) > 0 &&
    sum(grepl(paste("\\b", as.character(companybrand), "\\b", sep = ""), catalog$value[i], fixed = FALSE)) > 0) {
  # Element-wise & (not &&) is needed to combine the two row filters
  subdata <- subset(listing,
    listing$local == as.character(location[which(grepl(paste("\\b", as.character(location), "\\b", sep = ""), catalog$value[i], fixed = FALSE))]) &
    listing$commercial == as.character(companybrand[which(grepl(paste("\\b", as.character(companybrand), "\\b", sep = ""), catalog$value[i], fixed = FALSE))]))
}
As you can see, I am trying to run grepl with multiple patterns, which produces the following warning:
Warning message:
In grepl(paste("\\b", distmunicipality, "\\b", sep = ""), ctlg$distvalor[i], :
argument 'pattern' has length > 1 and only the first element will be used
I read in other posts that the appropriate fix is to collapse all the patterns to be tested into a single string with a pipe separator, like this:
companybrand <- paste(companybrand, collapse = "|")
location <- paste(location, collapse = "|")
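To make the collapsed-pattern approach concrete, here is a minimal sketch on made-up toy data. It assumes each catalog value is the whole row as a single string, which may not match the real data. Note one detail: if the vector is collapsed first and then wrapped in `\\b`, the boundaries apply only to the first and last alternatives, so the sketch groups the alternatives in parentheses.

```r
# Toy data (hypothetical); the real vectors are far larger.
location <- c("boston", "new york", "chicago")
value <- "1234 7/11 new york marlboro"

# Group the alternatives so that \\b applies to every alternative,
# not only to the first and the last one.
location_pattern <- paste0("\\b(", paste(location, collapse = "|"), ")\\b")

# TRUE if any location occurs in the value as a whole word or phrase
has_location <- grepl(location_pattern, value)

# The text of the first match found in the value
matched_location <- regmatches(value, regexpr(location_pattern, value))
```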
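This works for small vectors, but in my case companybrand has 4 million elements, which causes R to be killed for running out of memory. Is there a practical way (perhaps using sapply) to run this match without it being so computationally heavy?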
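One direction that avoids building a giant regular expression, sketched here under assumptions and not offered as a tested solution for 4 million patterns, is to swap regex matching for exact lookup. The value is split into runs of consecutive words, and those runs are looked up in the pattern vectors with `intersect()`. The helper `word_ngrams` and its `max_words` parameter are hypothetical names. The sketch assumes words are separated by spaces, that case and spelling match the pattern vectors exactly, and that no pattern is longer than `max_words` words.

```r
# Hypothetical helper: all runs of 1..max_words consecutive words in a string.
word_ngrams <- function(x, max_words = 3L) {
  words <- strsplit(x, " +")[[1]]
  n <- length(words)
  out <- character(0)
  for (k in seq_len(min(max_words, n))) {
    starts <- seq_len(n - k + 1L)
    # Paste k consecutive words together, starting at each position
    runs <- vapply(
      starts,
      function(s) paste(words[s:(s + k - 1L)], collapse = " "),
      character(1)
    )
    out <- c(out, runs)
  }
  out
}

# Toy data (hypothetical); the real vectors are far larger.
companybrand <- c("nike", "marlboro", "coca cola")
location <- c("boston", "new york", "chicago")
value <- "1234 7/11 new york marlboro"

# Exact lookup of the candidate word runs; no regex is built at all.
candidates <- word_ngrams(value)
matched_brand <- intersect(candidates, companybrand)
matched_location <- intersect(candidates, location)
```

The work per catalog value then depends mainly on the number of words in the value, not on the length of one huge pattern string. The trade-off is that partial or fuzzy matches are not found, only exact word runs.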