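I've been tinkering with my financial data set covering the past 10 weeks. I'm trying to total the amount spent/deposited for each store description. I was able to get that working with: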
totalofeachstore <- FullStatement %>%
  group_by(Description) %>%
  summarise_at(vars(Amount), funs(sum(., na.rm = TRUE)))
or
totalofeachstore <- totalofeachstore %>%
  group_by(Description) %>%
  summarize(Amount = sum(Amount))
The problem I've run into is that many stores include their store number or some other description in my statement. For example:
Arco Gas #345 -$45.54
Arco Gas #678 -$52.72
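Because of the store numbers, the sums don't collapse the way I expected. Is there a way to collapse/aggregate rows whose names are similar but not identical? For example, with the store names below, could I collapse all the Amazon stores based on the keyword AMAZON? Or better yet, since the oddball AMZ and AMZN entries sit 4th and 5th in the list, could I combine them based on their leading letters?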
AMAZON.COM*MT2M03AW1 AM PURCHASE AMZN.COM/BILL WA -8.08
AMAZON.COM*MT80Z2EC0 AM PURCHASE AMZN.COM/BILL WA -13.28
AMAZON.COM*MT8G19G51 AM PURCHASE AMZN.COM/BILL WA -31.03
AMZ*Stride Rite PURCHASE Customerservi NY -35.20
AMZN MKTP US AMZN.COM/B PURCHASE AMZN.COM/BILL WA -181.08
ARBYS 0154 PURCHASE -13.90
ARCO #42472 AM PURCHASE -30.73
ARCO #42493 AM PURCHASE -29.35
AUNT CHILADA'S PURCHASE -15.98
I found similar questions about collapsing similar rows, but none of them were trying to sum at the same time.
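EDIT 1: After some more Googling, I found some "regular expression" suggestions that might be able to do what I'm looking for. However, I have no idea how they work, and running ?grep didn't help me much. This looks far more complicated than what I currently understand. Can anyone help me work through this?
From ?grep in R: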
grep, grepl, regexpr, gregexpr and regexec search for matches to argument
pattern within each element of a character vector: they differ in the
format of and amount of detail in the results.
sub and gsub perform replacement of the first and all matches respectively.
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
fixed = FALSE, useBytes = FALSE, invert = FALSE)
grep("[a-z]", letters)
txt <- c("arm","foot","lefroo", "bafoobar")
if(length(i <- grep("foo", txt)))
cat("'foo' appears at least once in\n\t", txt, "\n")
i # 2 and 4
txt[i]
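As a minimal sketch of how those help-page functions apply to this problem, the snippet below uses base R's gsub() on the two Arco lines shown earlier. The variable names (descriptions, store_name) are made up for illustration, and the pattern is only one possible way to strip a store number.

```r
# Descriptions copied from the Arco example earlier in the question
descriptions <- c("Arco Gas #345", "Arco Gas #678")

# Pattern " #.*" reads as: a space, a literal "#", then any characters
# up to the end of the string. gsub() replaces that match with "".
store_name <- gsub(" #.*", "", descriptions)

# Both rows now share one name, so group_by() would put them in one group
print(store_name)
print(unique(store_name))
```

Once the store number is gone, rows that differed only by that number become identical and can be summed together.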
EDIT 2: Following the suggestion below, I tried this code:
Totals2 <- totalofeachstore %>%
  # remove everything after a *
  mutate(store_name = gsub("\\*.*", "", Description),
         # remove everything after a space and a #
         store_name = gsub("\\ #.*", "", store_name),
         # remove everything after a space and a number sequence
         store_name = gsub("\\ [0-9].*", "", store_name),
         # assign the other Amazon purchases to Amazon
         store_name = ifelse(str_detect(store_name, 'AMZ'), 'AMAZON.COM', store_name))
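But the following error keeps popping up. I don't think gsub belongs to any package other than base, but it feels like I haven't loaded the package that contains "str_detect":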
Error in mutate_impl(.data, dots) :
Evaluation error: could not find function "str_detect".
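For context, str_detect() comes from the stringr package, so the error is what R reports when that package is not attached. If avoiding the extra package is preferable, base R's grepl() can play the same role. The sketch below is only an illustration: the store_name vector is a hypothetical stand-in for the names left after the gsub() steps.

```r
# Hypothetical cleaned names, standing in for the result of the gsub() steps
store_name <- c("AMAZON.COM", "AMZ",
                "AMZN MKTP US AMZN.COM/B PURCHASE AMZN.COM/BILL WA",
                "ARBYS")

# grepl() is base R; fixed = TRUE treats "AMZ" as a plain substring
is_amazon <- grepl("AMZ", store_name, fixed = TRUE)

# Relabel every AMZ/AMZN variant as AMAZON.COM, leave other names alone
store_name <- ifelse(is_amazon, "AMAZON.COM", store_name)
print(store_name)
```

Note that "AMAZON.COM" itself does not contain the substring "AMZ", so it is left unchanged rather than matched.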
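EDIT 3: Perfect!
Loading the "tidyverse" package fixed the error I was getting. Everything works exactly as described, and it is just what I was looking for.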
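For anyone landing here later, this is a rough end-to-end sketch of the approach: clean the description into a store name, then group and sum on the cleaned name. It assumes dplyr and stringr are installed (both are attached by library(tidyverse)); the sample data frame is rebuilt from a few rows of the excerpt above, and totals_by_store is a name chosen only for this example.

```r
library(dplyr)
library(stringr)

# A few rows copied from the statement excerpt in the question
FullStatement <- data.frame(
  Description = c(
    "AMAZON.COM*MT2M03AW1 AM PURCHASE AMZN.COM/BILL WA",
    "AMAZON.COM*MT80Z2EC0 AM PURCHASE AMZN.COM/BILL WA",
    "AMZ*Stride Rite PURCHASE Customerservi NY",
    "ARCO #42472 AM PURCHASE",
    "ARCO #42493 AM PURCHASE"
  ),
  Amount = c(-8.08, -13.28, -35.20, -30.73, -29.35),
  stringsAsFactors = FALSE
)

totals_by_store <- FullStatement %>%
  mutate(
    # drop everything from the first "*" onward
    store_name = gsub("\\*.*", "", Description),
    # drop everything from " #" onward (store numbers)
    store_name = gsub(" #.*", "", store_name),
    # fold the AMZ/AMZN variants into one label
    store_name = ifelse(str_detect(store_name, "AMZ"), "AMAZON.COM", store_name)
  ) %>%
  group_by(store_name) %>%
  summarise(Amount = sum(Amount, na.rm = TRUE))

print(totals_by_store)
```

Grouping on the cleaned store_name column instead of the raw Description is what lets the per-store totals collapse.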