My data frame Expenses looks like this:
date       name            expenditure  type
23MAR2013  KOSH ENTRP      4000         COMPANY
23MAR2013  JOHN DOE        800          INDIVIDUAL
24MAR2013  S KHAN          300          INDIVIDUAL
24MAR2013  JASINT PVT LTD  8000         COMPANY
25MAR2013  KOSH ENTRPRISE  2000         COMPANY
25MAR2013  JOHN S DOE      220          INDIVIDUAL
25MAR2013  S KHAN          300          INDIVIDUAL
26MAR2013  S KHAN          300          INDIVIDUAL
Earlier, I identified the repeated names and patterns in the name column and stored them in the vector NameVector, as shown below.
KOSH JOHN DOE KHAN JASINT
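My question is: how can I match each string in Expenses$name against the patterns in NameVector and record the matching pattern as a category in the main data frame, like this?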
date       name            expenditure  type        category
23MAR2013  KOSH ENTRP      4000         COMPANY     KOSH
23MAR2013  JOHN DOE        800          INDIVIDUAL  JOHN DOE
24MAR2013  S KHAN          300          INDIVIDUAL  KHAN
24MAR2013  JASINT PVT LTD  8000         COMPANY     JASINT
25MAR2013  KOSH ENTRPRISE  2000         COMPANY     KOSH
25MAR2013  JOHN S DOE      220          INDIVIDUAL  JOHN DOE
25MAR2013  S KHAN          300          INDIVIDUAL  KHAN
26MAR2013  S KHAN          300          INDIVIDUAL  KHAN
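I tried splitting the name column with strsplit() on every separator I could think of (space, |, *, comma, etc.) to put the different parts of each name into separate columns, and then tried to match the patterns with agrep(), but I did not get the desired output. On inspecting the data further, I noticed some leading spaces and removed them, but I still do not know why I am not getting the output shown above.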
CSV of the table above:
"Date","name","expenditure","type"
"23MAR2013","KOSH ENTRP",4000,"COMPANY"
"23MAR2013 ","JOHN DOE",800,"INDIVIDUAL"
"24MAR2013","S KHAN",300,"INDIVIDUAL"
"24MAR2013","JASINT PVT LTD",8000,"COMPANY"
"25MAR2013","KOSH ENTRPRISE",2000,"COMPANY"
"25MAR2013","JOHN S DOE",220,"INDIVIDUAL"
"25MAR2013","S KHAN",300,"INDIVIDUAL"
"26MAR2013","S KHAN",300,"INDIVIDUAL"
And the name vector that has already been computed/identified:
NameVector <- c("KOSH","JOHN DOE","KHAN","JASINT")
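To make the goal concrete, here is a minimal sketch of one way the matching could be set up. It is not a definitive solution, and it swaps agrep() for plain regular-expression matching with grepl(). It rests on two assumptions that are not stated above: a multi-word pattern such as "JOHN DOE" may have other tokens between its words (so that "JOHN S DOE" still counts), and when several patterns match a row, the first one in NameVector wins. The names csv_text, pat, rx and hit are illustrative only.

```r
# Rebuild the data frame from the CSV above (inline text stands in for a file).
csv_text <- '"Date","name","expenditure","type"
"23MAR2013","KOSH ENTRP",4000,"COMPANY"
"23MAR2013 ","JOHN DOE",800,"INDIVIDUAL"
"24MAR2013","S KHAN",300,"INDIVIDUAL"
"24MAR2013","JASINT PVT LTD",8000,"COMPANY"
"25MAR2013","KOSH ENTRPRISE",2000,"COMPANY"
"25MAR2013","JOHN S DOE",220,"INDIVIDUAL"
"25MAR2013","S KHAN",300,"INDIVIDUAL"
"26MAR2013","S KHAN",300,"INDIVIDUAL"'

Expenses <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Remove stray leading/trailing spaces before matching.
Expenses$name <- trimws(Expenses$name)

NameVector <- c("KOSH", "JOHN DOE", "KHAN", "JASINT")

# Start with no category; rows that match nothing stay NA.
Expenses$category <- NA_character_

for (pat in NameVector) {
  # Assumption: words of a pattern must appear in order, but other
  # text may sit between them, so "JOHN DOE" becomes "JOHN.*DOE".
  rx <- gsub(" ", ".*", pat, fixed = TRUE)
  hit <- grepl(rx, Expenses$name)
  # Assumption: the first matching pattern wins; later ones do not overwrite.
  Expenses$category[hit & is.na(Expenses$category)] <- pat
}

print(Expenses)
```

With the sample data, every row ends up with exactly one category. Names that match no pattern would be left as NA, which makes unmatched rows easy to spot.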