This is what my data looks like:
UUID Source
1 Jane http//mywebsite.com44bb00?utm_source=ADW&utm_medium=banner&utm_campaign=Monk&gclid1234
2 Mike http//mywebsite.com44bb00?utm_source=Google&utm_medium=cpc&utm_campaign=DOG&gclid1234
3 John http//mywebsite.com44bb00?utm_source=Yahoo&utm_medium=banner&utm_campaign=DOG&gclid1234
4 Sarah http//mywebsite.com44bb00?utm_source=Facebookdw&utm_medium=cpc&utm_campaign=CAT&gclid1234
5 Michael http//mywebsite.com44bb00?utm_source=Twitter&utm_medium=GDNr&utm_campaign=CAT&gclid1234
6 Bob http//mywebsite.com44bb00?utm_source=ADW&utm_medium=GDN&utm_campaign=DOG&gclid1234
7 Mark http//mywebsite.com44bb00?utm_source=Twitter&utm_medium=banner&utm_campaign=MONK&gclid1234
8 Anna http//mywebsite.com44bb00?utm_source=Facebook&utm_medium=banner&utm_campaign=MONK&gclid1234
This is the desired output I am trying to achieve:
NAME UTM_SOURCE UTM_MEDIUM UTM_CAMPAIGN
1 Jane ADW banner Monk
2 Mike Google cpc DOG
3 John Yahoo banner DOG
4 Sarah Facebookdw cpc CAT
5 Michael Twitter GDNr CAT
6 Bob ADW GDN DOG
7 Mark Twitter banner MONK
8 Anna Facebook banner MONK
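So, in other words, I want to extract specific pieces of information based on a criterion. Example: search the data frame for the value "utm_source=" and, once it is found, copy whatever lies between the "=" and the "&" symbols. In the case of user no. 1 (Jane), if you look at the original file, her source URL contains the value "utm_source=ADW". In the output file, the "ADW" bit is extracted and placed in a new column called "utm_source". The same principle applies to all other users and to the other dimensions (utm_medium and utm_campaign).
I know that the function gsub can help me. This is what I have tried so far: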
> file1 <- read.csv("C:/Users/Dumitru Ostaciu/Desktop/Users.csv")
> file1 <- transform(file1, Source = as.character(Source))
> file2 <- gsub(".*\\?utm_source=", "", file1$Source)
And this is the result I get:
UUID SOURCE
1 ADW&utm_medium=banner&utm_campaign=Monk&gclid1234
2 Google&utm_medium=cpc&utm_campaign=DOG&gclid1234
3 Yahoo&utm_medium=banner&utm_campaign=DOG&gclid1234
4 Facebookdw&utm_medium=cpc&utm_campaign=CAT&gclid1234
5 Twitter&utm_medium=GDNr&utm_campaign=CAT&gclid1234
6 ADW&utm_medium=GDN&utm_campaign=DOG&gclid1234
7 Twitter&utm_medium=banner&utm_campaign=MONK&gclid1234
8 Facebook&utm_medium=banner&utm_campaign=MONK&gclid1234
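I have two questions about this:
1) In the output I get, the function copies everything that comes after the value "utm_source=". How can I add another condition so that the formula copies only what is between the "=" and the "&"?
2) How can I keep the values that were originally in the first column (UUID): Jane, Mike, John, etc.?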
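
For reference, here is a minimal sketch of one possible approach to both questions, assuming the data frame has the two columns shown above (UUID holding the names, Source holding the URLs). The helper name extract_param is hypothetical and not part of any package. It relies on base R's sub() with a capture group, so only the text between "key=" and the next "&" is kept, and it builds the result as new columns next to the names instead of overwriting the data with a bare character vector.

```r
# Small sample mirroring three rows of the data in the question.
file1 <- data.frame(
  UUID = c("Jane", "Mike", "Sarah"),
  Source = c(
    "http//mywebsite.com44bb00?utm_source=ADW&utm_medium=banner&utm_campaign=Monk&gclid1234",
    "http//mywebsite.com44bb00?utm_source=Google&utm_medium=cpc&utm_campaign=DOG&gclid1234",
    "http//mywebsite.com44bb00?utm_source=Facebookdw&utm_medium=cpc&utm_campaign=CAT&gclid1234"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the text between "<key>=" and the next "&"
# (or the end of the string). Returns NA when the key is not present.
extract_param <- function(url, key) {
  pattern <- paste0("^.*[?&]", key, "=([^&]*).*$")
  ifelse(grepl(pattern, url), sub(pattern, "\\1", url), NA_character_)
}

# Keep the names and add one column per UTM dimension.
result <- data.frame(
  NAME         = file1$UUID,
  UTM_SOURCE   = extract_param(file1$Source, "utm_source"),
  UTM_MEDIUM   = extract_param(file1$Source, "utm_medium"),
  UTM_CAMPAIGN = extract_param(file1$Source, "utm_campaign"),
  stringsAsFactors = FALSE
)

print(result)
```

The capture group ([^&]*) is what stops the match at the next "&", which addresses question 1. Assigning the extracted values to columns of a data frame that also carries the UUID values addresses question 2.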