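I have a dataset with one column containing names and another column describing what each person did during the day. Using R, I am trying to find out who met whom on that day. I created a vector with the names in the dataset and, in a loop, used grepl to identify where those names appear in the column describing people's activities.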
name <- c("Dupont", "Dupuy", "Smith")
activity <- c("On that day, he had lunch with Dupuy in London.",
              "She had lunch with Dupont and then went to Brighton to meet Smith.",
              "Smith remembers that he was tired on that day.")
# Names to search for in the activity column
met_with <- c("Dupont", "Dupuy", "Smith")
df <- data.frame(name, activity, met_with = NA)
# For each name, write it into met_with wherever it occurs in activity
# (a later match overwrites an earlier one in the same row)
for (i in seq_along(met_with)) {
  df$met_with <- ifelse(grepl(met_with[i], df$activity), met_with[i], df$met_with)
}
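However, this solution is unsatisfactory for two reasons. First, when a person met more than one other person (e.g. Dupuy in my example), I cannot extract more than one name. Second, I cannot tell R not to return the person's own name when the activity column uses that name instead of a pronoun (e.g. Smith).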
Ideally, I would like df to look like this:
name    activity                                         met_with
Dupont  On that day, he had lunch with Dupuy in London.  Dupuy
Dupuy   She had lunch with Dupont and then (...).        Dupont Smith
Smith   Smith remembers that he was tired on that day.   NA
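One possible approach, sketched in base R and only checked against the three example rows: instead of looping over names with grepl (which keeps only the last match in each row), extract every name occurring in each activity string with gregexpr and regmatches, then drop the row's own name with setdiff. This assumes that every name that can appear in activity is also in the name column, that names appear verbatim, and that pronouns never need to be resolved to a person. The word-boundary pattern (\b) stops a name from matching inside a longer word.

```r
# Sample data from the question
name <- c("Dupont", "Dupuy", "Smith")
activity <- c("On that day, he had lunch with Dupuy in London.",
              "She had lunch with Dupont and then went to Brighton to meet Smith.",
              "Smith remembers that he was tired on that day.")
df <- data.frame(name, activity, stringsAsFactors = FALSE)

# Whole-word pattern matching any known name: \b(Dupont|Dupuy|Smith)\b
# (assumes the names contain no regex special characters)
pattern <- paste0("\\b(", paste(df$name, collapse = "|"), ")\\b")

# Every name mentioned in each activity string: a list with one
# character vector per row (character(0) when nothing matches)
mentioned <- regmatches(df$activity,
                        gregexpr(pattern, df$activity, perl = TRUE))

# Remove the row's own name (setdiff also drops duplicates), then
# collapse the remaining names into one string, or NA if none are left
df$met_with <- mapply(function(own, found) {
  others <- setdiff(found, own)
  if (length(others) == 0) NA_character_ else paste(others, collapse = " ")
}, df$name, mentioned, USE.NAMES = FALSE)

df
```

Dropping the own name assumes nobody is ever recorded as meeting someone who shares their exact name; if that can happen, the self-exclusion rule would need to be more specific.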
I am cleaning these strings to build an edge list and a node list for a network analysis later on.
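For the edge and node lists, a similar sketch can keep one row per pair instead of a space-separated string. The column names from, to and id are hypothetical choices, and the edges are assumed to be directed from the person whose activity mentions someone to the person mentioned.

```r
# Sample data from the question
name <- c("Dupont", "Dupuy", "Smith")
activity <- c("On that day, he had lunch with Dupuy in London.",
              "She had lunch with Dupont and then went to Brighton to meet Smith.",
              "Smith remembers that he was tired on that day.")
df <- data.frame(name, activity, stringsAsFactors = FALSE)

# All whole-word name matches per activity string
pattern <- paste0("\\b(", paste(df$name, collapse = "|"), ")\\b")
mentioned <- regmatches(df$activity,
                        gregexpr(pattern, df$activity, perl = TRUE))

# Edge list: one row per (person, person mentioned) pair, own name excluded;
# rows with no other names contribute a zero-row data frame
edges <- do.call(rbind, mapply(function(from, found) {
  to <- setdiff(found, from)
  data.frame(from = rep(from, length(to)), to = to, stringsAsFactors = FALSE)
}, df$name, mentioned, SIMPLIFY = FALSE, USE.NAMES = FALSE))
rownames(edges) <- NULL

# Node list: everyone in the name column plus anyone appearing in an edge
nodes <- data.frame(id = unique(c(df$name, edges$from, edges$to)),
                    stringsAsFactors = FALSE)

edges
nodes
```

If the igraph package is used for the network analysis, igraph::graph_from_data_frame(edges, vertices = nodes) takes the first two columns of edges as endpoints and the first column of nodes as vertex names.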
Thanks