r - Creating a one-hot vector from list elements in R

Question

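I am trying to process some strings for an input file. First, I convert the strings from a vector to a list, then reduce each list element to its unique values.

Next, I want to convert the words in each list element into a single string, using ':1' as the delimiter.

I can get the function to work on a single list element, but when I try to apply it to the whole list using ldply from plyr, I only get the last word of each list element.

Here is the code: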

library(plyr)

df1 <- data.frame(id = seq(1,5,1), string1 = NA)
head(df1)
df1$string1[1] <- "This string is a string."
df1$string1[2] <- "This string is a slightly longer string."
df1$string1[3] <- "This string is an even longer string."
df1$string1[4] <- "This string is a slightly shorter string."
df1$string1[5] <- "This string is the longest string of all the other strings."

df1$string1 <- tolower(as.character(df1$string1))
df1$string1 <- gsub('[[:punct:]]',' ',df1$string1)
df1$string1 <- gsub('[[:digit:]]',' ',df1$string1)
df1$string1 <- gsub("\\s+"," ",df1$string1)

fdList1 <- strsplit(df1$string1, " ", df1$string1)
fdList2 <- lapply(fdList1, unique)

toString1 <- function(x){
string2 <- c()
#print(length(x[1][1]))
#print(x)
#print(class(x))
for(i in length(x)){
string2 <- paste0(string2, x[[i]], ":1 ", collapse="")
}
string2
}

df2 <- ldply(fdList2, toString1)
df2 

v1 <- toString1(fdList2[2])
v1

df2 is wrong; I want a vector like v1 for each list element.

Any suggestions?

score 3 · Accepted Answer

To explain why this happens:

Your function toString1 is the problem:

toString1 <- function(x) {
    string2 <- c()
    for(i in length(x)) { 
        string2 <- paste0(string2, x[[i]], ":1 ", collapse="")
    }
    string2
}

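When you call toString1(fdList2[1]), you are passing a list of length one. The loop for(i in length(x)) therefore runs exactly once, with i = 1, and x[[1]] is the whole character vector, which paste0 collapses into a single string. So the for-loop does nothing useful here, and the function could just as well be: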

toString1 <- function(x) {
    string2 <- paste0(x[[1]], ":1 ", collapse="")
}
o <- toString1(fdList2[2])

# [1] "this:1 string:1 is:1 a:1 slightly:1 longer:1 "

But when you use ldply, each call receives not a list (fdList2[2]) but a vector (fdList2[[2]]). In that case, your function should be:

toString1 <- function(x) {
    string2 <- c()
    for(i in 1:length(x)) { 
        string2 <- paste0(string2, x[i], ":1 ", collapse="")
    }
    string2
}
ldply(fdList2, toString1)

#                                                                   V1
# 1                                          this:1 string:1 is:1 a:1 
# 2                      this:1 string:1 is:1 a:1 slightly:1 longer:1 
# 3                         this:1 string:1 is:1 an:1 even:1 longer:1 
# 4                     this:1 string:1 is:1 a:1 slightly:1 shorter:1 
# 5 this:1 string:1 is:1 the:1 longest:1 of:1 all:1 other:1 strings:1

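Note the two changes: length(x) in the for loop became 1:length(x), because the loop must visit every element, and x[[i]] became x[i], because x is now a character vector.

Hope this helps.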
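
Both points are easy to check in isolation. The following is a minimal sketch using a small stand-in vector (words, lst, visited and visited_all are illustrative names, not part of the original code). It shows that looping over length(x) iterates once, and that single brackets on a list return a list while double brackets return the element itself.

```r
# Stand-in for one element of fdList2 (hypothetical example data)
words <- c("this", "string", "is", "a")

# length(words) is the single number 4, so this loop runs once, with i = 4
visited <- c()
for (i in length(words)) visited <- c(visited, i)
visited
# [1] 4

# seq_along(words) is 1 2 3 4, so this loop visits every element
visited_all <- c()
for (i in seq_along(words)) visited_all <- c(visited_all, i)
visited_all
# [1] 1 2 3 4

# Single brackets keep the list wrapper; double brackets extract the vector
lst <- list(words)
class(lst[1])     # "list"
class(lst[[1]])   # "character"
length(lst[1])    # 1
length(lst[[1]])  # 4
```

For a non-empty x, seq_along(x) gives the same indices as 1:length(x); it is used here only because it also behaves sensibly when x is empty.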

score 2 · Accepted Answer

Why not just use sapply on "fdList2"?

> sapply(fdList2, paste0, ":1 ", collapse = "")
[1] "this:1 string:1 is:1 a:1 "                                         
[2] "this:1 string:1 is:1 a:1 slightly:1 longer:1 "                     
[3] "this:1 string:1 is:1 an:1 even:1 longer:1 "                        
[4] "this:1 string:1 is:1 a:1 slightly:1 shorter:1 "                    
[5] "this:1 string:1 is:1 the:1 longest:1 of:1 all:1 other:1 strings:1 "
> ## If you need a single column data.frame
> data.frame(V1 = sapply(fdList2, paste0, ":1 ", collapse = ""))
                                                                  V1
1                                          this:1 string:1 is:1 a:1 
2                      this:1 string:1 is:1 a:1 slightly:1 longer:1 
3                         this:1 string:1 is:1 an:1 even:1 longer:1 
4                     this:1 string:1 is:1 a:1 slightly:1 shorter:1 
5 this:1 string:1 is:1 the:1 longest:1 of:1 all:1 other:1 strings:1

For that matter, if this really is your goal, you can simplify the intermediate steps further. Skip creating "fdList1" and "fdList2" and just use:

sapply(strsplit(df1$string1, " "), 
       function(x) paste0(unique(x), ":1 ", collapse = ""))
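
If the result is going to be written to a file, a slightly stricter variant may be worth considering. This is only a sketch: to_feature_string is a hypothetical helper name, vapply is swapped in for sapply so the return type is always a character vector, and the separator is moved into collapse so no trailing space is produced. That last point is an assumption about the desired output, which differs from the strings shown above.

```r
# Hypothetical helper: one "word:1 word:1 ..." string per input sentence
to_feature_string <- function(strings) {
  vapply(
    strsplit(strings, " ", fixed = TRUE),
    # unique() drops repeated words; collapse = " " joins without a trailing space
    function(x) paste0(unique(x), ":1", collapse = " "),
    character(1)  # each result must be a single string
  )
}

to_feature_string(c("this string is a string", "a longer string"))
# [1] "this:1 string:1 is:1 a:1" "a:1 longer:1 string:1"
```

With the cleaned column from the question, the call would be to_feature_string(df1$string1).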
