r - Assign matrix rows to a single data frame column

Question

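I have an N×M matrix tf.m and a data frame df with N rows.
I want to assign row n of the matrix to a single column of the data frame, in the same row n.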

library("tm")
# Build a TF-IDF weighted DocumentTermMatrix from a character vector of documents
ftfidf <- function(text.d) {
  txt <- VectorSource(text.d)
  txt.corpus <- VCorpus(txt, readerControl = list(reader = readPlain, language = "en"))
  revs <- tm_map(txt.corpus, content_transformer(tolower))   # lower-case every document
  dtm <- DocumentTermMatrix(revs, control = list(
    weighting = function(x) weightTfIdf(x, normalize = TRUE),
    stopwords = TRUE))
  dtm
}

df <- data.frame(id = c("doc1", "doc2", "doc3"),
                 text = c("hello world", "people people", "happy people"),
                 stringsAsFactors = FALSE)   # keep text as character, not factor (pre-R 4.0 default)
#id          text
#1 doc1   hello world
#2 doc2 people people
#3 doc3  happy people
tf <- ftfidf(df$text) # returns a TF-IDF weighted DocumentTermMatrix
tf.m <- as.matrix(tf)
#Terms
#Docs     happy     hello    people     world
#1 0.0000000 0.7924813 0.0000000 0.7924813
#2 0.0000000 0.0000000 0.5849625 0.0000000
#3 0.7924813 0.0000000 0.2924813 0.0000000
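The printed weights can be checked by hand. Assuming weightTfIdf(normalize = TRUE) computes (term count / document length) × log2(number of documents / document frequency), which the numbers above are consistent with, here is a minimal base-R sketch on the raw counts of the three example documents (the counts matrix is typed in by hand, not produced by tm):

```r
# Raw term counts for the three example documents (typed in by hand, not produced by tm)
counts <- matrix(c(0, 1, 0, 1,    # "hello world"
                   0, 0, 2, 0,    # "people people"
                   1, 0, 1, 0),   # "happy people"
                 nrow = 3, byrow = TRUE,
                 dimnames = list(Docs  = c("1", "2", "3"),
                                 Terms = c("happy", "hello", "people", "world")))

term.freq <- counts / rowSums(counts)              # term count / document length
idf <- log2(nrow(counts) / colSums(counts > 0))    # log2(N / document frequency)
tfidf <- sweep(term.freq, 2, idf, `*`)             # scale each term column by its idf

round(tfidf, 7)
```

For example, "hello" in doc 1 gets 1/2 × log2(3/1), which is about 0.7924813, and "people" in doc 2 gets 2/2 × log2(3/2), about 0.5849625.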

If I run this, I get 4 more columns in the data frame:

df$tf<-tf.m
#id          text  tf.happy  tf.hello tf.people  tf.world
#1 doc1   hello world 0.0000000 0.7924813 0.0000000 0.7924813
#2 doc2 people people 0.0000000 0.0000000 0.5849625 0.0000000
#3 doc3  happy people 0.7924813 0.0000000 0.2924813 0.0000000

I want this:

#id          text       tf
#1 doc1   hello world   happy     hello    people     world
#                       0.0000000 0.7924813 0.0000000 0.7924813
#2 doc2 people people   happy     hello    people     world
#                       0.0000000 0.0000000 0.5849625 0.0000000
#3 doc3  happy people   happy     hello    people     world
#                       0.7924813 0.0000000 0.2924813 0.0000000
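In R, the closest thing to one cell holding a whole vector is a list column. A minimal sketch, assuming the tf.m values printed above (retyped so the block runs without tm): each element of df$tf becomes row n of the matrix as a named numeric vector. Note that the question's df$tf <- tf.m already stores a single matrix column (ncol(df) is 3); it only prints as tf.happy, tf.hello and so on.

```r
# tf.m as printed in the question (rows = documents, columns = terms)
tf.m <- matrix(c(0,         0.7924813, 0,         0.7924813,
                 0,         0,         0.5849625, 0,
                 0.7924813, 0,         0.2924813, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(Docs  = c("1", "2", "3"),
                               Terms = c("happy", "hello", "people", "world")))

df <- data.frame(id = c("doc1", "doc2", "doc3"),
                 text = c("hello world", "people people", "happy people"),
                 stringsAsFactors = FALSE)

# Row n of the matrix, kept as a named numeric vector, goes into row n of a list column
df$tf <- lapply(seq_len(nrow(tf.m)), function(i) tf.m[i, ])

ncol(df)      # id, text, tf
df$tf[[3]]    # term weights for doc3, named happy/hello/people/world
```

do.call(rbind, df$tf) turns the list column back into a 3 × 4 matrix when a function such as knn() needs a plain matrix.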

Then try to train a knn on the term frequencies in df$tf (if possible):

 knn_model <- knn(train = df$tf[1,], cl = df$id, k=3)

Then query the nearest neighbours of a given df$id.
My goal is to run something "like" this Python GraphLab function in R:

knn_model = graphlab.nearest_neighbors.create(df,features=['tf'],label='id')
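If the knn() above is class::knn(), note that it is a classifier: it needs a test argument and returns predicted labels rather than a reusable model. For a "which documents are closest to this one" query, a plain distance matrix may be simpler. A minimal sketch, where nearest_docs() is a hypothetical helper and Euclidean distance is an assumption (GraphLab's default metric may differ):

```r
# Hypothetical helper: the k documents closest to `query`, by Euclidean distance (an assumption)
nearest_docs <- function(m, ids, query, k = 1) {
  d <- as.matrix(dist(m, method = "euclidean"))  # pairwise document distances
  dimnames(d) <- list(ids, ids)
  dists <- d[query, ]
  dists <- dists[names(dists) != query]          # a document is not its own neighbour
  head(sort(dists), k)
}

# tf.m as printed in the question (rows = documents, columns = terms)
tf.m <- matrix(c(0,         0.7924813, 0,         0.7924813,
                 0,         0,         0.5849625, 0,
                 0.7924813, 0,         0.2924813, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, c("happy", "hello", "people", "world")))
ids <- c("doc1", "doc2", "doc3")

nearest_docs(tf.m, ids, "doc3", k = 2)
```

For doc3 this returns doc2 (distance about 0.845) ahead of doc1 (about 1.403), since doc2 and doc3 share the term "people".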

score 0 · Accepted Answer

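It looks like you want a hierarchical index. As far as I know, there is no explicit way to do this in R. data.table lets you assign keys, but these are not real indexes: they are part of the data, whereas in Python pandas the metadata (the index) is decoupled from the data. I'm inferring this from the expression df$tf[1,], which should raise a dimension error if df is a data.frame.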

My experience with R is that in most cases data like this is expected to be represented in long format, i.e.:

id   text          tf    value
doc1 hello world  happy  0.0000000
doc1 hello world  hello  0.7924813
doc1 hello world  people 0.0000000
doc1 hello world  world  0.7924813

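This can be achieved with the melt functions found in various packages. Sometimes you only need one variable column and one value column; in that case the interaction function helps combine variables.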
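For a plain matrix the same reshaping also works in base R without a package: as.table() plus as.data.frame() produces one row per (document, term) pair. A minimal sketch, assuming the df and tf.m from the question (retyped here), that rebuilds the long table above and uses interaction() to combine id and term into a single key:

```r
df <- data.frame(id = c("doc1", "doc2", "doc3"),
                 text = c("hello world", "people people", "happy people"),
                 stringsAsFactors = FALSE)
tf.m <- matrix(c(0,         0.7924813, 0,         0.7924813,
                 0,         0,         0.5849625, 0,
                 0.7924813, 0,         0.2924813, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(Docs  = c("1", "2", "3"),
                               Terms = c("happy", "hello", "people", "world")))

# Wide -> long: one row per (document, term) pair; columns Docs, Terms, value
cells <- as.data.frame(as.table(tf.m), responseName = "value",
                       stringsAsFactors = FALSE)

# Docs holds each document's row position in df, so use it to look up id and text
idx  <- as.integer(cells$Docs)
long <- data.frame(id = df$id[idx], text = df$text[idx],
                   tf = cells$Terms, value = cells$value,
                   stringsAsFactors = FALSE)
long <- long[order(long$id, long$tf), ]
rownames(long) <- NULL

# interaction() combines two variables into one factor, e.g. "doc1_happy"
long$key <- interaction(long$id, long$tf, sep = "_")

head(long, 4)
```

The first four rows match the doc1 rows shown above, with an extra key column.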

Hope this helps and that I understood your question correctly; I'm keen to find out myself whether true indexes exist in R.
