
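I have a data frame that contains the value of each cell (a cell may hold more than one value) together with its row and column index.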


df = data.frame(values = c(1,"Sven", 20,"Mueller","sept",2,30,"John","Mar","Hynes","Marc"), 
                colI = c(1,2,3,2,4,1,3,2,4,2,2), rowI = c(1,1,1,1,1,2,2,2,2,2,2))

I would like to end up with something like the following data.frame:

df_final= data.frame(Index = c(1,2), name = c("Sven, Mueller", "John, Hynes, Marc"), age = c(20,30), 
                     month = c("sept","Mar"))

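However, I have not been able to get anywhere, and I have not found a solution online. I cannot work out how to put the values into their corresponding positions in the data frame, and the fact that a cell can contain a varying number of values makes it even harder.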

Thanks for your help.


3 Answers


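Using base R, you can first use aggregate to collapse the data into one comma-separated string per row and column index, and then use unstack to spread those strings into columns.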

temp <- aggregate(values~colI + rowI, df, toString)
unstack(temp, values~colI)

#  X1                X2 X3   X4
#1  1     Sven, Mueller 20 sept
#2  2 John, Hynes, Marc 30  Mar

Data

df <- structure(list(values = c("1", "Sven", "20", "Mueller", "sept", 
"2", "30", "John", "Mar", "Hynes", "Marc"), colI = c(1, 2, 3, 
2, 4, 1, 3, 2, 4, 2, 2), rowI = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 
2, 2)), class = "data.frame", row.names = c(NA, -11L))
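
To make the two steps easier to follow, here is a minimal sketch that prints the intermediate result of aggregate and then labels the columns the way df_final does. The names wide and res and the setNames call are my additions for illustration, not part of the answer above. Note that unstack only returns a data frame when every column index occurs the same number of times, which holds for this example data.

```r
df <- data.frame(
  values = c(1, "Sven", 20, "Mueller", "sept", 2, 30, "John", "Mar", "Hynes", "Marc"),
  colI = c(1, 2, 3, 2, 4, 1, 3, 2, 4, 2, 2),
  rowI = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Step 1: one row per (rowI, colI) cell; toString() joins multiple values with ", "
temp <- aggregate(values ~ colI + rowI, df, toString)
print(temp)

# Step 2: spread the cells into one column per column index
wide <- unstack(temp, values ~ colI)

# Step 3 (assumption): rename the columns to match the desired df_final
res <- setNames(wide, c("Index", "name", "age", "month"))
print(res)
```

All columns of res are still character at this point, so Index and age would need as.numeric if numeric columns are required.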
answered 2020-08-04T09:57:03.483

You can use dplyr, tidyr and stringr, all of which are included in the tidyverse:

library(tidyverse)  # attaches dplyr, tidyr and stringr

df %>%
# bring your data into a wider format
  pivot_wider(id_cols=rowI, names_from=colI, values_from=values, values_fn=list) %>% 
# remove the nested listing
  unnest(everything()) %>%
# rename the columns
  select(Index = rowI, name=`2`, age=`3`, month=`4`) %>%
# group all rows based on the index
  group_by(Index) %>%
# concatenate the name column
  mutate(name=str_c(name, collapse=", ")) %>%
# remove duplicates
  distinct()

This returns

# A tibble: 2 x 4
# Groups:   Index [2]
  Index name              age   month
  <dbl> <chr>             <chr> <chr>
1     1 Sven, Mueller     20    sept 
2     2 John, Hynes, Marc 30    Mar 

Note: I changed your input data slightly and added a 2 to your rowI column (see Maurits Evers' comment).
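
In case it is unclear why the distinct() step is needed: as far as I can tell, unnest() repeats the single-valued cells once for every name part, so the unnested table has one row per name part rather than one row per Index. The sketch below just splits the pipeline in two so the intermediate table can be inspected. The names unnested and result are mine, and the trailing ungroup() is an optional addition.

```r
library(dplyr)
library(tidyr)
library(stringr)

df <- data.frame(
  values = c(1, "Sven", 20, "Mueller", "sept", 2, 30, "John", "Mar", "Hynes", "Marc"),
  colI = c(1, 2, 3, 2, 4, 1, 3, 2, 4, 2, 2),
  rowI = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Intermediate table: list columns are expanded, single values are repeated
unnested <- df %>%
  pivot_wider(id_cols = rowI, names_from = colI, values_from = values, values_fn = list) %>%
  unnest(everything())
print(unnested)

# Collapsing the names makes the repeated rows identical, so distinct() drops them
result <- unnested %>%
  select(Index = rowI, name = `2`, age = `3`, month = `4`) %>%
  group_by(Index) %>%
  mutate(name = str_c(name, collapse = ", ")) %>%
  distinct() %>%
  ungroup()
print(result)
```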

answered 2020-08-04T09:56:17.390

Another solution

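library(dplyr)
library(tidyr)
df %>% 
  # name id_cols explicitly instead of passing rowI positionally
  pivot_wider(id_cols = rowI, names_from = colI, values_from = values, values_fn = toString) %>% 
  select(-rowI) %>% 
  purrr::set_names(c("ID", "name", "age", "month"))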
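
Since values is a character column, every column of the result is character as well, whereas df_final in the question has a numeric age. One possible follow-up, assuming a reasonably recent R version where type.convert() has a data frame method, is sketched below. The names wide and wide_typed are hypothetical.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  values = c(1, "Sven", 20, "Mueller", "sept", 2, 30, "John", "Mar", "Hynes", "Marc"),
  colI = c(1, 2, 3, 2, 4, 1, 3, 2, 4, 2, 2),
  rowI = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  stringsAsFactors = FALSE
)

wide <- df %>%
  pivot_wider(id_cols = rowI, names_from = colI, values_from = values, values_fn = toString) %>%
  select(-rowI) %>%
  purrr::set_names(c("ID", "name", "age", "month"))

# type.convert() is base R (utils); as.is = TRUE keeps text columns as character
wide_typed <- type.convert(wide, as.is = TRUE)
str(wide_typed)
```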
answered 2020-08-04T13:58:22.737