r - Matrix transformation and aggregation in R

Question

I am starting development with R and I am still having "beginner problems" with the language. I would like to do the following:

1. I have a matrix (data frame := user) with ~900 columns, each of which is named after a band (Nirvana, Green Day, Daft-Punk, etc.).
2. Each row holds a user and that user's music taste (Nirvana = 10, Green Day = 5, Daft-Punk = 0).
3. I would like to query another data frame (:= artists, containing each artist's music tags) and replace the band names with their genre tags (Nirvana --> Rock, Green Day --> Rock, Daft-Punk --> Techno). There are ~120 tags for music taste (120 < 900).
4. Finally, I would like to "aggregate" the values across all columns to avoid duplicated columns. In the example from (3), with the aggregation function "SUM", the row would have only 2 entries instead of 3: (Rock = 15, Techno = 0).

Any clues on how to do that with R? Thanks in advance for any help!

Data:

score 2 · Accepted Answer

I have a matrix (data frame := user) with ~900 columns, each of which is named after a band (Nirvana, Green Day, Daft-Punk, etc.).
Each row holds a user and that user's music taste (Nirvana = 10, Green Day = 5, Daft-Punk = 0).

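This is what is called a "wide" format. For most tasks it is better to reshape it into a narrow (long) format: a single data.frame with one column identifying the user and another identifying the band, alongside the score. There are several tools for doing this, and several questions about it here on SO; look for the reshape tag in particular.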

There is also a package called reshape that can help here. The process I am describing is called "melting" the data.
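
To make the "melting" step concrete, here is a minimal sketch on toy data. It uses base R only so that it runs without extra packages; the reshape package's melt() does the same job in one call. The column names user, band and score and the second user's values are assumptions made for illustration, not taken from the question's data.

```r
# Toy wide data: one row per user, one column per band.
# (The values for the second user are made up for illustration.)
user <- data.frame(
  user        = c("u1", "u2"),
  Nirvana     = c(10, 3),
  `Green Day` = c(5, 0),
  `Daft-Punk` = c(0, 8),
  check.names = FALSE,       # keep band names with spaces/hyphens intact
  stringsAsFactors = FALSE
)

# Every column except the user id is a band.
band_cols <- setdiff(names(user), "user")

# Melt to long format: one row per (user, band) pair.
# unlist() concatenates the band columns one after another, so the
# user ids repeat as a block and each band name repeats nrow(user) times.
long <- data.frame(
  user  = rep(user$user, times = length(band_cols)),
  band  = rep(band_cols, each = nrow(user)),
  score = unlist(user[band_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

print(long)
```

With 2 users and 3 bands the result has 6 rows, and the band names are now ordinary values that can be used as a join key.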

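I would like to query another data frame (:= artists, containing each artist's music tags) and replace the band names with their genre tags (Nirvana --> Rock, Green Day --> Rock, Daft-Punk --> Techno). There are ~120 tags for music taste (120 < 900).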

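You can use merge to combine multiple data frames, with the band name as the merge key. This is why you want the band names to be values rather than column names.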
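
A small sketch of that merge step, assuming the data is already in long format and that the artists data frame has columns named band and tag (the real column names are not shown in the question, so these are hypothetical):

```r
# Long-format scores for one user (values from the question's example).
long <- data.frame(
  user  = c("u1", "u1", "u1"),
  band  = c("Nirvana", "Green Day", "Daft-Punk"),
  score = c(10, 5, 0),
  stringsAsFactors = FALSE
)

# Lookup table from band to genre tag (column names are assumed).
artists <- data.frame(
  band = c("Nirvana", "Green Day", "Daft-Punk"),
  tag  = c("Rock", "Rock", "Techno"),
  stringsAsFactors = FALSE
)

# Join on the band name. By default merge() keeps only bands present in
# both data frames; all.x = TRUE would keep untagged bands with tag = NA.
tagged <- merge(long, artists, by = "band")

print(tagged)
```

Each score row now carries its genre tag, so the band column is no longer needed for the aggregation.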

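Finally, I would like to "aggregate" the values across all columns to avoid duplicated columns. In the example from (3), with the aggregation function "SUM", the row would have only 2 entries instead of 3: (Rock = 15, Techno = 0).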

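When you use reshape to "cast" the data back into wide format, you can supply an aggregation function that is used to combine values. You can use sum for that.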
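
As an illustration of casting back to wide with a sum, the sketch below swaps in base R's xtabs(), which sums the left-hand side of its formula within each cell; with the reshape package the equivalent would be cast() with its fun.aggregate argument set to sum. The tagged data frame and its column names are assumptions standing in for the merged long-format data.

```r
# Long-format scores that already carry a genre tag
# (the second user's values are made up for illustration).
tagged <- data.frame(
  user  = c("u1", "u1", "u1", "u2", "u2", "u2"),
  tag   = c("Rock", "Rock", "Techno", "Rock", "Rock", "Techno"),
  score = c(10, 5, 0, 3, 0, 8),
  stringsAsFactors = FALSE
)

# Cast back to wide: one row per user, one column per tag,
# with scores summed inside each (user, tag) cell.
wide <- xtabs(score ~ user + tag, data = tagged)

# Turn the table into an ordinary data frame (users become row names).
wide_df <- as.data.frame.matrix(wide)

print(wide_df)
wide["u1", "Rock"]    # 15, i.e. Nirvana (10) + Green Day (5)
wide["u1", "Techno"]  # 0
```

On the real data this would collapse the ~900 band columns into ~120 tag columns.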
