r - Convert text data in a column to numeric data in R

Question

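My data frame has a column holding the genre of each movie, and there are many distinct genres. I want to convert it to numeric data so that I can plot a correlation matrix. Please help me do this.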

Genre         Genre_numerical
Comedy        1
Action        2
Suspense      3
Comedy        1
Biography     4

score 0 · Accepted Answer

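In R, you can store categorical data as factors. This is a basic thing to do in R (or to avoid until the last minute). If you need a refresher, read up on ordered and unordered factors.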
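
As a minimal sketch of the factor approach, the integer codes behind a factor can serve as the numeric labels. The data below is copied from the table in the question; fixing the levels to order of first appearance is an assumption made here so that the codes match that table (the default level order is alphabetical).

```r
# Illustrative data taken from the table in the question
df <- data.frame(Genre = c("Comedy", "Action", "Suspense", "Comedy", "Biography"))

# Assumption: number the genres in order of first appearance,
# instead of the default alphabetical level order
df$Genre <- factor(df$Genre, levels = unique(df$Genre))

# The integer codes stored behind the factor become the numeric column
df$Genre_numerical <- as.integer(df$Genre)
df
```

Keep in mind that these codes are only labels: their order and spacing carry no meaning for a nominal variable such as genre.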

Your question seems to be more about how to correlate categorical data.

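Have a look at this answer, then read the whole thread: Plot the equivalent of correlation matrix for factors (categorical data)? And mixed types?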

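The strength of association is calculated for nominal vs nominal with a bias-corrected Cramer's V, for numeric vs numeric with Spearman (default) or Pearson correlation, and for nominal vs numeric with ANOVA. - @Holger Brandl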
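
To illustrate the nominal vs nominal case, here is a rough base R sketch of Cramer's V. Note the assumptions: `cramers_v` is a hypothetical helper written for this example, it computes the plain statistic rather than the bias-corrected variant mentioned in the quote, and it expects both inputs to have at least two distinct values.

```r
# Hypothetical helper: plain (NOT bias-corrected) Cramer's V
# for two categorical vectors of equal length
cramers_v <- function(x, y) {
  tab <- table(x, y)
  # Pearson chi-squared statistic without continuity correction;
  # warnings about small expected counts are suppressed for the demo
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab)) - 1
  as.numeric(sqrt(chi2 / (n * k)))
}

genre <- c("Comedy", "Action", "Suspense", "Comedy", "Biography", "Action")
v_self <- cramers_v(genre, genre)  # a variable is perfectly associated with itself
v_self
```

The value ranges from 0 (no association) to 1 (perfect association), so it can fill the cells of a correlation-style matrix for factor columns.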

score 0 · Accepted Answer

Here are two solutions, one in base R and the other based on dplyr:

Illustrative data:

set.seed(123)
df <- data.frame(Genre = sample(c("Comedy", "Action", "Suspense", "Biography"), 10, replace = TRUE))

Solution #1:

You can assign numeric values to your Genre categories using ifelse:

df$Genre_numerical <- ifelse(df$Genre == "Comedy", 1,
                            ifelse(df$Genre == "Action", 2,
                                   ifelse(df$Genre == "Suspense", 3, 4)))
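
When there are many genres, nested ifelse calls become hard to read. One possible alternative, sketched here with a hypothetical lookup vector named `genre_codes`, is to index a named vector by the genre. The data is copied from the table in the question.

```r
# Illustrative data taken from the table in the question
df <- data.frame(Genre = c("Comedy", "Action", "Suspense", "Comedy", "Biography"))

# Hypothetical lookup table: one named entry per genre
genre_codes <- c(Comedy = 1, Action = 2, Suspense = 3, Biography = 4)

# as.character() guards against Genre being stored as a factor,
# in which case indexing would use the integer codes instead of the names
df$Genre_numerical <- unname(genre_codes[as.character(df$Genre)])
df
```

Unlike the ifelse version, a genre missing from the lookup vector yields NA rather than silently falling into the last category.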

Solution #2:

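library(dplyr)
# mutate() returns the whole data frame, so assign it back to df
# and create Genre_numerical as a new column inside mutate()
df <- df %>%
  mutate(Genre_numerical = case_when(Genre == "Comedy"   ~ 1,
                                     Genre == "Action"   ~ 2,
                                     Genre == "Suspense" ~ 3,
                                     TRUE                ~ 4))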

Result:

In either case, the result is the same:

df
       Genre Genre_numerical
1     Action               2
2  Biography               4
3     Action               2
4  Biography               4
5  Biography               4
6     Comedy               1
7   Suspense               3
8  Biography               4
9   Suspense               3
10    Action               2
