
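I'm a student using this hate crime dataset for exploratory analysis and data visualization. I'm trying to create a matrix of counts for the different categories (race, religion, etc.) in my dataset (hate_crime) for the years 2009 and 2017. The full dataset can be found here.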

I extracted the data I need (incidents from 2009 or 2017) from the existing data.

SecondYear_OTYear <- hate_crime %>% filter(hate_crime$DATA_YEAR == "2017" | hate_crime$DATA_YEAR == "2009")

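Then I simply made a separate subset for each subcategory within a category. For example, to create the subsets for bias description, I did the following: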

antiWhiteSubset <- SecondYear_OTYear[grep("Anti-White", SecondYear_OTYear$BIAS_DESC), ]
antiWhite17 <- nrow(antiWhiteSubset[antiWhiteSubset$DATA_YEAR == "2017", ])
antiWhite09 <- nrow(antiWhiteSubset[antiWhiteSubset$DATA_YEAR == "2009", ])

antiBlackSubset <- SecondYear_OTYear[grep("Anti-Black", SecondYear_OTYear$BIAS_DESC), ]
antiBlack17 <- nrow(antiBlackSubset[antiBlackSubset$DATA_YEAR == "2017", ])
antiBlack09 <- nrow(antiBlackSubset[antiBlackSubset$DATA_YEAR == "2009", ])

antiLatinoSubset <- SecondYear_OTYear[grep("Anti-Hispanic", SecondYear_OTYear$BIAS_DESC), ]
antiLatino17 <- nrow(antiLatinoSubset[antiLatinoSubset$DATA_YEAR == "2017", ])
antiLatino09 <- nrow(antiLatinoSubset[antiLatinoSubset$DATA_YEAR == "2009", ])

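I continued with the same structure for all the other bias descriptions. I then assembled the totals into a matrix to use for bar charts, mosaic plots, or chi-square analysis, as shown here: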

[Image: bar chart of hate crime incidents by bias description]

However, I feel there must be a more efficient way to code the different subsets. I'm open to any suggestions! Thank you so much.

2 Answers

You can use dplyr to filter the data and ggplot2::geom_bar to tally the counts.

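library(dplyr)
library(ggplot2)

# Keep only the two years of interest
hc_small = hate_crime %>% filter(DATA_YEAR %in% c(2009, 2017))
# The five most frequent bias descriptions across both years
top_5 = hc_small %>% count(BIAS_DESC, sort=TRUE) %>% pull(BIAS_DESC) %>% head(5)
hc_5 = hc_small %>% filter(BIAS_DESC %in% top_5)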

ggplot(hc_5, aes(BIAS_DESC, fill=BIAS_DESC)) + 
  geom_bar() + 
  facet_wrap(~DATA_YEAR) +
  coord_flip() +
  theme_minimal() +
  guides(fill='none')

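
If a year-by-bias count matrix is still wanted for the mosaic plot or chi-square analysis mentioned in the question, base R's table() is one way to get it. This is a minimal sketch on a made-up toy data frame (toy and its values are hypothetical stand-ins for hate_crime); it tabulates each whole BIAS_DESC string, so a compound entry would count as its own category.

```r
# Hypothetical toy data; the values are made up for illustration.
toy <- data.frame(
    DATA_YEAR = c(2009, 2009, 2009, 2017, 2017, 2017, 2017),
    BIAS_DESC = c("Anti-White", "Anti-Jewish", "Anti-Jewish",
                  "Anti-White", "Anti-White", "Anti-Jewish", "Anti-Asian")
)

# Cross-tabulate bias description (rows) against year (columns);
# combinations that never occur are filled with 0.
bias_by_year <- table(toy$BIAS_DESC, toy$DATA_YEAR)
bias_by_year

# The table can be passed on directly, for example:
# barplot(bias_by_year, beside = TRUE, legend.text = TRUE)
# mosaicplot(bias_by_year)
# chisq.test(bias_by_year)
```

On the real data the same call would presumably be table(hc_small$BIAS_DESC, hc_small$DATA_YEAR), assuming hc_small has been created as above.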

answered 2020-01-07T20:38:36.413

To tally the phrases as in the original question, I did

anti <- 
    hate_crime %>% 
    filter(DATA_YEAR %in% c("2009", "2017")) %>% 
    mutate(
        ANTI_WHITE = grepl("Anti-White", BIAS_DESC),
        ANTI_BLACK = grepl("Anti-Black", BIAS_DESC),
        ANTI_HISPANIC = grepl("Anti-Hispanic", BIAS_DESC)
    ) %>% 
    select(DATA_YEAR, starts_with("ANTI"))

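I then used group_by() and summarize_all() to count the occurrences in each year (note that the sum() of a logical vector is the number of TRUE values), and used pivot_longer() to create a "tidy" summary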

anti %>% 
    group_by(DATA_YEAR) %>%
    summarize_all(~ sum(.)) %>%
    tidyr::pivot_longer(starts_with("ANTI"), names_to = "BIAS", values_to = "COUNT")

The result looks something like this (read_csv() reported errors when importing the data, which I did not investigate)

# A tibble: 6 x 3
  DATA_YEAR BIAS          COUNT
      <dbl> <chr>         <int>
1      2009 ANTI_WHITE      539
2      2009 ANTI_BLACK     2300
3      2009 ANTI_HISPANIC   486
4      2017 ANTI_WHITE      722
5      2017 ANTI_BLACK     2101
6      2017 ANTI_HISPANIC   444

Visualization seems like a second, separate question.

The code can be simplified a little by defining a function

n_with_bias <- function(x, bias)
    sum(grepl(bias, x))

which then avoids the need for a separate mutate() step

hate_crime %>%
    filter(DATA_YEAR %in% c("2009", "2017")) %>%
    group_by(DATA_YEAR) %>%
    summarize(
        ANTI_WHITE = n_with_bias(BIAS_DESC, "Anti-White"),
        ANTI_BLACK = n_with_bias(BIAS_DESC, "Anti-Black"),
        ANTI_HISPANIC = n_with_bias(BIAS_DESC, "Anti-Hispanic")
    ) %>%
    tidyr::pivot_longer(starts_with("ANTI"), names_to = "BIAS", values_to = "N")

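A base R approach, on the other hand, might create vectors for the years of interest and for all biases (using strsplit() to isolate the components of compound biases)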

years <- c("2009", "2017")
biases <- unique(unlist(strsplit(hate_crime$BIAS_DESC, ";")))

then create a vector of bias descriptions for each year of interest

bias_by_year <- split(hate_crime$BIAS_DESC, hate_crime$DATA_YEAR)[years]

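and iterate over each year and bias (nested iteration can be inefficient when the number of elements is large, e.g., in the 10,000's, but that is not a concern here)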

sapply(bias_by_year, function(bias) sapply(biases, n_with_bias, x = bias))

The result is a matrix (sapply() simplifies to a matrix rather than a data.frame) with one row per bias and one column per year

                                                          2009 2017
Anti-Black or African American                            2300 2101
Anti-White                                                 539  722
Anti-Jewish                                                932  983
Anti-Arab                                                    0  106
Anti-Protestant                                             38   42
Anti-Other Religion                                        111   85
Anti-Islamic (Muslim)                                        0    0
Anti-Gay (Male)                                              0    0
Anti-Asian                                                 128  133
Anti-Catholic                                               52   72
Anti-Heterosexual                                           21   33
Anti-Hispanic or Latino                                    486  444
Anti-Other Race/Ethnicity/Ancestry                         296  280
Anti-Multiple Religions, Group                              48   52
Anti-Multiple Races, Group                                 180  202
Anti-Lesbian (Female)                                        0    0
Anti-Lesbian, Gay, Bisexual, or Transgender (Mixed Group)    0    0
Anti-American Indian or Alaska Native                       68  244
Anti-Atheism/Agnosticism                                    10    6
Anti-Bisexual                                               24   24
Anti-Physical Disability                                    24   66
Anti-Mental Disability                                      70   89
Anti-Gender Non-Conforming                                   0   13
Anti-Female                                                  0   48
Anti-Transgender                                             0  117
Anti-Native Hawaiian or Other Pacific Islander               0   15
Anti-Male                                                    0   25
Anti-Jehovah's Witness                                       0    7
Anti-Mormon                                                  0   12
Anti-Buddhist                                                0   15
Anti-Sikh                                                    0   18
Anti-Other Christian                                         0   24
Anti-Hindu                                                   0   10
Anti-Eastern Orthodox (Russian, Greek, Other)                0    0
Unknown (offender's motivation not known)                    0    0

This avoids the need to type out each bias in the summarize() step. I'm not sure how to do this computation in a readable tidy-style analysis.
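
One possible tidy-style version, offered as a sketch rather than a definitive answer, is to split compound descriptions into one row per bias with tidyr::separate_rows() and then count(). The toy tibble below is hypothetical (made-up values standing in for hate_crime), and the ";" separator is the one used in the strsplit() call above. Unlike the grepl() approach, this counts exact components rather than substring matches, so the numbers could differ if one bias name is contained in another.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for hate_crime; values are made up.
toy <- tibble(
    DATA_YEAR = c(2009, 2009, 2009, 2017, 2017, 2010),
    BIAS_DESC = c(
        "Anti-White",
        "Anti-Black or African American;Anti-White",
        "Anti-Gay (Male)",
        "Anti-White",
        "Anti-Gay (Male)",
        "Anti-White"
    )
)

bias_counts <-
    toy %>%
    filter(DATA_YEAR %in% c(2009, 2017)) %>%
    # one row per individual bias; compound entries are split on ";"
    separate_rows(BIAS_DESC, sep = ";") %>%
    # number of rows for each year / bias combination
    count(DATA_YEAR, BIAS_DESC, name = "N") %>%
    # spread years into columns, filling absent combinations with 0
    pivot_wider(names_from = DATA_YEAR, values_from = N,
                values_fill = list(N = 0L))

bias_counts
```

No bias has to be typed by hand, and stopping before pivot_wider() leaves the long "tidy" form that ggplot2 expects.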

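Note that in the table above, any bias containing a ( is zero in both years. This is because grepl() treats the ( in the bias as a regular-expression grouping symbol; fix this by adding fixed = TRUE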

n_with_bias <- function(x, bias)
    sum(grepl(bias, x, fixed = TRUE))

and the updated result is

                                                          2009 2017
Anti-Black or African American                            2300 2101
Anti-White                                                 539  722
Anti-Jewish                                                932  983
Anti-Arab                                                    0  106
Anti-Protestant                                             38   42
Anti-Other Religion                                        111   85
Anti-Islamic (Muslim)                                      107  284
Anti-Gay (Male)                                            688  692
Anti-Asian                                                 128  133
Anti-Catholic                                               52   72
Anti-Heterosexual                                           21   33
Anti-Hispanic or Latino                                    486  444
Anti-Other Race/Ethnicity/Ancestry                         296  280
Anti-Multiple Religions, Group                              48   52
Anti-Multiple Races, Group                                 180  202
Anti-Lesbian (Female)                                      186  133
Anti-Lesbian, Gay, Bisexual, or Transgender (Mixed Group)  311  287
Anti-American Indian or Alaska Native                       68  244
Anti-Atheism/Agnosticism                                    10    6
Anti-Bisexual                                               24   24
Anti-Physical Disability                                    24   66
Anti-Mental Disability                                      70   89
Anti-Gender Non-Conforming                                   0   13
Anti-Female                                                  0   48
Anti-Transgender                                             0  117
Anti-Native Hawaiian or Other Pacific Islander               0   15
Anti-Male                                                    0   25
Anti-Jehovah's Witness                                       0    7
Anti-Mormon                                                  0   12
Anti-Buddhist                                                0   15
Anti-Sikh                                                    0   18
Anti-Other Christian                                         0   24
Anti-Hindu                                                   0   10
Anti-Eastern Orthodox (Russian, Greek, Other)                0   22
Unknown (offender's motivation not known)                    0    0
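
The effect of fixed = TRUE can be checked on a small vector. This is an illustrative sketch; the second element of x is a made-up string included only to show what the regular-expression version actually matches.

```r
x <- c("Anti-Gay (Male)", "Anti-Gay Male", "Anti-White")
pattern <- "Anti-Gay (Male)"

# As a regular expression, "(Male)" is a group, so the pattern means
# "Anti-Gay " followed by "Male"; the literal parentheses in x[1] do not match.
regex_hits <- grepl(pattern, x)

# With fixed = TRUE the pattern is matched character for character.
fixed_hits <- grepl(pattern, x, fixed = TRUE)

regex_hits
fixed_hits
```

Escaping the parentheses in each pattern would also work, but fixed = TRUE is simpler when the patterns come straight from the data.
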
answered 2020-01-07T21:10:55.990