
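I have a dataset in which each column is a variable and each row is an observation (like time series data). It looks like this (I apologize for the format, but I can't show the data):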

[image]

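I would like to know whether a person or a group is saying the same thing over time. I am familiar with n-grams, but that is not what I need. Any help would be appreciated.

[image]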

Here is my desired output:

Sorry for all the edits; I'm still getting used to the site.


2 Answers


If you want to see the frequency of each comment for each Person, together with a new column Ready, you can do it with the following code:

library(dplyr)  # load first: %>% is used when building the data

set.seed(123456)

### I use the same data as the previous example, thank you for providing this!
data <- data.frame(date = Sys.Date() - sample(100),
                Group = c("Cars","Trucks") %>% sample(100, replace = TRUE),
                Reporting_person = c("A","B","C") %>% sample(100, replace = TRUE),
                Comments = c("Awesome","Meh","NC") %>% sample(100, replace = TRUE),
                Ready = as.character(c("Yes","No") %>% sample(100, replace = TRUE))
                )

data %>% 
    group_by(Reporting_person,Ready) %>%
    count(Comments) %>%
    mutate(prop = prop.table(n))
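To make the `prop` column concrete, here is a minimal base R sketch on a tiny hand-made data frame. The `toy` data and its values are invented for illustration, and the `Ready` grouping is left out to keep it short; it only shows that the proportions are computed within each person, so each person's proportions sum to 1.

```r
# Hypothetical toy data: not the asker's dataset
toy <- data.frame(
  Reporting_person = c("A", "A", "A", "B", "B"),
  Comments         = c("Awesome", "Awesome", "Meh", "NC", "NC"),
  stringsAsFactors = FALSE
)

# Count each comment per person
counts <- table(toy$Reporting_person, toy$Comments)

# margin = 1 normalises within each row, i.e. within each person
props <- prop.table(counts, margin = 1)
print(props)
```

Person A said "Awesome" in two of three observations, so that cell is 2/3; person B only ever said "NC".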

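If you want to see whether the comments change over time, and whether that change is related to an event (such as Ready), you can use the following code: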

library(dplyr)

### Create a column holding the previous comment (lag) within each Group / Reporting_person
new = data %>% 
        arrange(Reporting_person,Group,date) %>%
        group_by(Group,Reporting_person) %>%
        mutate(comments_plusone=lag(Comments))

new = na.omit(new)

### Create the Change column: 1 = the comment changed, 0 = no change

new$Change = as.numeric(new$Comments != new$comments_plusone)

### Get the association between Change and the event...

### Chi-squared test of association between the event (Ready) and Change
### Note that using Pearson correlation is not pertinent here:


tbl <- table(new$Ready,new$Change)

chi2 = chisq.test(tbl, correct = FALSE)   # no continuity correction
c(chi2$statistic, chi2$p.value)           # test statistic and p-value
sqrt(chi2$statistic / sum(tbl))           # effect size (phi coefficient)

With this example you should find no significant association, as you can clearly see when plotting the table.

plot(tbl)

[image: Chi2]

Note that the cor function is not suited to two binary variables.

Here is a post on this topic: Correlation between two binary variables
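For reference, the quantity computed by `sqrt(chi2$statistic / sum(tbl))` in the code above can be written out as follows. This is a reading of that one line, assuming `tbl` is the 2x2 table of Ready against Change, with $n$ standing for the total count `sum(tbl)`:

```latex
\phi = \sqrt{\frac{\chi^2}{n}}
```

For a 2x2 table this is the phi coefficient, which coincides with Cramér's V; values near 0 indicate little association between the event and the change.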

Frequency of change by change of state

Based on your comment, I added the following code:

newR = data %>% 
        arrange(Reporting_person,Group,date) %>%
        group_by(Group,Reporting_person) %>%
        mutate(Ready_plusone=lag(Ready)) 


newR = na.omit(newR)

###------------------------Add the column to the new data frame
### State_change combines the current and the previous Ready value (e.g. Yes_No).
### I build it this way because you seem to have more than 2 levels.
new$State_change = paste(newR$Ready,newR$Ready_plusone,sep="_")

### Getting the frequency of Change by Change of State(Ready Yes-no..no-yes..)
result <- new %>% 
                group_by(Reporting_person,State_change) %>%
                count(Change) %>%
                mutate(Frequence = prop.table(n))%>%
                filter(Change==1)

### tidyr is a great library for reshaping data; here you want the wide format of the
### previous long data frame. However, doing this will generate a lot of NA, so if I were
### you I would keep the `result` format instead. It may still be helpful later, so here you go.

library(tidyr)

### The proportion column is named Frequence (not prop), so spread on that
final = as.data.frame(spread(result, key = State_change, value = Frequence))[,c(1,4:7)]
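The lag-then-compare idea used throughout this answer can be checked by hand on a very small example. The sketch below uses base R only, and the `toy` data frame, its column names and its values are all invented for illustration; `ave()` stands in for the `group_by()` plus `lag()` step.

```r
# Hypothetical toy data: two people, three observations each
toy <- data.frame(
  person  = c("A", "A", "A", "B", "B", "B"),
  day     = c(1, 2, 3, 1, 2, 3),
  comment = c("Meh", "Meh", "NC", "NC", "Awesome", "Awesome"),
  stringsAsFactors = FALSE
)

# Sort by person and time so "previous" is meaningful
toy <- toy[order(toy$person, toy$day), ]

# Previous comment within each person (NA for the first observation)
toy$previous <- ave(toy$comment, toy$person,
                    FUN = function(x) c(NA, head(x, -1)))

# 1 = the comment differs from the previous one, 0 = same, NA = no previous
toy$change <- as.numeric(toy$comment != toy$previous)
print(toy)
```

Person A changes on the third observation and person B on the second; the first observation of each person has no previous comment, which is why the answer drops those rows with `na.omit()`.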

Hope this helps :)

answered 2017-06-15T16:55:31.143

Something like this?

library(dplyr)  # provides %>%

df <- data.frame(date = Sys.Date() - sample(10),
                Group = c("Cars","Trucks") %>% sample(10, replace = TRUE),
                Reporting_person = c("A","B","C") %>% sample(10, replace = TRUE),
                Comments = c("Awesome","Meh","NC") %>% sample(10, replace = TRUE))

#          date  Group Reporting_person Comments
# 1  2017-06-08 Trucks                B  Awesome
# 2  2017-06-05 Trucks                A  Awesome
# 3  2017-06-14   Cars                B      Meh
# 4  2017-06-06   Cars                B  Awesome
# 5  2017-06-11   Cars                A      Meh
# 6  2017-06-07   Cars                B       NC
# 7  2017-06-09   Cars                A       NC
# 8  2017-06-10   Cars                A       NC
# 9  2017-06-13 Trucks                C  Awesome
# 10 2017-06-12 Trucks                B       NC

aggregate(date ~ .,df,length)

#    Group Reporting_person Comments date
# 1 Trucks                A  Awesome    1
# 2   Cars                B  Awesome    1
# 3 Trucks                B  Awesome    1
# 4 Trucks                C  Awesome    1
# 5   Cars                A      Meh    1
# 6   Cars                B      Meh    1
# 7   Cars                A       NC    2
# 8   Cars                B       NC    1
# 9 Trucks                B       NC    1
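As a small check of what `aggregate(date ~ ., df, length)` does, here is a sketch on a hand-made data frame with fixed values (the `toy` data is invented for illustration, and uses fewer columns than the example above). The formula `date ~ .` groups by every column other than `date`, and `length` then counts the rows in each group, so the resulting `date` column is a count rather than a date.

```r
# Hypothetical toy data with fixed values so the counts can be verified by eye
toy <- data.frame(
  date     = as.Date("2017-06-01") + 0:3,
  Group    = c("Cars", "Cars", "Trucks", "Cars"),
  Comments = c("NC", "NC", "Meh", "Meh"),
  stringsAsFactors = FALSE
)

# One row per observed Group/Comments combination; `date` holds the row count
counts <- aggregate(date ~ ., toy, length)
print(counts)
```

Only combinations that actually occur get a row, so the counts always add up to the number of observations.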
answered 2017-06-15T14:17:52.760