r - Plot rolling coefficients and color them by p-value

Question

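This is a bit tricky! I'm running a regression over rolling windows and collecting all of the coefficients from each window. My goal is to plot how the coefficients fluctuate over time. In addition, I'd like the plot to use one color for points where the coefficient is statistically significant (say, at the 95% level) and a different color for points where it is not.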

What I have so far is:

library(plm)
coeff <- NULL
for (e in 1:39) {  # 44 years per country, so 39 six-year windows
  # build the windowed panel frame from rows e..e+5 of each country
  paneldata <- pdata.frame(
    rbind(
      subset(LaggedPannel, Country == "A")[e:(e + 5), ],
      subset(LaggedPannel, Country == "B")[e:(e + 5), ]
    ),
    index = c("Country", "Year")
  )
  # slope on lag(Y, 1) from a pooled panel regression
  est <- coef(summary(plm(Y ~ lag(Y, 1), data = paneldata, model = "pooling")))[2, 1]
  coeff <- c(coeff, est)  # store coefficients
}
plot(coeff, type = "b", col = "red")

This produces the following plot:

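Suppose, for example, that the second and fourth coefficients (the points in the plot) are statistically insignificant; those points should then be colored green.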
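A minimal sketch of the requested coloring, assuming the loop has already produced a vector of estimates and a matching vector of p-values. The `coeff` and `P_values` values below are random stand-ins (hypothetical, not the question's data). Red marks p < 0.05 and green marks the rest, as described above. Drawing the connecting line and the points in separate calls keeps the line one neutral color while each point gets its own.

```r
# Hypothetical stand-ins for the vectors the rolling loop would collect
set.seed(42)
coeff    <- cumsum(rnorm(39, sd = 0.1))
P_values <- runif(39)

# Red where significant at the 5% level, green otherwise
pt_col <- ifelse(P_values < 0.05, "red", "green")

# Neutral line first, then colored points on top
plot(coeff, type = "l", col = "grey50", xlab = "window", ylab = "coefficient")
points(coeff, pch = 19, col = pt_col)
legend("topleft", legend = c("p < 0.05", "p >= 0.05"),
       col = c("red", "green"), pch = 19)
```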

Data (LaggedPannel):

                 Age1     Age2     Age3
Australia-1973  261.156  255.699  249.954
Australia-1974  261.305  255.394  251.470
Australia-1975  258.160  253.543  250.538
Australia-1976  262.504  258.066  254.720
Australia-1977  240.086  260.846  258.418
Australia-1978  228.774  238.871  259.449
USA-1973       4100.257 4104.028 4107.409
USA-1974       4135.435 4118.422 4120.286
USA-1975       4171.648 4164.065 4134.525
USA-1976       4208.236 4187.196 4171.167
USA-1977       4240.832 4211.655 4189.650
USA-1978       4286.923 4255.092 4229.701

score 1 · Accepted Answer

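Using an extra vector to store the p-values, and then coloring each point according to whether its p-value is at or below the 0.05 significance level, also solves the problem. Specifically: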

library(plm)
coeff <- NULL
P_values <- NULL
for (e in 1:39) {  # 44 years per country, so 39 six-year windows
  # build the windowed panel frame from rows e..e+5 of each country
  paneldata <- pdata.frame(
    rbind(
      subset(LaggedPannel, Country == "A")[e:(e + 5), ],
      subset(LaggedPannel, Country == "B")[e:(e + 5), ]
    ),
    index = c("Country", "Year")
  )
  # fit once, then read the lag(Y, 1) row: column 1 = estimate, column 4 = p-value
  tab <- coef(summary(plm(Y ~ lag(Y, 1), data = paneldata, model = "pooling")))
  coeff <- c(coeff, tab[2, 1])
  P_values <- c(P_values, tab[2, 4])
}
plot(coeff, type = "b", col = "red")  # previous plot

# new plot, colored by significance at the 5% level
plot(coeff, col = ifelse(P_values <= 0.05, "blue", "red"), ylab = "coef", type = "b")

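The only problem with this answer is that it becomes tedious if you want to consider multiple variables, because you would then need to create several empty vectors and so on. It isn't a fast approach, but it certainly works.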
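One way around that tedium, sketched under assumptions: instead of one vector per quantity, keep the whole coefficient table from each window and stack the tables into a single data frame, so any number of regressors comes along automatically. To stay self-contained, the sketch uses simulated data and base R's `lm()` as a stand-in for the `plm()` pooled regression. The names `sim`, `x1`, `x2` and `rolling` are hypothetical. The window layout (6 rows per window, 39 windows over 44 years) mirrors the loop above.

```r
# Simulated stand-in data (hypothetical): 44 years, two regressors
set.seed(1)
n_years <- 44
window  <- 6
sim <- data.frame(Year = 1:n_years, x1 = rnorm(n_years), x2 = rnorm(n_years))
sim$y <- 0.5 * sim$x1 + rnorm(n_years)

# One data frame per window holding every term's estimate and p-value;
# lm() stands in for plm(..., model = "pooling")
rolling <- lapply(seq_len(n_years - window + 1), function(e) {
  win <- sim[e:(e + window - 1), ]
  tab <- coef(summary(lm(y ~ x1 + x2, data = win)))
  data.frame(
    window    = e,
    term      = rownames(tab),
    estimate  = tab[, "Estimate"],
    p.value   = tab[, "Pr(>|t|)"],
    row.names = NULL
  )
})
rolling <- do.call(rbind, rolling)  # long format: one row per window x term

# One panel per regressor, blue where p <= 0.05, red otherwise
op <- par(mfrow = c(1, 2))
for (v in c("x1", "x2")) {
  d <- rolling[rolling$term == v, ]
  plot(d$window, d$estimate, type = "b",
       col = ifelse(d$p.value <= 0.05, "blue", "red"),
       xlab = "window", ylab = v)
}
par(op)
```

Adding a regressor then only means changing the formula and the list of terms to plot; the collection code stays the same.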

score 1 · Accepted Answer

Here is some simulated data.

library(tidyverse)
library(broom)
simfun <- function(a = 0.1, B = 0.05, n = 200, x.sd = 1, e.sd = 1) {
  x <- rnorm(n, mean = 0, sd = x.sd) + runif(n)  # runif(n) so the lengths match
  e <- rnorm(n, mean = 0, sd = e.sd)
  y <- a + B * x + e
  data.frame(x, y)
}

statfun <- function(d) {
  tidy(lm(y ~ x, data = d))  # one row per term: estimate, std.error, statistic, p.value
}

simdata <- map(seq(50), ~ statfun(simfun())) %>%
  enframe() %>%
  unnest(value) %>%
  filter(term == "x")

The following flags which coefficients are "significant":

simdata <- simdata %>%
  mutate(index = row_number(),
         Significance = factor(p.value < 0.05))
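If you would rather not depend on the tidyverse and broom, roughly the same simulation and significance flag can be sketched in base R. This is an alternative rather than the answer's own code: `replicate()` refits the model 50 times, and `coef(summary(lm(...)))["x", ]` pulls out the slope row (estimate, standard error, t value, p-value) on each run. The column names `index`, `estimate` and `p.value` are chosen here to match the tidy output.

```r
set.seed(123)
simfun <- function(a = 0.1, B = 0.05, n = 200, x.sd = 1, e.sd = 1) {
  x <- rnorm(n, mean = 0, sd = x.sd) + runif(n)
  e <- rnorm(n, mean = 0, sd = e.sd)
  data.frame(x = x, y = a + B * x + e)
}

# 50 fits; each returns the named "x" row of the coefficient table,
# so t() gives a 50 x 4 matrix with columns Estimate ... Pr(>|t|)
fits <- t(replicate(50, coef(summary(lm(y ~ x, data = simfun())))["x", ]))

simdata <- data.frame(
  index    = seq_len(nrow(fits)),
  estimate = fits[, "Estimate"],
  p.value  = fits[, "Pr(>|t|)"]
)
simdata$Significance <- factor(simdata$p.value < 0.05)
```

The result has `estimate`, `p.value` and `Significance` columns, so the base-graphics plot works on it as well; for the ggplot2 version, use `index` in place of `name`.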

If you want to use base graphics, you can do this:

plot(simdata$estimate, col = ifelse(simdata$p.value < 0.05, "blue", "red"), ylab = "coeff")
lines(simdata$estimate)

Or, with ggplot2, you can do the following:

ggplot(simdata, aes(name, estimate)) + geom_line() + geom_point(aes(color = Significance), shape = 1) +
  labs(x = "Index", y = "coeff") + theme_bw()
