r - Error in as.data.frame.default(data): cannot coerce class "formula" to a data.frame

Question

My code looks like this:

    get_postcoefs <- function(portfo){
      my_dat <- prerank_betas %>%
        filter(portfo == portfo) %>%
        lm(ret ~ ewr, my_dat) %>% 
        coef %>% 
        as.list %>% 
        as_data_frame
}

When I try to apply this function in the next step, I use this code:

postrank <- prerank_betas %>%
  group_by(portfo) %>%
  do(get_postcoefs(.$portfo))

The data frame I am using looks like this:

dput(head(prerank_betas, 10))

structure(list(permco = c(3, 4, 5, 6, 7, 8, 9, 11, 12, 13), pre_beta = c(0.754759259550561, 
    0.631020855428056, 0.963497668377108, 1.42359914669436, 1.88321141160762, 
    0.137054776055511, 1.04141132820461, 0.170163365604386, 1.07633721793778, 
    1.05016503010496), ret = c(0.021630734879652, 0.00867405735757635, 
    0.0157192335910029, 0.0163030885650139, 0.017402600558639, 0.0182427638210356, 
    0.015755719798324, 0.0348026989282579, 0.0120230854319578, 0.016944221076395
    ), me = c(12.3938081896552, 603.599033139535, 36.6372490671642, 
    20.481490497076, 2918.12852836134, 1.89075555555556, 1.21730113636364, 
    5.5216014957265, 116.021340472028, 8.22907327586207), ewr = c(0.454914743929347, 
    0.65175605642766, 1.04015768854358, 1.54966348955938, 1.46542203513179, 
    0.874404877119168, 0.934768449855933, 0.296266764535612, 0.949971716508229, 
    1.31022003302531), beta_rank = c(3L, 3L, 5L, 8L, 10L, 1L, 6L, 
    1L, 6L, 6L), portfo = c(4L, 10L, 6L, 5L, 10L, 1L, 1L, 2L, 8L, 
    3L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
    ))

# A tibble: 10 x 7

   permco pre_beta     ret      me   ewr beta_rank portfo
    <dbl>    <dbl>   <dbl>   <dbl> <dbl>     <int>  <int>
 1      3    0.755 0.0216    12.4  0.455         3      4
 2      4    0.631 0.00867  604.   0.652         3     10
 3      5    0.963 0.0157    36.6  1.04          5      6
 4      6    1.42  0.0163    20.5  1.55          8      5
 5      7    1.88  0.0174  2918.   1.47         10     10
 6      8    0.137 0.0182     1.89 0.874         1      1
 7      9    1.04  0.0158     1.22 0.935         6      1
 8     11    0.170 0.0348     5.52 0.296         1      2
 9     12    1.08  0.0120   116.   0.950         6      8
10     13    1.05  0.0169     8.23 1.31          6      3

I get the following error message:

 Error in as.data.frame.default(data) : 
  cannot coerce class ‘"formula"’ to a data.frame

How do I have to adjust my code so that it works?

score 0 · Accepted Answer

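From the OP's code, it appears the goal is to run lm() by portfolio and build an output data frame holding the coefficients of all the portfolio regressions.

As noted in the comments, the code in the original post fails because the filter() call contains a quasiquotation conflict, portfo == portfo, when R tries to evaluate the expression.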
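The exact error message also seems consistent with a second issue in the OP's pipeline, which is my reading rather than something stated in the comments: the pipe passes the filtered data frame as the first argument of lm(), which is formula, so ret ~ ewr lands in the data slot and R then tries to coerce a formula to a data frame. The sketch below reproduces both effects on mtcars (used here only as a stand-in for the OP's data); keep_all is a hypothetical helper name.

```r
library(dplyr)

# Piping into lm() puts the data frame in the `formula` slot,
# so the formula ends up in `data` and cannot be coerced.
msg <- tryCatch(
  {
    mtcars %>% lm(mpg ~ wt)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# Passing the piped data explicitly as `data = .` avoids the problem.
fit <- mtcars %>% lm(mpg ~ wt, data = .)
coef(fit)

# Inside filter(), `cyl == cyl` compares the column with itself,
# so the function argument of the same name is never used.
keep_all <- function(cyl) nrow(filter(mtcars, cyl == cyl))
keep_all(4)  # number of rows kept; no filtering by the argument happens
```

Renaming the function argument so that it differs from the column name, and writing data = . in the lm() call, would address both points.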

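Absent a minimal reproducible example, here is an approach that uses purrr::map() and broom::tidy() to run linear models on the mtcars data frame.

Since we split the data by mtcars$cyl, the filter() function used in the OP is not needed.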
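To make the splitting step concrete, here is a small sketch of what split() returns for mtcars; by_cyl is an illustrative variable name, not part of the answer's code.

```r
# split() returns a named list with one data frame per value of cyl
by_cyl <- split(mtcars, mtcars$cyl)

names(by_cyl)         # the group labels, as character strings
sapply(by_cyl, nrow)  # number of rows in each group
```

Each list element is an ordinary data frame, so it can be passed directly as the data argument of lm().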

library(dplyr)
library(purrr)
library(broom)
mtcars %>% 
     split(.$cyl) %>% 
     purrr::map(.,function(x){
          lm(mpg ~ wt, data = x) %>%
               tidy(.)
     }) -> results

# combine into a data frame
df <- as.data.frame(do.call(rbind,results))
# extract cyl from rownames 
df$cyl <- substr(rownames(df),1,1)

...and the output:

           term  estimate std.error statistic      p.value cyl
4.1 (Intercept) 39.571196 4.3465820  9.103980 7.771511e-06   4
4.2          wt -5.647025 1.8501185 -3.052251 1.374278e-02   4
6.1 (Intercept) 28.408845 4.1843688  6.789278 1.054844e-03   6
6.2          wt -2.780106 1.3349173 -2.082605 9.175766e-02   6
8.1 (Intercept) 23.868029 3.0054619  7.941551 4.052705e-06   8
8.2          wt -2.192438 0.7392393 -2.965803 1.179281e-02   8
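As a possible variation, not part of the original answer, the group label can be carried along by dplyr::bind_rows() through its .id argument instead of being parsed back out of the row names. A minimal sketch on the same mtcars example:

```r
library(dplyr)
library(purrr)
library(broom)

# Fit one model per cyl group and tidy each fit
results <- mtcars %>%
  split(.$cyl) %>%
  map(~ tidy(lm(mpg ~ wt, data = .x)))

# .id stores each list element's name in a new `cyl` column
df <- bind_rows(results, .id = "cyl")
df
```

Here cyl is a character column taken from the list names; convert it with as.numeric() if a numeric key is needed.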

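Solution using the original poster's data

After modifying the recently posted data so that each value of portfo has at least 5 observations, a solution that processes the stock data looks like this.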

textData <- "id permco pre_beta     ret      me   ewr beta_rank portfo
1      3    0.755 0.0216    12.4  0.455         3      1
2      4    0.631 0.00867  604.   0.652         3      1
3      5    0.963 0.0157    36.6  1.04          5      1
4      6    1.42  0.0163    20.5  1.55          8      1
5      7    1.88  0.0174  2918.   1.47         10      1
6      8    0.137 0.0182     1.89 0.874         1      1
7      3    0.755 0.0216    12.4  0.455         3      2
8      4    0.631 0.00867  604.   0.652         3      2
9      5    0.963 0.0157    36.6  1.04          5      2
10     6    1.42  0.0163    20.5  1.55          8      2
11     7    1.88  0.0174  2918.   1.47         10      2
12     8    0.137 0.0182     1.89 0.874         1      2"

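Note: by duplicating the values and adjusting the portfo identifiers we can demonstrate the solution. Because both groups have identical input data, the two resulting models will have exactly the same coefficients.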

prerank_betas <- read.table(text=textData,header=TRUE)
library(dplyr)
library(purrr)
library(broom)
prerank_betas %>% 
        split(.$portfo) %>% 
        purrr::map(.,function(x){
                lm(ret ~ ewr, data = x) %>%
                        tidy(.)
        }) -> results

# combine into a data frame
df <- as.data.frame(do.call(rbind,results))
df$portfo <- as.numeric(gsub(".$","",rownames(df)))
df

...and the output:

           term     estimate   std.error   statistic    p.value portfo
1.1 (Intercept) 1.629081e-02 0.005291043 3.078941068 0.03696967      1
1.2         ewr 2.071675e-05 0.004884269 0.004241526 0.99681887      1
2.1 (Intercept) 1.629081e-02 0.005291043 3.078941068 0.03696967      2
2.2         ewr 2.071675e-05 0.004884269 0.004241526 0.99681887      2
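For readers who prefer to stay close to the group_by() workflow in the question, the following is a hedged sketch using dplyr::group_modify(), which assumes a reasonably recent dplyr. It rebuilds a small data frame from the six ret and ewr pairs shown above instead of relying on objects defined earlier, and it is an alternative rather than the answer's own method.

```r
library(dplyr)
library(broom)

# Six (ret, ewr) pairs from the example data, repeated for two portfolios
ret <- c(0.0216, 0.00867, 0.0157, 0.0163, 0.0174, 0.0182)
ewr <- c(0.455, 0.652, 1.04, 1.55, 1.47, 0.874)
prerank_betas <- data.frame(
  portfo = rep(1:2, each = 6),
  ret    = rep(ret, 2),
  ewr    = rep(ewr, 2)
)

# Fit ret ~ ewr within each portfolio; .x is the data for one group
postrank <- prerank_betas %>%
  group_by(portfo) %>%
  group_modify(~ tidy(lm(ret ~ ewr, data = .x))) %>%
  ungroup()
postrank
```

The result keeps portfo as a regular column next to the tidy() output, so no row-name parsing is required.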
