r - Hide some coefficients in a regression summary while still returning the call, R-squared, and other summary output

Question

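My question is similar to this one, but I am interested in returning all the other output, not just the coefficients. Here is example code to make my question clearer.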

data=as.data.frame(matrix(rnorm(50*50),50,50))
summary(lm(data[,1]~.-data[,1],data=data))

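I only want to output the first 5 coefficients. I know I can do that with summary(lm(data[,1]~.-data[,1],data=data))$coeff[1:5,], but that gets rid of all the other output I want. I also know I can get each piece of output separately; I just want to know whether there is a neat way to write a one-liner that drops the variables I don't want to report.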

score 6 · Accepted Answer

You can select the coefficients you want by making a small modification to print.summary.lm, the internal function R uses to print summary.lm objects.

First, get the function's code as follows:

getAnywhere(print.summary.lm)

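Next, we need to find where the coefficient table is extracted and subset it to the rows we want. We'll add a new argument, my.rows, to the function and then subset to those rows when the coefficient table is extracted. The code for the modified function is at the end of this answer.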
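One way to locate the relevant line without reading the whole listing is to deparse the method and search it. This is only a sketch: it assumes the method body still refers to x$coefficients, which may differ between R versions.

```r
# Deparse the method's source into a character vector, one element per line.
# stats:::print.summary.lm reaches the unexported S3 method directly.
src <- deparse(stats:::print.summary.lm)

# Find the lines that mention the coefficient table.
hits <- grep("x$coefficients", src, fixed = TRUE)
src[hits]
```

The matching lines show where the coefficient table is pulled out of the summary object, which is the place to add the row subset.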

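Now compare the standard summary with our new one. First, I'll create a model using real data (the model you provided is not correctly specified. It looks like you intended lm(V1 ~ ., data=data), but even that leaves no residual degrees of freedom, so I decided to demonstrate with real data instead.):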

m1 = lm(mpg ~ wt + hp + cyl + vs + am, data=mtcars)
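As an aside, the remark about residual degrees of freedom can be checked directly. The sketch below rebuilds a 50 x 50 data frame like the one in the question (the seed is an arbitrary choice added here for reproducibility) and fits the model the question appears to have intended. With 50 observations and 50 estimated coefficients (an intercept plus 49 predictors), nothing is left over for the residuals.

```r
set.seed(1)  # arbitrary seed, not part of the original question
data <- as.data.frame(matrix(rnorm(50 * 50), 50, 50))

# V1 regressed on the 49 remaining columns plus an intercept
fit <- lm(V1 ~ ., data = data)

length(coef(fit))  # number of estimated coefficients
df.residual(fit)   # residual degrees of freedom
```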

Standard summary:

summary(m1)

Call:
  lm(formula = mpg ~ wt + hp + cyl + vs + am, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.6729 -1.6583 -0.4297  1.3307  5.4688 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 33.24160    5.48527   6.060 2.11e-06 ***
wt          -2.54332    0.93506  -2.720   0.0115 *  
hp          -0.02589    0.01387  -1.866   0.0733 .  
cyl         -0.40179    0.79364  -0.506   0.6169    
vs           1.17067    1.81283   0.646   0.5241    
am           1.97575    1.64825   1.199   0.2415    
---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.537 on 26 degrees of freedom
Multiple R-squared:  0.8514,  Adjusted R-squared:  0.8228 
F-statistic:  29.8 on 5 and 26 DF,  p-value: 5.571e-10

New summary with only the coefficients we selected:

Note that we need to call summary on the model first, because my.summary.lm expects a summary object rather than the model object itself.

my.summary.lm(summary(m1), my.rows=2:4)

You may prefer to select coefficients by name rather than by index:

my.summary.lm(summary(m1), my.rows=grep("wt|hp|cyl", names(coef(m1))))

Call:
  lm(formula = mpg ~ wt + hp + cyl + vs + am, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.6729 -1.6583 -0.4297  1.3307  5.4688 

Coefficients:
    Estimate Std. Error t value Pr(>|t|)  
wt  -2.54332    0.93506  -2.720   0.0115 *
hp  -0.02589    0.01387  -1.866   0.0733 .
cyl -0.40179    0.79364  -0.506   0.6169  
---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.537 on 26 degrees of freedom
Multiple R-squared:  0.8514,  Adjusted R-squared:  0.8228 
F-statistic:  29.8 on 5 and 26 DF,  p-value: 5.571e-10
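The name-based selection can be checked on its own before passing it to the printing function. The sketch below compares the grep pattern with an exact match on full term names; the variable names coef_names, rows_grep, and rows_exact are introduced here purely for illustration.

```r
m1 <- lm(mpg ~ wt + hp + cyl + vs + am, data = mtcars)
coef_names <- names(coef(m1))

# Pattern match: any term whose name contains "wt", "hp", or "cyl".
rows_grep <- grep("wt|hp|cyl", coef_names)

# Exact match on full term names.
rows_exact <- which(coef_names %in% c("wt", "hp", "cyl"))

rows_grep   # 2 3 4
rows_exact  # 2 3 4
```

For this model both give the same rows. The exact match may be the safer choice when some term names are substrings of others, for example with interaction terms such as wt:hp.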

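Here is the function. I made only two changes to the original function, both marked with inline comments. The first change is the added my.rows argument. The second is in the line beginning coefs <-: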

my.summary.lm = function (x, digits = max(3L, getOption("digits") - 3L), 
                       symbolic.cor = x$symbolic.cor, 
                       signif.stars = getOption("show.signif.stars"), 
                       my.rows, ...)                     # NOTE NEW my.rows ARGUMENT
{
  cat("\nCall:\n", paste(deparse(x$call), sep = "\n", collapse = "\n"), 
      "\n\n", sep = "")
  resid <- x$residuals
  df <- x$df
  rdf <- df[2L]
  cat(if (!is.null(x$weights) && diff(range(x$weights))) 
    "Weighted ", "Residuals:\n", sep = "")
  if (rdf > 5L) {
    nam <- c("Min", "1Q", "Median", "3Q", "Max")
    rq <- if (length(dim(resid)) == 2L) 
      structure(apply(t(resid), 1L, quantile), dimnames = list(nam, 
                                                               dimnames(resid)[[2L]]))
    else {
      zz <- zapsmall(quantile(resid), digits + 1L)
      structure(zz, names = nam)
    }
    print(rq, digits = digits, ...)
  }
  else if (rdf > 0L) {
    print(resid, digits = digits, ...)
  }
  else {
    cat("ALL", df[1L], "residuals are 0: no residual degrees of freedom!")
    cat("\n")
  }
  if (length(x$aliased) == 0L) {
    cat("\nNo Coefficients\n")
  }
  else {
    if (nsingular <- df[3L] - df[1L]) 
      cat("\nCoefficients: (", nsingular, " not defined because of singularities)\n", 
          sep = "")
    else cat("\nCoefficients:\n")
    coefs <- x$coefficients[my.rows, , drop = FALSE]       # SUBSET my.rows (drop = FALSE keeps a single row as a matrix)
    if (!is.null(aliased <- x$aliased) && any(aliased)) {
      cn <- names(aliased)
      coefs <- matrix(NA, length(aliased), 4, dimnames = list(cn, 
                                                              colnames(coefs)))
      coefs[!aliased, ] <- x$coefficients
    }
    printCoefmat(coefs, digits = digits, signif.stars = signif.stars, 
                 na.print = "NA", ...)
  }
  cat("\nResidual standard error:", format(signif(x$sigma, 
                                                  digits)), "on", rdf, "degrees of freedom")
  cat("\n")
  if (nzchar(mess <- naprint(x$na.action))) 
    cat("  (", mess, ")\n", sep = "")
  if (!is.null(x$fstatistic)) {
    cat("Multiple R-squared: ", formatC(x$r.squared, digits = digits))
    cat(",\tAdjusted R-squared: ", formatC(x$adj.r.squared, 
                                           digits = digits), "\nF-statistic:", formatC(x$fstatistic[1L], 
                                                                                       digits = digits), "on", x$fstatistic[2L], "and", 
        x$fstatistic[3L], "DF,  p-value:", format.pval(pf(x$fstatistic[1L], 
                                                          x$fstatistic[2L], x$fstatistic[3L], lower.tail = FALSE), 
                                                       digits = digits))
    cat("\n")
  }
  correl <- x$correlation
  if (!is.null(correl)) {
    p <- NCOL(correl)
    if (p > 1L) {
      cat("\nCorrelation of Coefficients:\n")
      if (is.logical(symbolic.cor) && symbolic.cor) {
        print(symnum(correl, abbr.colnames = NULL))
      }
      else {
        correl <- format(round(correl, 2), nsmall = 2, 
                         digits = digits)
        correl[!lower.tri(correl)] <- ""
        print(correl[-1, -p, drop = FALSE], quote = FALSE)
      }
    }
  }
  cat("\n")
  invisible(x)
}
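A shorter variant of the same idea, which is not the approach taken in this answer, is to subset the coefficient table stored inside the summary object and let the stock print method do the rest. This is a sketch that assumes the model has no aliased (non-estimable) coefficients; when some are aliased, the built-in print method rebuilds the full table and the subset would no longer line up.

```r
m1 <- lm(mpg ~ wt + hp + cyl + vs + am, data = mtcars)

s <- summary(m1)

# Keep rows 2 to 4 of the coefficient table; drop = FALSE keeps it a matrix.
s$coefficients <- s$coefficients[2:4, , drop = FALSE]

# The regular print method still shows the call, residuals, R-squared, etc.
print(s)
```

The trade-off is that s no longer holds the full coefficient table, so it is best treated as a display-only copy rather than reused for further computation.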

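Update: As I mentioned in a comment, you can roll your own summary function and set it up to work with whatever types of model summary object you use regularly. In this case, we'll cover summary.lm and summary.plm objects, which are the object types created when you run summary on lm and plm model objects, respectively.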

First, we need lm and plm model objects to work with:

# lm object
m1 = lm(mpg ~ wt + hp + cyl + vs + am, data=mtcars)

# plm object
library(plm)

# Example from plm help
data("Produc", package = "plm")
zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
          data = Produc, index = c("state","year"))

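Now we need a function to output the summary data we want. To create the code below, I simply looked at what is inside the list objects created by summary(m1) and summary(zz) (run str(summary(m1)) and str(summary(zz)) to see these), so that I knew where to get the various elements of the summary object (e.g., the call and the residuals). In some cases, I also just copied sections of code directly from the print.summary.lm and print.summary.plm functions.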
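As a quick illustration of that inspection step, the sketch below lists the class and the element names of an lm summary object. Only the lm case is shown so that the snippet runs without the plm package installed; the same calls apply to summary(zz).

```r
m1 <- lm(mpg ~ wt + hp + cyl + vs + am, data = mtcars)
s1 <- summary(m1)

class(s1)  # the class that a custom summary function can branch on
names(s1)  # top-level elements, e.g. call, residuals, coefficients, sigma

# str() gives the same overview with types and dimensions;
# max.level = 1 keeps the listing short.
str(s1, max.level = 1)
```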

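The function below doesn't print everything that the native summary functions include, but it should be enough to show you how to add whatever elements you want to the output.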

# Summary function that allows selection of which coefficients to include 
# in the coefficient table
# Works with summary.lm and summary.plm objects
my.summary = function(x, rows, digits=3) {

  # Print a few summary elements that are common to both lm and plm model summary objects
  cat("Call\n")
  print(x$call)
  cat("\nResiduals\n")
  print(summary(x$residuals))
  cat("\n")
  print(coef(x)[rows, , drop = FALSE])  # drop = FALSE keeps row names when a single row is selected

  # Print elements unique to lm model summary objects
  if("summary.lm" %in% class(x)) {
    cat("\nResidual standard error:", round(x$sigma,3), "on", x$df[2], "degrees of freedom")
    cat(paste(c("\nF-statistic:", " on"," and"), round(x$fstatistic,2), collapse=""),
        "DF, p-value:",
        format.pval(pf(x$fstatistic[1L], x$fstatistic[2L], x$fstatistic[3L], 
                       lower.tail = FALSE), digits=digits))

  # Print elements unique to plm model summary objects  
  } else if ("summary.plm" %in% class(x)) {
    cat(paste("\nResidual Sum of Squares: ", signif(deviance(x), 
                                                  digits), "\n", sep = ""))
    fstat <- x$fstatistic
    if (names(fstat$statistic) == "F") {
      cat(paste("F-statistic: ", signif(fstat$statistic), " on ", 
                fstat$parameter["df1"], " and ", fstat$parameter["df2"], 
                " DF, p-value: ", format.pval(fstat$p.value, digits = digits), 
                "\n", sep = ""))
    }
    else {
      cat(paste("Chisq: ", signif(fstat$statistic), " on ", 
                fstat$parameter, " DF, p-value: ", format.pval(fstat$p.value, 
                                                               digits = digits), "\n", sep = ""))
    }
  }
}

Now run the function on the lm model and the plm model:

my.summary(summary(m1), 2:4)

Call
lm(formula = mpg ~ wt + hp + cyl + vs + am, data = mtcars)

Residuals
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-3.6730 -1.6580 -0.4297  0.0000  1.3310  5.4690 

       Estimate Std. Error    t value   Pr(>|t|)
wt  -2.54331718 0.93506164 -2.7199460 0.01148231
hp  -0.02588661 0.01387176 -1.8661377 0.07334148
cyl -0.40178727 0.79364098 -0.5062582 0.61694148

Residual standard error: 2.537 on 26 degrees of freedom
F-statistic: 29.8 on 5 and 26 DF, p-value: 5.57e-10

my.summary(summary(zz), 2:3)

Call
plm(formula = log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, 
    data = Produc, index = c("state", "year"))

Residuals
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.120500 -0.023740 -0.002041  0.000000  0.018140  0.174700 

          Estimate Std. Error  t-value      Pr(>|t|)
log(pc)  0.2920069 0.02511967 11.62463  7.075069e-29
log(emp) 0.7681595 0.03009174 25.52725 2.021455e-104

Residual Sum of Squares: 1.11
F-statistic: 3064.81 on 4 and 764 DF, p-value: <2e-16

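I suppose that if you really wanted to go all the way, you could take advantage of object orientation and write your own generic function, with a method for each type of model you want to include.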
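A minimal sketch of what that could look like with S3 dispatch follows. The generic name short_summary and its arguments are hypothetical, and only a method for summary.lm objects is written out; a summary.plm method would follow the same pattern using the plm-specific elements.

```r
# Hypothetical generic: dispatches on the class of the summary object.
short_summary <- function(x, rows, ...) UseMethod("short_summary")

# Method for summary.lm objects: prints the call, the selected coefficient
# rows, the residual standard error, and the R-squared.
short_summary.summary.lm <- function(x, rows, digits = 3, ...) {
  cat("Call\n")
  print(x$call)
  cat("\n")
  print(coef(x)[rows, , drop = FALSE], digits = digits)
  cat("\nResidual standard error:", signif(x$sigma, digits),
      "on", x$df[2L], "degrees of freedom\n")
  cat("Multiple R-squared:", signif(x$r.squared, digits), "\n")
  invisible(x)
}

# Fallback for classes without a dedicated method.
short_summary.default <- function(x, rows, ...) {
  stop("no short_summary method for class ", paste(class(x), collapse = "/"))
}

m1 <- lm(mpg ~ wt + hp + cyl + vs + am, data = mtcars)
short_summary(summary(m1), rows = 2:4)
```

Supporting another model type then means adding one more method, with no if/else chain on class(x) inside a single function.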
