stata - "too many variables specified" error on predict following logit

Question

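I have a panel of firm-year data spanning several countries. For each country, I estimate a logit model using the first five years, then use that model to predict probabilities for the subsequent years. I loop over countries with foreach and over the subsequent years with forvalues.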

The first few countries run fine (both estimation and prediction), but the first out-of-sample prediction for the fifth country fails:

Country: United Kingdom
Year: 1994
too many variables specified
r(103);

The model fits, and there are enough data in 1994 to predict probabilities. My predict call is:

predict temp_`c'`y' ///
    if (country == "`c'") ///
        & (fyear == `y'), ///
    pr

Do you have any idea what could be causing this error? I'm puzzled, because logit and predict work elsewhere in the same loop. Thanks!

FWIW, here is the .do file.

* generate table 5 from Denis and Osobov (2008 JFE)
preserve

* loop to estimate model by country
levelsof country, local(countries)
foreach c of local countries {
    display "Country: `c'"
    summarize fyear if (country == "`c'"), meanonly
    local est_low = `r(min)'
    local est_high = `=`r(min)' + 4'
    local pred_low = `=`r(min)' + 5'
    local pred_high = `r(max)'
    logit payer size v_a_tr e_a_tr re_be_tr ///
        if (country == "`c'") ///
            & inrange(fyear, `est_low', `est_high')
    forvalues y = `pred_low'/`pred_high' {
        display "Country: `c'"
        display "Year: `y'"
        predict temp_`c'`y' ///
            if (country == "`c'") ///
                & (fyear == `y'), ///
            pr
    }
}

* combine fitted values and generate delta
egen payer_expected = rowfirst(temp_*)
drop temp_*
generate delta = payer - payer_expected

* table
table country fyear, ///
    contents(count payer mean payer mean payer_expected)

*
restore

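Update: If I drop the observations with country == "United Kingdom", the same problem moves to the United States (the next and last country in the panel). If I drop the observations with inlist(country, "United Kingdom", "United States"), the problem goes away and the .do file runs.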

score 2 · Accepted Answer

You are using the country name as part of the name of the new variable that predict creates. However, when you get to "United Kingdom", your line

predict temp_`c'`y'

implies something like

predict temp_United Kingdom1812

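but Stata sees that as two variable names where only one is allowed.

Put another way, you are being bitten by a simple rule: Stata does not allow spaces in variable names.

Clearly, the same problem would bite with "United States".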
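A minimal way to see the rule in action, assuming Stata's bundled auto dataset rather than the asker's data (the model and the year 1994 are stand-ins, not part of the question). The local macro is expanded before predict parses the line, so predict receives two tokens where it expects one new variable name.

```stata
* Illustration only: auto.dta and this model stand in for the real data
sysuse auto, clear
quietly logit foreign mpg weight

local c "United Kingdom"

* After macro expansion this reads: predict temp_United Kingdom1994, pr
capture noisily predict temp_`c'1994, pr
display "return code: " _rc

* With the space replaced, the same call has a single legal name
local c2 = subinstr("`c'", " ", "_", .)
predict temp_`c2'1994, pr
describe temp_*
```

The first predict should fail with the same message as in the question, while the second should create temp_United_Kingdom1994.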

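The simplest fudge is to change the values so that spaces become underscores "_". Stata is happy with variable names that contain underscores. That could be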

gen country2 = subinstr(country, " ", "_", .)

followed by a loop over country2.
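As a rough sketch of how the loop from the question might look once it runs over country2 (the variable names come from the question's .do file; the gen line is repeated only so the sketch is self-contained, and this has not been run against the asker's data):

```stata
* Sketch only: same loop as in the question, keyed on country2
* (skip this line if country2 already exists)
gen country2 = subinstr(country, " ", "_", .)

levelsof country2, local(countries)
foreach c of local countries {
    display "Country: `c'"
    summarize fyear if country2 == "`c'", meanonly
    local est_low   = r(min)
    local est_high  = r(min) + 4
    local pred_low  = r(min) + 5
    local pred_high = r(max)
    logit payer size v_a_tr e_a_tr re_be_tr ///
        if country2 == "`c'" ///
            & inrange(fyear, `est_low', `est_high')
    forvalues y = `pred_low'/`pred_high' {
        * e.g. temp_United_Kingdom1994 is now one legal name
        predict temp_`c'`y' ///
            if country2 == "`c'" & fyear == `y', pr
    }
}
```

This only handles spaces. Country names containing other characters that are illegal in variable names (periods, hyphens, apostrophes) would need more cleaning; the strtoname() function is one option, bearing in mind that variable names are capped at 32 characters, so long country names plus the temp_ prefix and the year could still overflow.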

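A note for anyone who missed the historical details: 1812 refers to the War of 1812, in which British troops burned the White House. Feel free to substitute "1776" or any other date of your choice.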

(Incidentally, thanks for a clear question!)

score 1 · Accepted Answer

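Here is another approach to your problem. Initialise a variable to hold the predicted values. Then, as you loop over the possibilities, replace it chunk by chunk with each set of predictions. This avoids the whole business of generating a bunch of differently named variables that you don't want to keep in the long term.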

* generate table 5 from Denis and Osobov (2008 JFE)

preserve
gen payer_expected = . 

* loop to estimate model by country
levelsof country, local(countries)
foreach c of local countries {
    display "Country: `c'"
    summarize fyear if (country == "`c'"), meanonly
    local est_low = `r(min)'
    local est_high = `=`r(min)' + 4'
    local pred_low = `=`r(min)' + 5'
    local pred_high = `r(max)'
    logit payer size v_a_tr e_a_tr re_be_tr ///
       if (country == "`c'") ///
       & inrange(fyear, `est_low', `est_high')
    forvalues y = `pred_low'/`pred_high' {
        display "Country: `c'"
        display "Year: `y'"
        predict temp ///
            if (country == "`c'") ///
            & (fyear == `y'), pr
        quietly replace payer_expected = temp if temp < . 
        drop temp 
   }
}

generate delta = payer - payer_expected

* table
table country fyear, ///
     contents(count payer mean payer mean payer_expected)

*
restore
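A possible simplification, offered as a sketch rather than as part of the answer above: because the coefficients are fixed once logit has run for a country, all of that country's later years can probably be predicted in a single call, which removes the inner forvalues loop. The temporary variable p is introduced here for illustration; the other names are from the question.

```stata
* Sketch only: one predict per country instead of one per year
gen payer_expected = .

levelsof country, local(countries)
foreach c of local countries {
    summarize fyear if country == "`c'", meanonly
    local est_low   = r(min)
    local est_high  = r(min) + 4
    local pred_high = r(max)
    quietly logit payer size v_a_tr e_a_tr re_be_tr ///
        if country == "`c'" ///
        & inrange(fyear, `est_low', `est_high')
    * p is a temporary variable name, so it never depends on the country string
    tempvar p
    predict `p' ///
        if country == "`c'" ///
        & inrange(fyear, `est_high' + 1, `pred_high'), pr
    quietly replace payer_expected = `p' if `p' < .
    drop `p'
}

generate delta = payer - payer_expected
```

The per-year display lines are lost in this version, so the original nested loop may still be preferable if that progress output is wanted.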
