0

所以说我有一个这样的数据集:

国家 贫困率 销售量
奥地利 1950 0.54 142
奥地利 1951年 0.32 12441
奥地利 1952年 0.32 12441
孟加拉国 1950 0.11 142123123
孟加拉国 1951年 0.52 1234
孟加拉国 1952年 0.32 12441
斯里兰卡 1950 0.95 4215
斯里兰卡 1951年 0.21 142421
斯里兰卡 1952年 0.32 12441

我有这样的代码: gen dummy=1 if year==1952&(Sales!=.&L1.Sales!=.&L2.Sales!=.)

只有3年很容易,但说还有更多。而不是复制粘贴 L#.Sales!=。对于每一年,有什么更快的方法来做到这一点?

4

1 回答 1

0

因此,根据您的示例数据和示例dummy公式判断,您的目标是确定哪些国家拥有截至 1952 年的完整时间序列(即完整且平衡的面板);如果这不正常,请纠正我。您的示例数据始终满足此条件,因此我将添加一个违反此条件的国家/地区以显示假人正在识别的内容。

clear 
input str10 country year poverty_rate Sales
"Austria"   1950    0.54    142
"Austria"   1951    0.32    12441
"Austria"   1952    0.32    12441
"Bangladesh"    1950    0.11    142123123
"Bangladesh"    1951    0.52    1234
"Bangladesh"    1952    0.32    12441
"Sri Lanka" 1950    0.95    4215
"Sri Lanka" 1951    0.21    142421
"Sri Lanka" 1952    0.32    12441
"Canada"    1950    0.95    4215
"Canada"    1951    0.21    .
"Canada"    1952    0.32    12441 
end

* TSSET SET ON COUNTRY (after making a country id) AND YEAR
egen country_id = group(country)
tsset country_id year

* Example dummy
gen dummy= 1 if year==1952&(Sales!=.&L1.Sales!=.&L2.Sales!=.)

     +-------------------------------------------------------------+
     |    country   year   povert~e       Sales   countr~d   dummy |
     |-------------------------------------------------------------|
  1. |    Austria   1950        .54         142          1       . |
  2. |    Austria   1951        .32       12441          1       . |
  3. |    Austria   1952        .32       12441          1       1 |
  4. | Bangladesh   1950        .11   1.421e+08          2       . |
  5. | Bangladesh   1951        .52        1234          2       . |
     |-------------------------------------------------------------|
  6. | Bangladesh   1952        .32       12441          2       1 |
  7. |     Canada   1950        .95        4215          3       . |
  8. |     Canada   1951        .21           .          3       . |
  9. |     Canada   1952        .32       12441          3       . |
 10. |  Sri Lanka   1950        .95        4215          4       . |
     |-------------------------------------------------------------|
 11. |  Sri Lanka   1951        .21      142421          4       . |
 12. |  Sri Lanka   1952        .32       12441          4       1 |
     +-------------------------------------------------------------+

. tabdisp country if dummy == 1, c(year)

-----------------------
   country |       year
-----------+-----------
   Austria |       1952
Bangladesh |       1952
 Sri Lanka |       1952
-----------------------

由于加拿大缺少 1951 年的销售额,因此没有dummy == 1.

现在让我们看看当我们想要增加更多年份时会发生什么。除了加拿大,我会给斯里兰卡一个缺失的销售年。总体策略将是跟踪具有非缺失销售的累计年数,直至并包括本年度。让我们首先制作一些示例数据:

* 10 year example Data
clear 
set seed 1234
input str10 country
"Austria"   
"Bangladesh"    
"Sri Lanka"
"Canada"    
end
egen country_id = group(country)

expand 10
bysort country: gen year = (1952 - _N ) + _n 
gen poverty_rate = runiform(country_id/10, 1)
gen Sales = rnormal(10000 * country_id/10,500)
replace Sales = . if inlist(country, "Canada", "Sri Lanka") & mod(year, country_id + 3) == 0
tsset country_id year

. list

     +------------------------------------------------------+
     |    country   countr~d   year   poverty~e       Sales |
     |------------------------------------------------------|
  1. |    Austria          1   1943   .95250845   1247.2748 |
  2. |    Austria          1   1944   .14700104   868.84461 |
  3. |    Austria          1   1945   .97688645   1183.0619 |
  4. |    Austria          1   1946   .95117353   518.76747 |
  5. |    Austria          1   1947   .26708305   1126.6462 |
     |------------------------------------------------------|
  6. |    Austria          1   1948   .95386004   2142.1245 |
  7. |    Austria          1   1949   .89428386   1161.1905 |
  8. |    Austria          1   1950   .94966985   797.04767 |
  9. |    Austria          1   1951   .18048327   689.44633 |
 10. |    Austria          1   1952   .77549004   1066.0907 |
     |------------------------------------------------------|
 11. | Bangladesh          2   1943   .95879865   3018.9243 |
 12. | Bangladesh          2   1944   .28973012   2582.4464 |
 13. | Bangladesh          2   1945   .58472512   2524.7572 |
 14. | Bangladesh          2   1946    .9810758   2164.6962 |
 15. | Bangladesh          2   1947   .30039802   2364.3507 |
     |------------------------------------------------------|
 16. | Bangladesh          2   1948   .81240204   2407.6086 |
 17. | Bangladesh          2   1949   .22868748   2441.4124 |
 18. | Bangladesh          2   1950   .25618875   1989.3041 |
 19. | Bangladesh          2   1951   .36814293   2479.9563 |
 20. | Bangladesh          2   1952   .72928052   2302.7052 |
     |------------------------------------------------------|
 21. |     Canada          3   1943   .44079668   2673.9421 |
 22. |     Canada          3   1944    .9912414           . |
 23. |     Canada          3   1945   .50897682    1960.373 |
 24. |     Canada          3   1946   .92788352   3217.6927 |
 25. |     Canada          3   1947   .35683626   3401.9663 |
     |------------------------------------------------------|
 26. |     Canada          3   1948   .76214979   3976.6976 |
 27. |     Canada          3   1949    .7398694   2350.2898 |
 28. |     Canada          3   1950   .31369335           . |
 29. |     Canada          3   1951   .44475408   3216.2126 |
 30. |     Canada          3   1952    .8668553   1833.1472 |
     |------------------------------------------------------|
 31. |  Sri Lanka          4   1943   .86803337   3617.7311 |
 32. |  Sri Lanka          4   1944   .40923508   4508.0392 |
 33. |  Sri Lanka          4   1945    .7448494   4093.6019 |
 34. |  Sri Lanka          4   1946   .79308608           . |
 35. |  Sri Lanka          4   1947   .72141991   4233.8767 |
     |------------------------------------------------------|
 36. |  Sri Lanka          4   1948    .6399412   4404.9189 |
 37. |  Sri Lanka          4   1949    .6140176   3711.7107 |
 38. |  Sri Lanka          4   1950   .84002398   3311.1258 |
 39. |  Sri Lanka          4   1951   .74770728    4021.116 |
 40. |  Sri Lanka          4   1952   .89887266   4581.6338 |
     +------------------------------------------------------+

现在我们将识别缺失的销售额,获取累积总和,并标记 1952 年。

gen year_has_sales = !missing(Sales)
bysort country (year): gen years_with_sales = sum(year_has_sales)

by country: gen dummy = (year == 1952) & years_with_sales == _n
tabdisp country if dummy == 1, c(year)


-----------------------
   country |       year
-----------+-----------
   Austria |       1952
Bangladesh |       1952
-----------------------

正如预期的那样,我们看到奥地利和孟加拉国的dummy == 1. 请注意,我假设您有一个平衡的面板;可以对上面的代码进行调整,以在一个国家/地区利用minmax年份。如果你想检查一个更窄的窗口,比如 5 年,你可以做类似的事情

gen years_with_sales_5 = years_with_sales - L5.years_with_sales

并检查它是否等于 5。

于 2022-01-19T19:58:21.350 回答