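Judging from your example data and your example dummy formula, your goal is to identify which countries have a complete time series up to 1952 (that is, a complete, balanced panel); please correct me if that is not right. Your example data always satisfies this condition, so I will add a country that violates it to show what the dummy is actually picking up.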
clear
input str10 country year poverty_rate Sales
"Austria" 1950 0.54 142
"Austria" 1951 0.32 12441
"Austria" 1952 0.32 12441
"Bangladesh" 1950 0.11 142123123
"Bangladesh" 1951 0.52 1234
"Bangladesh" 1952 0.32 12441
"Sri Lanka" 1950 0.95 4215
"Sri Lanka" 1951 0.21 142421
"Sri Lanka" 1952 0.32 12441
"Canada" 1950 0.95 4215
"Canada" 1951 0.21 .
"Canada" 1952 0.32 12441
end
* tsset on country (after creating a numeric country id) and year
egen country_id = group(country)
tsset country_id year
* Example dummy
gen dummy = 1 if year == 1952 & (Sales != . & L1.Sales != . & L2.Sales != .)
. list
     +-------------------------------------------------------------+
     |    country   year   povert~e       Sales   countr~d   dummy |
     |-------------------------------------------------------------|
  1. |    Austria   1950        .54         142          1       . |
  2. |    Austria   1951        .32       12441          1       . |
  3. |    Austria   1952        .32       12441          1       1 |
  4. | Bangladesh   1950        .11   1.421e+08          2       . |
  5. | Bangladesh   1951        .52        1234          2       . |
     |-------------------------------------------------------------|
  6. | Bangladesh   1952        .32       12441          2       1 |
  7. |     Canada   1950        .95        4215          3       . |
  8. |     Canada   1951        .21           .          3       . |
  9. |     Canada   1952        .32       12441          3       . |
 10. |  Sri Lanka   1950        .95        4215          4       . |
     |-------------------------------------------------------------|
 11. |  Sri Lanka   1951        .21      142421          4       . |
 12. |  Sri Lanka   1952        .32       12441          4       1 |
     +-------------------------------------------------------------+
. tabdisp country if dummy == 1, c(year)
-----------------------
   country |       year
-----------+-----------
   Austria |       1952
Bangladesh |       1952
 Sri Lanka |       1952
-----------------------
Because Canada is missing Sales for 1951, it does not get dummy == 1.
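A side note on the flag itself: `gen dummy = 1 if ...` leaves every non-matching row missing rather than 0. If a 0/1 indicator is more convenient, a minimal sketch along the following lines should work. It reuses the example data above; `dummy01` and `complete` are names I made up for illustration.

```stata
* Sketch only: a 0/1 version of the flag, on the same example data as above.
clear
input str10 country year poverty_rate Sales
"Austria"    1950 0.54 142
"Austria"    1951 0.32 12441
"Austria"    1952 0.32 12441
"Bangladesh" 1950 0.11 142123123
"Bangladesh" 1951 0.52 1234
"Bangladesh" 1952 0.32 12441
"Sri Lanka"  1950 0.95 4215
"Sri Lanka"  1951 0.21 142421
"Sri Lanka"  1952 0.32 12441
"Canada"     1950 0.95 4215
"Canada"     1951 0.21 .
"Canada"     1952 0.32 12441
end
egen country_id = group(country)
tsset country_id year

* missing() is true if any argument is missing, so !missing(...) requires
* Sales to be present in 1952 and in both preceding years.
gen byte dummy01 = year == 1952 & !missing(Sales, L1.Sales, L2.Sales)

* Copy the 1952 flag to every row of the country (hypothetical variable name).
bysort country_id (year): egen byte complete = max(dummy01)

list country year Sales dummy01 complete, sepby(country)
```

Here `complete` is 1 on all rows for Austria, Bangladesh and Sri Lanka and 0 for Canada, which makes it easy to keep or drop whole countries.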
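Now let's see what happens when we want to cover more years. In addition to Canada, I will give Sri Lanka a year with missing Sales. The general strategy is to track the cumulative number of years with non-missing Sales, up to and including the current year. Let's first create some example data: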
* 10 year example Data
clear
set seed 1234
input str10 country
"Austria"
"Bangladesh"
"Sri Lanka"
"Canada"
end
egen country_id = group(country)
expand 10
bysort country: gen year = (1952 - _N ) + _n
gen poverty_rate = runiform(country_id/10, 1)
gen Sales = rnormal(10000 * country_id/10,500)
replace Sales = . if inlist(country, "Canada", "Sri Lanka") & mod(year, country_id + 3) == 0
tsset country_id year
. list
     +------------------------------------------------------+
     |    country   countr~d   year   poverty~e       Sales |
     |------------------------------------------------------|
  1. |    Austria          1   1943   .95250845   1247.2748 |
  2. |    Austria          1   1944   .14700104   868.84461 |
  3. |    Austria          1   1945   .97688645   1183.0619 |
  4. |    Austria          1   1946   .95117353   518.76747 |
  5. |    Austria          1   1947   .26708305   1126.6462 |
     |------------------------------------------------------|
  6. |    Austria          1   1948   .95386004   2142.1245 |
  7. |    Austria          1   1949   .89428386   1161.1905 |
  8. |    Austria          1   1950   .94966985   797.04767 |
  9. |    Austria          1   1951   .18048327   689.44633 |
 10. |    Austria          1   1952   .77549004   1066.0907 |
     |------------------------------------------------------|
 11. | Bangladesh          2   1943   .95879865   3018.9243 |
 12. | Bangladesh          2   1944   .28973012   2582.4464 |
 13. | Bangladesh          2   1945   .58472512   2524.7572 |
 14. | Bangladesh          2   1946    .9810758   2164.6962 |
 15. | Bangladesh          2   1947   .30039802   2364.3507 |
     |------------------------------------------------------|
 16. | Bangladesh          2   1948   .81240204   2407.6086 |
 17. | Bangladesh          2   1949   .22868748   2441.4124 |
 18. | Bangladesh          2   1950   .25618875   1989.3041 |
 19. | Bangladesh          2   1951   .36814293   2479.9563 |
 20. | Bangladesh          2   1952   .72928052   2302.7052 |
     |------------------------------------------------------|
 21. |     Canada          3   1943   .44079668   2673.9421 |
 22. |     Canada          3   1944    .9912414           . |
 23. |     Canada          3   1945   .50897682    1960.373 |
 24. |     Canada          3   1946   .92788352   3217.6927 |
 25. |     Canada          3   1947   .35683626   3401.9663 |
     |------------------------------------------------------|
 26. |     Canada          3   1948   .76214979   3976.6976 |
 27. |     Canada          3   1949    .7398694   2350.2898 |
 28. |     Canada          3   1950   .31369335           . |
 29. |     Canada          3   1951   .44475408   3216.2126 |
 30. |     Canada          3   1952    .8668553   1833.1472 |
     |------------------------------------------------------|
 31. |  Sri Lanka          4   1943   .86803337   3617.7311 |
 32. |  Sri Lanka          4   1944   .40923508   4508.0392 |
 33. |  Sri Lanka          4   1945    .7448494   4093.6019 |
 34. |  Sri Lanka          4   1946   .79308608           . |
 35. |  Sri Lanka          4   1947   .72141991   4233.8767 |
     |------------------------------------------------------|
 36. |  Sri Lanka          4   1948    .6399412   4404.9189 |
 37. |  Sri Lanka          4   1949    .6140176   3711.7107 |
 38. |  Sri Lanka          4   1950   .84002398   3311.1258 |
 39. |  Sri Lanka          4   1951   .74770728    4021.116 |
 40. |  Sri Lanka          4   1952   .89887266   4581.6338 |
     +------------------------------------------------------+
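Now we flag the years with non-missing Sales, take the cumulative sum within each country, and mark 1952 when that sum equals the number of years observed so far.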
gen year_has_sales = !missing(Sales)
bysort country (year): gen years_with_sales = sum(year_has_sales)
by country: gen dummy = (year == 1952) & years_with_sales == _n
tabdisp country if dummy == 1, c(year)
-----------------------
   country |       year
-----------+-----------
   Austria |       1952
Bangladesh |       1952
-----------------------
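As expected, only Austria and Bangladesh have dummy == 1. Note that I am assuming you have a balanced panel; the code above could be adapted to use the min and max year within each country. If you want to check a narrower window, say 5 years, you could do something like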
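* re-sort by the tsset variables so that L5. can be used after the bysort above
sort country_id year
gen years_with_sales_5 = years_with_sales - L5.years_with_sales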
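and check whether it equals 5. Because L5. is missing for each country's first five years, this difference is only defined from the sixth year onward.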
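On the balanced-panel assumption: here is a rough sketch of the min/max adjustment for an unbalanced panel. The data is hypothetical (Denmark and Estonia are invented rows, not from your example), and `first_year` and `last_year` are names I chose. I am also assuming that "complete" means no missing Sales and no skipped calendar years between a country's own first and last year.

```stata
* Sketch only: a small unbalanced panel (hypothetical data).
clear
input str10 country year Sales
"Austria" 1950 142
"Austria" 1951 12441
"Austria" 1952 12441
"Canada"  1950 4215
"Canada"  1951 .
"Canada"  1952 12441
"Denmark" 1951 300
"Denmark" 1952 310
"Estonia" 1950 50
"Estonia" 1952 70
end

* Cumulative count of years with non-missing Sales, within country.
gen byte year_has_sales = !missing(Sales)
bysort country (year): gen years_with_sales = sum(year_has_sales)

* Each country's own first and last observed year (data is sorted by year
* within country, so these are the first and last rows of each group).
by country: gen first_year = year[1]
by country: gen last_year  = year[_N]

* Flag the last year only if every calendar year from first_year to
* last_year has non-missing Sales; a skipped year also fails the test.
gen byte dummy = year == last_year & years_with_sales == last_year - first_year + 1

tabdisp country if dummy == 1, c(year)
```

With this data only Austria and Denmark are flagged: Canada has a missing value in 1951 and Estonia has no row for 1951 at all. If a country that starts late (like Denmark here) should not count as complete, compare against a fixed start year instead of `first_year`.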