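As an R user, I am now learning Stata using this resource, and I am puzzled by the merge command.
In R, I don't have to worry about merging data incorrectly, because it merges everything anyway. I don't need to worry if the common columns contain duplicates, because the Y data frame is merged to each of the duplicate rows in the X data frame (using all=FALSE in merge).
For Stata, however, I need to remove the duplicated rows from X before I proceed to merge.
Is it assumed in Stata that, for merge to proceed, the common columns in the master table must be unique?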
The answer to your question is No. I will try to explain why.
The link you mention covers only one type of merge that is possible with Stata, namely the one-to-many merge.
merge 1:m varlist using filename
Other types of merge are possible:
One-to-one merge on specified key variables
merge 1:1 varlist using filename
Many-to-one merge on specified key variables
merge m:1 varlist using filename
Many-to-many merge on specified key variables
merge m:m varlist using filename
One-to-one merge by observation
merge 1:1 _n using filename
Details, explanations, and examples can be found in help merge.
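For the situation in the question (duplicate keys in the master data, unique keys in the using data), a many-to-one merge is likely what you want. Here is a minimal sketch; the variable names (id, x, y), the values, and the tempfile name lookup are made up for illustration:

```stata
* Hypothetical using data: one row per id
clear
input id y
1 100
2 200
end
tempfile lookup
save `lookup'

* Hypothetical master data: id 1 appears twice
clear
input id x
1 10
1 11
2 20
end

* m:1 means the key may repeat in the master but must be unique in the using data
merge m:1 id using `lookup'

* Each master row receives the y value of its id; no rows are dropped
list
```

In this toy data every master row finds a match, so all three observations should end up with _merge equal to 3, and no duplicates had to be removed beforehand.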
If you do not know if observations are unique in a dataset, you can do the following check:
bysort idvar: gen N = _N
ta N
If you find values of N that are greater than 1, you know that observations are not unique with respect to idvar.
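Stata also has built-in commands for this kind of check. The sketch below uses made-up toy data (the values and the variable names x and dup are only for illustration) and shows duplicates report, duplicates tag, and isid:

```stata
* Hypothetical toy data: idvar == 2 appears twice
clear
input idvar x
1 10
2 20
2 21
3 30
end

* Summarize how many observations share their idvar value with others
duplicates report idvar

* dup is 0 where idvar is unique and greater than 0 where it is repeated
duplicates tag idvar, generate(dup)
list if dup > 0

* isid exits with an error when idvar does not uniquely identify observations;
* capture lets a do-file continue and stores the return code in _rc
capture isid idvar
display _rc
```

A nonzero _rc after capture isid means idvar is not a unique key, so a 1:1 or 1:m merge with this dataset as master would be refused.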
The 1:1, 1:m, m:1, and m:m forms are in fact the new syntax of the merge command, introduced with Stata 11. Before Stata 11, the merge command was a bit simpler. You simply had to sort your data, and then you could do:
merge varlist using filename
By the way, you can still use this old syntax in Stata 11 or higher.
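joinby, unmatched(both) is the Stata command that corresponds to R's merge.
In particular, merge m:m does not perform a many-to-many merge (that is, a full join), contrary to what the documentation implies.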
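To make the difference concrete, here is a small sketch with made-up data (the names id, x, y, and using_data are hypothetical). As far as I understand it, unmatched(both) keeps unmatched rows from both sides, much like all=TRUE in R, while the joinby default, unmatched(none), keeps only matches, like all=FALSE:

```stata
* Hypothetical using data: id 1 appears twice, id 2 once
clear
input id y
1 100
1 101
2 200
end
tempfile using_data
save `using_data'

* Hypothetical master data: id 1 appears twice, id 3 once
clear
input id x
1 10
1 11
3 30
end

* merge m:m pairs rows within each id in order; it does not form all combinations
preserve
merge m:m id using `using_data'
* 4 observations in this toy data
count
restore

* joinby forms every pairwise combination within each id
joinby id using `using_data', unmatched(both)
* 6 observations: 2 x 2 for id 1, plus one unmatched row each for id 2 and id 3
count
sort id x y
list, sepby(id)
```

With joinby, the _merge variable created by unmatched(both) marks rows found only in the master (1), only in the using data (2), or in both (3).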