I have two data frames, df2 and DF.
> DF
date tickers
1 2000-01-01 B
2 2000-01-01 GOOG
3 2000-01-01 V
4 2000-01-01 YHOO
5 2000-01-02 XOM
> df2
date tickers quantities
1 2000-01-01 BB 11
2 2000-01-01 XOM 23
3 2000-01-01 GOOG 42
4 2000-01-01 YHOO 21
5 2000-01-01 V 2112
6 2000-01-01 B 13
7 2000-01-02 XOM 24
8 2000-01-02 BB 422
I need the rows of df2 whose date and tickers values are both present in DF. That means I need the following output:
3 2000-01-01 GOOG 42
4 2000-01-01 YHOO 21
5 2000-01-01 V 2112
6 2000-01-01 B 13
7 2000-01-02 XOM 24
So I used the following code:
> subset(df2,df2$date %in% DF$date & df2$tickers %in% DF$tickers)
date tickers quantities
2 2000-01-01 XOM 23
3 2000-01-01 GOOG 42
4 2000-01-01 YHOO 21
5 2000-01-01 V 2112
6 2000-01-01 B 13
7 2000-01-02 XOM 24
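But the output contains an extra row (row 2, XOM on 2000-01-01). That is because the ticker 'XOM' appears on both days in df2, so both rows are selected. What modifications does my code need?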
The dput output is as follows:
> dput(DF)
structure(list(date = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("2000-01-01",
"2000-01-02"), class = "factor"), tickers = structure(c(4L, 5L,
6L, 8L, 7L), .Label = c("A", "AA", "AAPL", "B", "GOOG", "V",
"XOM", "YHOO", "Z"), class = "factor")), .Names = c("date", "tickers"
), row.names = c(NA, -5L), class = "data.frame")
> dput(df2)
structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L), .Label = c("2000-01-01", "2000-01-02"), class = "factor"),
tickers = structure(c(2L, 5L, 3L, 6L, 4L, 1L, 5L, 2L), .Label = c("B",
"BB", "GOOG", "V", "XOM", "YHOO"), class = "factor"), quantities = c(11,
23, 42, 21, 2112, 13, 24, 422)), .Names = c("date", "tickers",
"quantities"), row.names = c(NA, -8L), class = "data.frame")
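
The extra row appears because the two %in% tests in the subset() call are checked independently: XOM on 2000-01-01 passes since that date occurs somewhere in DF and XOM occurs somewhere in DF, even though the pair itself does not. A minimal sketch of two possible fixes follows, both matching on the (date, tickers) pair as a whole. The names key_df2, key_DF, res and res_merge are introduced here for illustration only, and the data frames are rebuilt with character columns for brevity (paste() and merge() should behave the same way on the factor columns shown in the dput output). The composite key assumes neither column contains a space.

```r
# Rebuild the two data frames from the question (character columns for brevity).
DF <- data.frame(
  date    = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-01", "2000-01-02"),
  tickers = c("B", "GOOG", "V", "YHOO", "XOM"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  date       = c(rep("2000-01-01", 6), rep("2000-01-02", 2)),
  tickers    = c("BB", "XOM", "GOOG", "YHOO", "V", "B", "XOM", "BB"),
  quantities = c(11, 23, 42, 21, 2112, 13, 24, 422),
  stringsAsFactors = FALSE
)

# Option 1: build a composite key so that a row of df2 is kept only if
# its (date, tickers) pair occurs in DF. Assumes no spaces in either column.
key_df2 <- paste(df2$date, df2$tickers)
key_DF  <- paste(DF$date, DF$tickers)
res <- df2[key_df2 %in% key_DF, ]

# Option 2: an inner join on both columns. Note that merge() sorts the
# result by the key columns and renumbers the rows.
res_merge <- merge(df2, DF, by = c("date", "tickers"))
```

Option 1 keeps the original row order and row names of df2, so it reproduces the desired output directly. Option 2 returns the same five rows but reordered, which may or may not matter depending on what is done with the result.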