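Update (read this first)
If you are really only interested in the row indices, a straightforward use of split and range may be enough. The following assumes that the row names in your dataset are numbered sequentially, but it could be adapted for other cases.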
irisFirstLast <- sapply(split(iris, iris$Species),
                        function(x) range(as.numeric(rownames(x))))
irisFirstLast ## Just the indices
#      setosa versicolor virginica
# [1,]      1         51       101
# [2,]     50        100       150
iris[irisFirstLast[1, ], ] ## `1` would represent "first"
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 51 7.0 3.2 4.7 1.4 versicolor
# 101 6.3 3.3 6.0 2.5 virginica
iris[irisFirstLast, ] ## nothing would represent both first and last
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 50 5.0 3.3 1.4 0.2 setosa
# 51 7.0 3.2 4.7 1.4 versicolor
# 100 5.7 2.8 4.1 1.3 versicolor
# 101 6.3 3.3 6.0 2.5 virginica
# 150 5.9 3.0 5.1 1.8 virginica
d <- datasets::Puromycin
dFirstLast <- sapply(split(d, d$state),
                     function(x) range(as.numeric(rownames(x))))
dFirstLast
#      treated untreated
# [1,]       1        13
# [2,]      12        23
d[dFirstLast[2, ], ] ## `2` would represent `last`
# conc rate state
# 12 1.1 200 treated
# 23 1.1 160 untreated
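If the row names are not a clean 1..n sequence (for example after sorting or subsetting), one possible adaptation is to split the row positions instead of the row names. This is only a sketch, assuming "first" and "last" mean the data frame's current row order; shuffled and posFirstLast are illustrative names, not part of the original example.

```r
## Reorder iris so that the row names are no longer sequential
shuffled <- iris[order(iris$Sepal.Length), ]

## Split the row positions (not the row names) by group
posFirstLast <- sapply(split(seq_len(nrow(shuffled)), shuffled$Species), range)

shuffled[posFirstLast[1, ], ]  ## first row of each group, in the current order
shuffled[posFirstLast[2, ], ]  ## last row of each group, in the current order
```

Because the positions come from seq_len(nrow(...)), this does not depend on what the row names look like.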
If your rows are named, the general approach is the same, but you have to specify the range yourself. Here is the general pattern:
datasetFirstLast <- sapply(split(dataset, dataset$groupingvariable),
                           function(x) c(rownames(x)[1],
                                         rownames(x)[length(rownames(x))]))
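As a concrete instance of that pattern, here is a small sketch using the built-in mtcars dataset, whose row names are car names, grouped by cyl. The choice of dataset and the name mtcarsFirstLast are just for illustration.

```r
## mtcars has car names as row names; group by number of cylinders
mtcarsFirstLast <- sapply(split(mtcars, mtcars$cyl),
                          function(x) c(rownames(x)[1],
                                        rownames(x)[length(rownames(x))]))
mtcarsFirstLast                 ## character matrix of row names, one column per group
mtcars[mtcarsFirstLast[1, ], ]  ## first row of each group
mtcars[mtcarsFirstLast[2, ], ]  ## last row of each group
```

The result is a character matrix rather than a numeric one, but it can be used to index the data frame in the same way as before.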
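Initial answer (edited)
If you are interested in extracting the rows themselves rather than using the row numbers for some other purpose, you can also explore data.table. Here are some examples: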
library(data.table)
DT <- data.table(iris, key="Species")
DT[J(unique(Species)), mult = "first"]
# Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1: setosa 5.1 3.5 1.4 0.2
# 2: versicolor 7.0 3.2 4.7 1.4
# 3: virginica 6.3 3.3 6.0 2.5
DT[J(unique(Species)), mult = "last"]
# Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1: setosa 5.0 3.3 1.4 0.2
# 2: versicolor 5.7 2.8 4.1 1.3
# 3: virginica 5.9 3.0 5.1 1.8
DT[, .SD[c(1,.N)], by=Species]
# Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1: setosa 5.1 3.5 1.4 0.2
# 2: setosa 5.0 3.3 1.4 0.2
# 3: versicolor 7.0 3.2 4.7 1.4
# 4: versicolor 5.7 2.8 4.1 1.3
# 5: virginica 6.3 3.3 6.0 2.5
# 6: virginica 5.9 3.0 5.1 1.8
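The last approach is very handy. For example, if you want the first three and the last three rows of each group, you can use:
DT[, .SD[c(1:3, (.N-2):.N)], by=Species]
FYI: .N is the number of rows in each group.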
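One thing to watch for with the .SD[c(1, .N)] idiom: when a group has only one row, 1 and .N are the same index, so that row comes back twice. A minimal sketch of one way to guard against this, using a made-up toy table (toy, dup and nodup are hypothetical names):

```r
library(data.table)

## Toy data (hypothetical): group "b" has a single row
toy <- data.table(g = c("a", "a", "a", "b"), v = 1:4)

dup   <- toy[, .SD[c(1, .N)], by = g]          ## the "b" row is returned twice
nodup <- toy[, .SD[unique(c(1, .N))], by = g]  ## unique() drops the repeated index
```

Whether the duplicate is a problem depends on what you do with the result, so treat unique() as an optional safeguard.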
Other useful approaches include:
DT[, tail(.SD, 2), by = Species] ## last two rows of each group
DT[, head(.SD, 4), by = Species] ## first four rows of each group
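If you do want row numbers from data.table after all, the special symbol .I (the row positions within the table) can be subset per group in the same way as .SD. A short sketch, assuming the same keyed DT as above; idx is an illustrative name:

```r
library(data.table)
DT <- data.table(iris, key = "Species")

## .I holds each row's position in DT; keep the first and last per group
idx <- DT[, .I[c(1, .N)], by = Species]$V1
idx      ## row positions within DT
DT[idx]  ## the corresponding rows
```

Note that setting a key sorts the table, so these positions refer to DT as keyed, which is not necessarily the row order of the original data frame.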