
I want to sort columns 4 to 10 of my.data (my.data[4:10]) in descending order within each row. There are some clues in "Sort second to fifth column for each row in R", but I could not adapt them to my case.

I also tried things like:

sort(my.data, decreasing = TRUE, partial = c([4:10]))

which didn't work, but I think the approach from the linked question is more in line with what I need. I read through the ?cbind, ?apply, and ?sort help pages, but the examples are just too cryptic for me.

Here's my sample dataset:

habitat<-c('Marsh','Prairie','Savanna','Swamp','Woodland')
NumSites<-c(3,3,4,1,4)
NumSamples<-c(6,5,8,2,8)
Sp1<-c(NA,2,NA,2,1)
Sp2<-c(NA,2,1,NA,1)
Sp3<-c(NA,NA,NA,NA,1)
Sp4<-c(3,NA,NA,NA,NA)
Sp5<-c(NA,NA,3,NA,NA)
Sp6<-c(1,NA,67,NA,2)
Sp7<-c(NA,2,3,NA,1)

my.data<-data.frame(habitat,NumSites,NumSamples,Sp1,Sp2,Sp3,Sp4,Sp5,Sp6,Sp7)

# I suspect a variant of this must work:
# cbind(df[,1], t(apply(df[,-1], 1, sort)))

The desired result should look like this:

habitat  NumSites NumSamples Sp1 Sp2 Sp3 Sp4 Sp5 Sp6 Sp7
Marsh    3        6          3   1   NA  NA  NA  NA  NA
Prairie  3        5          2   2   2   NA  NA  NA  NA
Savanna  4        8          67  3   3   1   NA  NA  NA
Swamp    1        2          2   NA  NA  NA  NA  NA  NA
Woodland 4        8          2   1   1   1   1   NA  NA

I feel like the cbind approach is close...

Also, the actual data has many columns, and their number and names vary, so I want to select the columns by position (e.g. [4:10]) rather than by name.


3 Answers


This seems to work fine:

my.data[, 4:10] <- t(apply(my.data[, 4:10], 1, function(x) sort(x, na.last = TRUE, decreasing = TRUE)))


#   habitat NumSites NumSamples Sp1 Sp2 Sp3 Sp4 Sp5 Sp6 Sp7
#1    Marsh        3          6   3   1  NA  NA  NA  NA  NA
#2  Prairie        3          5   2   2   2  NA  NA  NA  NA
#3  Savanna        4          8  67   3   3   1  NA  NA  NA
#4    Swamp        1          2   2  NA  NA  NA  NA  NA  NA
#5 Woodland        4          8   2   1   1   1   1  NA  NA
answered 2014-09-01T14:09:47.957

This answer's approach, which you quote above, is close:

cbind(df[,1], t(apply(df[,-1], 1, sort)))

but it needs two changes:

  • You want to sort all but the first three columns, not all but the first. So change [,1] and [,-1] to [, 1:3] and [, -(1:3)], respectively.
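  • By default, sort sorts in increasing order, while you want decreasing order, and it drops NAs entirely (its default is na.last = NA), while you want them placed last. Dropping them also gives rows of different lengths, so apply could no longer return a matrix. You can fix both by adding the decreasing=TRUE, na.last=TRUE arguments to sort.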
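To see why both arguments matter, here is a minimal sketch using the Savanna row's species counts from the sample data. With the defaults, sort drops the NAs and returns a shorter vector; when rows end up with different lengths, apply returns a list instead of a matrix, which is not what t() and the column assignment expect. The small matrix m is a made-up example for illustration only.

```r
# Species counts (Sp1..Sp7) for the Savanna row of the sample data
x <- c(NA, 1, NA, NA, 3, 67, 3)

# Default behaviour: increasing order, NAs removed (na.last = NA)
sort(x)
# [1]  1  3  3 67

# Decreasing order, NAs kept and placed at the end
sort(x, decreasing = TRUE, na.last = TRUE)
# [1] 67  3  3  1 NA NA NA

# Hypothetical 2-row matrix: with NAs dropped, the rows sort to different
# lengths, so apply() cannot simplify and returns a list
m <- rbind(c(NA, 2, 1), c(3, NA, NA))
is.list(apply(m, 1, sort))
# [1] TRUE

# Keeping the NAs gives equal-length results, so apply() returns a matrix
is.matrix(apply(m, 1, sort, na.last = TRUE))
# [1] TRUE
```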

This makes the solution:

cbind(my.data[, 1:3], t(apply(my.data[, -(1:3)], 1, function(v) sort(v, decreasing=TRUE, na.last=TRUE))))

Note that it might be a bit clearer if you split it onto multiple lines:

mysort = function(v) sort(v, decreasing=TRUE, na.last=TRUE)
sorted.cols = t(apply(my.data[, -(1:3)], 1, mysort))
cbind(my.data[, 1:3], sorted.cols)
answered 2014-09-01T14:09:59.157

You don't need an anonymous function here: apply passes any extra arguments given after FUN on to sort.

> my.data[4:10] <- t(apply(my.data[4:10], 1, sort, decreasing = TRUE, na.last = TRUE))
> my.data
#    habitat NumSites NumSamples Sp1 Sp2 Sp3 Sp4 Sp5 Sp6 Sp7
# 1    Marsh        3          6   3   1  NA  NA  NA  NA  NA
# 2  Prairie        3          5   2   2   2  NA  NA  NA  NA
# 3  Savanna        4          8  67   3   3   1  NA  NA  NA
# 4    Swamp        1          2   2  NA  NA  NA  NA  NA  NA
# 5 Woodland        4          8   2   1   1   1   1  NA  NA
answered 2014-09-01T14:18:41.787
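A minimal sketch that rebuilds the sample data and checks that the anonymous-function form and the direct form (extra arguments forwarded through apply's ... argument) give the same result. For the "varied number of columns" case, it selects the columns with 4:ncol(my.data); the names sp.cols, with.fun, direct, and sorted.data are hypothetical, and the sketch assumes the columns to sort always start at column 4 and run to the last column. Note also that apply converts the selected columns to a matrix first, so they should all be numeric; a character column in the range would make sort compare values as text.

```r
# Sample data from the question
habitat    <- c('Marsh', 'Prairie', 'Savanna', 'Swamp', 'Woodland')
NumSites   <- c(3, 3, 4, 1, 4)
NumSamples <- c(6, 5, 8, 2, 8)
Sp1 <- c(NA, 2, NA, 2, 1)
Sp2 <- c(NA, 2, 1, NA, 1)
Sp3 <- c(NA, NA, NA, NA, 1)
Sp4 <- c(3, NA, NA, NA, NA)
Sp5 <- c(NA, NA, 3, NA, NA)
Sp6 <- c(1, NA, 67, NA, 2)
Sp7 <- c(NA, 2, 3, NA, 1)
my.data <- data.frame(habitat, NumSites, NumSamples,
                      Sp1, Sp2, Sp3, Sp4, Sp5, Sp6, Sp7)

# Hypothetical: sort every column from the 4th to the last one
sp.cols <- 4:ncol(my.data)

# Form 1: anonymous function wrapping sort()
with.fun <- t(apply(my.data[sp.cols], 1,
                    function(x) sort(x, decreasing = TRUE, na.last = TRUE)))

# Form 2: apply() forwards decreasing/na.last to sort() via its ... argument
direct <- t(apply(my.data[sp.cols], 1, sort,
                  decreasing = TRUE, na.last = TRUE))

identical(with.fun, direct)
# [1] TRUE

# Write the sorted values back into a copy of the data frame
sorted.data <- my.data
sorted.data[sp.cols] <- direct
sorted.data
```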