I have a data frame that looks like this, which I am preparing for ggplot:

txt <- "v1 v2 v3
'Strongly agree' 83.1 var1
'Agree' 14.9 var1
'Disagree' 1.5 var1
'Strongly disagree' 0.6 var1
'Strongly agree' 11.8 var2
'Agree' 36.5 var2
'Disagree' 17.7 var2
'Strongly disagree' 43.8 var2
'Strongly agree' 19.6 var3
'Agree' 12 var3
'Disagree' 31.6 var3
'Strongly disagree' 36.8 var3"

mydata <- read.table(textConnection(txt), sep = " ", header = TRUE, stringsAsFactors = TRUE)

My question is: how can I order the levels in mydata$v3 according to the values in mydata$v2 and the levels in mydata$v1?

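An example: if I wanted to order the levels in mydata$v3 by the highest value in mydata$v2 within the level 'Strongly agree' of mydata$v1, the order I would get is var1, var3, var2, because the values in mydata$v2 are 83.1, 19.6, 11.8.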

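Another example: if I wanted to order the levels in mydata$v3 by the sum of the values in mydata$v2 within the levels 'Strongly agree' and 'Agree' of mydata$v1, the order I would get is var1, var2, var3, because the sums of mydata$v2 are (83.1 + 14.9) = 98, (11.8 + 36.5) = 48.3, (19.6 + 12) = 31.6.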
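The arithmetic behind the two examples can be checked directly. The following is only a sketch: it rebuilds the data inline with data.frame() so that it runs on its own, and the names strongly, top_two, order1 and order2 are illustrative, not part of the original question.

```r
# Rebuild the example data so this block is self-contained
mydata <- data.frame(
  v1 = rep(c("Strongly agree", "Agree", "Disagree", "Strongly disagree"), times = 3),
  v2 = c(83.1, 14.9, 1.5, 0.6, 11.8, 36.5, 17.7, 43.8, 19.6, 12, 31.6, 36.8),
  v3 = rep(c("var1", "var2", "var3"), each = 4),
  stringsAsFactors = TRUE
)

# Example 1: the single v2 value per v3 level within "Strongly agree"
strongly <- with(mydata[mydata$v1 == "Strongly agree", ], tapply(v2, v3, sum))
order1 <- names(strongly)[order(strongly, decreasing = TRUE)]

# Example 2: the sum of v2 per v3 level within "Strongly agree" and "Agree"
top_two <- with(mydata[mydata$v1 %in% c("Strongly agree", "Agree"), ],
                tapply(v2, v3, sum))
order2 <- names(top_two)[order(top_two, decreasing = TRUE)]

order1  # "var1" "var3" "var2"
order2  # "var1" "var2" "var3"
```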

I can't figure out how to solve this myself. Also, I work with a lot of data frames like this, so the code has to go into a function.

Edit:

In both examples, the result I want is the original data.frame with only the order of the levels in mydata$v3 changed.

So in example 1, I have:

                  v1   v2   v3
1     Strongly agree 83.1 var1
2              Agree 14.9 var1
3           Disagree  1.5 var1
4  Strongly disagree  0.6 var1
5     Strongly agree 11.8 var2
6              Agree 36.5 var2
7           Disagree 17.7 var2
8  Strongly disagree 43.8 var2
9     Strongly agree 19.6 var3
10             Agree 12.0 var3
11          Disagree 31.6 var3
12 Strongly disagree 36.8 var3 

levels(mydata$v3)
[1] "var1" "var2" "var3"

But what I want to end up with is this:

                  v1   v2   v3
1     Strongly agree 83.1 var1
2              Agree 14.9 var1
3           Disagree  1.5 var1
4  Strongly disagree  0.6 var1
5     Strongly agree 11.8 var2
6              Agree 36.5 var2
7           Disagree 17.7 var2
8  Strongly disagree 43.8 var2
9     Strongly agree 19.6 var3
10             Agree 12.0 var3
11          Disagree 31.6 var3
12 Strongly disagree 36.8 var3 

levels(mydata$v3)
[1] "var1" "var3" "var2"

In example 2, I have:

                  v1   v2   v3
1     Strongly agree 83.1 var1
2              Agree 14.9 var1
3           Disagree  1.5 var1
4  Strongly disagree  0.6 var1
5     Strongly agree 11.8 var2
6              Agree 36.5 var2
7           Disagree 17.7 var2
8  Strongly disagree 43.8 var2
9     Strongly agree 19.6 var3
10             Agree 12.0 var3
11          Disagree 31.6 var3
12 Strongly disagree 36.8 var3 

levels(mydata$v3)
[1] "var1" "var2" "var3"

But want:

                  v1   v2   v3
1     Strongly agree 83.1 var1
2              Agree 14.9 var1
3           Disagree  1.5 var1
4  Strongly disagree  0.6 var1
5     Strongly agree 11.8 var2
6              Agree 36.5 var2
7           Disagree 17.7 var2
8  Strongly disagree 43.8 var2
9     Strongly agree 19.6 var3
10             Agree 12.0 var3
11          Disagree 31.6 var3
12 Strongly disagree 36.8 var3 

levels(mydata$v3)
[1] "var1" "var2" "var3"

Note that in example 2 what I have and what I want are the same, but I have many data.frames where that is not the case.

I think what I am looking for is a more sophisticated version of

factor(mydata$v3, levels(mydata$v3)[EXAMPLE1: order by the value in v2 within 1 level of v1 / EXAMPLE2: order by the sum of the values within 2 levels of v1])

1 Answer

Here is a solution using aggregate:

f <- function(mydata, v1.val) {
  # Value or sum of v2 within the selected rows
  sums <- aggregate(v2 ~ v3, data=mydata[mydata$v1 %in% v1.val,], FUN=sum)

  # Decreasing order of the sum of v2 values, or the only v2 value, for each level of v3
  ord <- order(sums$v2, decreasing=TRUE)

  # Build a new factor with the proper levels and assign it to v3
  fac <- factor(mydata$v3, levels=sums$v3[ord])

  mydata$v3 <- fac
  return(mydata)
}
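One edge case worth noting: aggregate() drops groups with no rows, so if some level of v3 has no row matching v1.val, that level is missing from the new levels and its rows become NA. A minimal sketch of a guard follows; the function name f_keep_all and the toy data are hypothetical, and putting the unmatched levels last is an assumption, not something the question specifies.

```r
# Hypothetical variant of f(): levels of v3 with no matching rows are kept
# and placed last instead of turning into NA.
f_keep_all <- function(mydata, v1.val) {
  sel <- mydata[mydata$v1 %in% v1.val, ]
  sums <- aggregate(v2 ~ v3, data = sel, FUN = sum)

  # Levels ranked by decreasing sum of v2 within the selected rows
  ranked <- as.character(sums$v3[order(sums$v2, decreasing = TRUE)])

  # Levels that never occur in the selected rows go to the end
  leftover <- setdiff(unique(as.character(mydata$v3)), ranked)

  mydata$v3 <- factor(mydata$v3, levels = c(ranked, leftover))
  mydata
}

# Small made-up data set: "c" has no "Agree" row
toy <- data.frame(
  v1 = c("Agree", "Agree", "Disagree"),
  v2 = c(10, 20, 5),
  v3 = c("a", "b", "c")
)
levels(f_keep_all(toy, "Agree")$v3)  # "b" "a" "c"
```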

The data frame is the same as above, but the factor levels are as required:

> f(mydata, 'Strongly agree')$v3
 [1] var1 var1 var1 var1 var2 var2 var2 var2 var3 var3 var3 var3
Levels: var1 var3 var2

> f(mydata, c('Strongly agree', 'Agree'))$v3
 [1] var1 var1 var1 var1 var2 var2 var2 var2 var3 var3 var3 var3
Levels: var1 var2 var3
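Since the question mentions many data frames of the same shape, one way to reuse the function is to keep them in a list and call it through lapply(). This is a sketch under that assumption: f() is copied here in condensed form so the block runs on its own, and the list frames with its values is made up for illustration.

```r
# f() condensed from the answer above so this block is self-contained
f <- function(mydata, v1.val) {
  sums <- aggregate(v2 ~ v3, data = mydata[mydata$v1 %in% v1.val, ], FUN = sum)
  ord <- order(sums$v2, decreasing = TRUE)
  mydata$v3 <- factor(mydata$v3, levels = sums$v3[ord])
  mydata
}

# Two made-up data frames with the same column layout (illustrative values)
frames <- list(
  first  = data.frame(v1 = c("Agree", "Agree"), v2 = c(1, 2), v3 = c("x", "y")),
  second = data.frame(v1 = c("Agree", "Agree"), v2 = c(5, 3), v3 = c("x", "y"))
)

# Apply the same reordering to every data frame in the list
reordered <- lapply(frames, f, v1.val = "Agree")

# Inspect the resulting level order of v3 in each data frame
lapply(reordered, function(d) levels(d$v3))
```

Each element of reordered is the original data frame with only the levels of v3 changed, so it can be passed on to plotting code unchanged.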
Answered 2014-05-21T01:02:19.707