I'm trying to use ggplot to make a graph showing the composition of substrates at 6 different sites and at 7 different times. The problem is that I have a different number of samples for each sampling period and site. I essentially want to compute y = Freq/(number of stations in that time period). The following is a sample of my data set:

   Substrate     Time   Site Freq
1      Floc    July 11   P1    4
2      Fine    July 11   P1    2
3    Medium    July 11   P1   12
4    Coarse    July 11   P1    0
5   Bedrock    July 11   P1    3
6      Floc     Aug 11   P1    7
7      Fine     Aug 11   P1    1
8    Medium     Aug 11   P1    7
9    Coarse     Aug 11   P1    1
10  Bedrock     Aug 11   P1    4

Therefore I want

   Substrate     Time   Site Freq
1      Floc    July 11   P1    4/21 (21 being the number of samples taken in July)

Any ideas on how to write this code and then plot the results?


2 Answers


Using data.table (from the package of the same name)...

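library(data.table)
DT <- data.table(dat)  # dat is the data frame from the question

# divide each Freq by the total Freq within its time period (Var2)
DT[, Freq2 := Freq/sum(Freq), by = Var2]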

This gives

       Var1    Var2 Var3 Freq     Freq2
 1:    Floc July 11   P1    4 0.1904762
 2:    Fine July 11   P1    2 0.0952381
 3:  Medium July 11   P1   12 0.5714286
 4:  Coarse July 11   P1    0 0.0000000
 5: Bedrock July 11   P1    3 0.1428571
 6:    Floc  Aug 11   P1    7 0.3500000
 7:    Fine  Aug 11   P1    1 0.0500000
 8:  Medium  Aug 11   P1    7 0.3500000
 9:  Coarse  Aug 11   P1    1 0.0500000
10: Bedrock  Aug 11   P1    4 0.2000000

Edit: The question now has better column names, so the meaning of "for...period and site" is clearer. As @DWin wrote in the comments, the answer is now:

DT[,Freq2:=Freq/sum(Freq),by='Time,Site']
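
A minimal sketch of the whole round trip, including the plotting step the question asks about. The P2 rows are made up for illustration (the question only shows P1), and it assumes every sample is counted under exactly one substrate, so that sum(Freq) within a Time/Site group equals the number of samples taken there.

```r
library(data.table)
library(ggplot2)

# Illustrative data: the P1 rows come from the question, the P2 rows are invented
DT <- data.table(
  Substrate = rep(c("Floc", "Fine", "Medium", "Coarse", "Bedrock"), times = 4),
  Time      = rep(rep(c("July 11", "Aug 11"), each = 5), times = 2),
  Site      = rep(c("P1", "P2"), each = 10),
  Freq      = c(4, 2, 12, 0, 3,  7, 1, 7, 1, 4,   # P1 (from the question)
                5, 5, 0, 0, 0,   2, 2, 2, 2, 2)   # P2 (hypothetical)
)

# Proportion of each substrate within its Time/Site group
DT[, Freq2 := Freq / sum(Freq), by = .(Time, Site)]

# Keep the time periods in sampling order rather than alphabetical order
DT[, Time := factor(Time, levels = c("July 11", "Aug 11"))]

# Stacked bars: one bar per time period, one panel per site
p <- ggplot(DT, aes(x = Time, y = Freq2, fill = Substrate)) +
  geom_col() +
  facet_wrap(~ Site) +
  labs(y = "Proportion of samples")
print(p)
```

Each bar then sums to 1, so sites and periods with different numbers of samples can be compared directly.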
answered 2013-09-18T16:40:28.090

Take a look at ?ave

df <- read.table(textConnection("
Var0 Var1       Var2 Var3 Freq
1      Floc    July 11   P1    4
2      Fine    July 11   P1    2
3    Medium    July 11   P1   12
4    Coarse    July 11   P1    0
5   Bedrock    July 11   P1    3
6      Floc     Aug 11   P1    7
7      Fine     Aug 11   P1    1
8    Medium     Aug 11   P1    7
9    Coarse     Aug 11   P1    1
10  Bedrock     Aug 11   P1    4"), header=TRUE, row.names=1)

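# "July 11" is split on whitespace, so Var1 holds the month and Var2 the year;
# grouping by Var1 therefore divides each Freq by its month's total
df$freq <- ave(df$Freq, df$Var1, FUN=function(x) x/sum(x))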
df
#      Var0 Var1 Var2 Var3 Freq      freq
#1     Floc July   11   P1    4 0.1904762
#2     Fine July   11   P1    2 0.0952381
#3   Medium July   11   P1   12 0.5714286
#4   Coarse July   11   P1    0 0.0000000
#5  Bedrock July   11   P1    3 0.1428571
#6     Floc  Aug   11   P1    7 0.3500000
#7     Fine  Aug   11   P1    1 0.0500000
#8   Medium  Aug   11   P1    7 0.3500000
#9   Coarse  Aug   11   P1    1 0.0500000
#10 Bedrock  Aug   11   P1    4 0.2000000
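
The same idea should carry over to the renamed columns and to more than one grouping variable, since ave() accepts several grouping vectors. A small sketch using the question's P1 rows; the column names Time and Site follow the edited question, the result column prop is a name chosen here, and the data frame is built by hand only to keep "July 11" in a single column.

```r
# P1 rows from the question, with "July 11" / "Aug 11" kept as a single Time value
df <- data.frame(
  Substrate = rep(c("Floc", "Fine", "Medium", "Coarse", "Bedrock"), times = 2),
  Time      = rep(c("July 11", "Aug 11"), each = 5),
  Site      = "P1",
  Freq      = c(4, 2, 12, 0, 3, 7, 1, 7, 1, 4)
)

# Divide each Freq by the total within its Time/Site combination
df$prop <- ave(df$Freq, df$Time, df$Site, FUN = function(x) x / sum(x))
df
```

With only one site in the sample the extra grouping vector changes nothing, but on the full data set it keeps the totals of different sites from being pooled.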
answered 2013-09-18T16:41:34.590