I searched SO but can't seem to find code that fits my problem. It is similar to this question: Linear Regression calculate several times in one dataframe

Following Andrie's code, I obtained a data frame of LR coefficients:

Cddply <- ddply(test, .(sumtest), function(test)coef(lm(Area~Conc, data=test))) 

sumtest  (Intercept)    Conc
1        -108589.2726   846.0713372
2        -49653.18701   811.3982918
3        -102598.6252   832.6419926
4        -72607.4017    727.0765558
5        54224.28878    391.256075
6        -42357.45407   357.0845661
7        -34171.92228   367.3962888
8        -9332.569856   289.8631555
9        -7376.448899   335.7047756
10       -37704.92277   359.1457617

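My question is how to apply each of these LR models (1-10) to a specific range of rows in another data frame, so that the independent variable x is placed in a third column. For example, I want to apply sumtest1 to samples 6:29, sumtest2 to samples 35:50, sumtest3 to samples 56:79, and so on, in alternating intervals of 24 and 16 samples. The sample numbers repeat after 200, so sumtest9 would again apply to samples 6:29.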

Sample  Area
6   236211
7   724919
8   1259814
9   1574722
10  268836
11  863818
12  1261768
13  1591845
14  220322
15  608396
16  980182
17  1415859
18  276276
19  724532
20  1130024
21  1147840
22  252051
23  544870
24  832512
25  899457
26  285093
27  4291007
28  825922
29  865491
35  246707
36  538092
37  767269
38  852410
39  269152
40  971471
41  1573989
42  1897208
43  261321
44  481486
45  598617
46  769240
47  229695
48  782691
49  1380597
50  1725419

The resulting data frame would look like this:

Sample  Area    Calc
6   236211  407.5312917
7   724919  985.1525288
8   1259814 1617.363812
9   1574722 1989.564693
10  268836  446.0919309
...
35  246707  365.2452551
36  538092  724.3591324
37  767269  1006.805521
38  852410  1111.736505
39  269152  392.9073207

Thanks for your help.

1 Answer

Is this what you want? I made a slightly larger dummy "area" data set so that it is easier to see how the code works when trying it out.

# create 400 rows of area data
set.seed(123)
df <- data.frame(area = round(rnorm(400, mean = 1000000, sd = 100000)))

# "sample numbers repeat after 200" -> add a sample number running 1-200, 1-200
df$sample_nr <- rep(1:200, length.out = nrow(df))

# create a factor which cuts the vector of sample_nr into pieces of length 16, 24, 16, 24...
# repeat until the pieces add up to a total length of 200,
# i.e. 5 repeats of (16, 24)
grp <- cut(df$sample_nr, breaks = c(-Inf, cumsum(rep(c(16, 24), 5))))

# add a numeric version of the chunks to data frame
# this number indicates the model from which coefficients will be used
# row 1-16 (16 rows): model 1; row 17-40 (24 rows): model 2;
# row 41-56 (16 rows): model 3; and so on. 
df$mod <- as.numeric(grp)

# read coefficients
coefs <- read.table(text = "intercept beta_conc
1   -108589.2726    846.0713372
2   -49653.18701    811.3982918
3   -102598.6252    832.6419926
4   -72607.4017 727.0765558
5   54224.28878 391.256075
6   -42357.45407    357.0845661
7   -34171.92228    367.3962888
8   -9332.569856    289.8631555
9   -7376.448899    335.7047756
10  -37704.92277    359.1457617", header = TRUE)

# add model number (numeric, to match the type of df$mod)
coefs$mod <- as.numeric(rownames(coefs))

head(df)
head(coefs)

# join area data and coefficients by model number
# (use 'join' instead of merge to avoid sorting)
library(plyr)
df2 <- join(df, coefs, by = "mod")

# calculate conc from area and model coefficients
# area = intercept + beta_conc * conc
# conc = (area - intercept) / beta_conc
df2$conc <- (df2$area - df2$intercept) / df2$beta_conc
head(df2, 41)
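
The dummy data above uses contiguous chunks of 16 and 24 rows, while the question describes blocks at samples 6:29, 35:50, 56:79 and so on. A minimal sketch of that layout follows, assuming the pattern repeats every 50 samples (6:29 and 35:50 within each block of 50), which gives 8 models per run of 200 and is consistent with sumtest9 applying to samples 6:29 on the second run. The helper `model_for_sample` and the data frame `dat` are hypothetical names introduced for illustration; only base R is used, and only models 1 and 2 and a few rows from the question are included.

```r
# Hypothetical helper: map sample numbers to a model number.
# ASSUMPTION: blocks sit at offsets 6:29 and 35:50 inside every 50 samples,
# and the sample number drops back each time a new run of 1..200 starts.
model_for_sample <- function(sample) {
  pass   <- cumsum(c(0, diff(sample) < 0))  # 0 for the first run, 1 for the second, ...
  period <- (sample - 1) %/% 50             # which block of 50 within a run (0..3)
  offset <- (sample - 1) %% 50 + 1          # position inside that block (1..50)
  half   <- ifelse(offset >= 6 & offset <= 29, 1,
             ifelse(offset >= 35 & offset <= 50, 2, NA))  # NA for samples in the gaps
  pass * 8 + period * 2 + half
}

# Coefficients for models 1 and 2, copied from the question
coefs <- data.frame(
  mod       = 1:2,
  intercept = c(-108589.2726, -49653.18701),
  beta_conc = c(846.0713372, 811.3982918)
)

# A few rows of the question's sample data
dat <- data.frame(
  Sample = c(6, 7, 35, 36),
  Area   = c(236211, 724919, 246707, 538092)
)

# Look up each row's model, then invert area = intercept + beta_conc * conc
dat$mod  <- model_for_sample(dat$Sample)
idx      <- match(dat$mod, coefs$mod)
dat$Calc <- (dat$Area - coefs$intercept[idx]) / coefs$beta_conc[idx]
print(dat)
```

For these four rows the computed `Calc` values agree with the question's expected column to roughly two decimal places; the small remaining differences are presumably due to rounding in the printed areas and coefficients. Samples that fall in the gaps (for example 30:34) get `NA`.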
answered 2013-09-19T22:12:59.953