r - How to get coefficient estimates by group/factor in R using the pls package

Question

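I have started looking at the pls package, and I am not sure how to extract separate coefficients for each group/factor. I could run a separate model for each group, or consider an X ~ group interaction term, but that is not what I am after.

I am using the following syntax: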

model1 <- plsr(outcome ~ pred * group, data =plsDATA,2)

I have tried the following:

model2 <- plsr(outcome ~ embed(pred:as.factor(group)), data=plsDATA,2)

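But this results in the following error:

Error in model.frame.default(formula = outcome ~ embed(pred:as.factor(group)), :
  variable lengths differ (found for 'embed(pred:as.factor(group))')
In addition: Warning messages:
1: In pred:as.factor(group) :
  numerical expression has 640 elements: only the first used
2: In pred:as.factor(group) :
  numerical expression has 32 elements: only the first used

I am not sure why I get the variable lengths error, since running the following commands shows compatible dimensions: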

dim(group)
[1] 32  1

dim(outcome)
[1] 32  1

dim(pred)
[1] 32 20

The code is as follows:

library(pls)

# Dummy data
setwd("/Users/John/Documents")
Data <- read.csv("SamplePLS.csv")

# Define each of the inputs: pred is X, group is the factor, outcome is Y
pred <- as.matrix(Data[,3:22])
group <- as.matrix(Data[,1])
outcome <- as.matrix(Data[,2])

# Now combine the matrices into a single data frame
plsDATA <- data.frame(SampN=c(1:nrow(Data)))
plsDATA$pred <- pred
plsDATA$group <- group
plsDATA$outcome <- outcome

# Define the model, asking for two components
model1 <- plsr(outcome ~ pred * group, data=plsDATA, 2)

# Get coefficients from this object

score 0 · Accepted Answer

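Actually, I just figured this out. You need to dummy code the grouping variable and make it the outcome (i.e., the predicted variable). In this case, I have two columns representing group membership. In each one, membership in the group is coded as 1 and non-membership as 0. I then called the first two columns group (i.e., group <- as.matrix(Data[,1:2])) and ran the rest of the code as before, with group in place of outcome.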
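The dummy-coding step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated stand-ins for SamplePLS.csv, the group labels "A" and "B" and the names group_dummy and model_da are made up for the example, and the plsr call is skipped if the pls package is not installed.

```r
# Simulated stand-in for SamplePLS.csv (hypothetical data: 32 rows, 20 predictors).
set.seed(1)
pred <- matrix(rnorm(32 * 20), nrow = 32,
               dimnames = list(NULL, paste0("pred", 1:20)))
group <- factor(rep(c("A", "B"), each = 16))

# One 0/1 indicator column per group level: 1 = member, 0 = non-member.
group_dummy <- sapply(levels(group), function(l) as.numeric(group == l))

# Combine the matrices into a single data frame, as in the question.
plsDATA <- data.frame(SampN = seq_len(nrow(pred)))
plsDATA$pred <- pred
plsDATA$group <- group_dummy

if (requireNamespace("pls", quietly = TRUE)) {
  # Group membership is the response; pred is the predictor block.
  model_da <- pls::plsr(group ~ pred, ncomp = 2, data = plsDATA)
  # The coefficient array has one column per group indicator.
  print(dim(coef(model_da)))
}
```

With this layout, each response column gets its own set of coefficients for the 20 predictors, which is one way to read "coefficients by group".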

score 0 · Accepted Answer

According to your question, you want to extract the coefficients. The function coef() will pull them out easily. See the results below.

> library(pls)
> Data <- read.csv("SamplePLS.csv")
> # Define each of the inputs: pred is X, group is the factor, outcome is Y
> pred <- as.matrix(Data[,3:22])
> group <- as.matrix(Data[,1])
> outcome <- as.matrix(Data[,2])
> # Now combine the matrices into a single data frame
> plsDATA <- data.frame(SampN=c(1:nrow(Data)))
> plsDATA$pred <- pred
> plsDATA$group <- group
> plsDATA$outcome <- outcome
> # Define the model, asking for two components
> model1 <- plsr(outcome ~ pred * group, data=plsDATA, ncomp=2)
> coef(model1)
, , 2 comps

                       outcome
predpred1        -1.058426e-02
predpred2         2.634832e-03
predpred3         3.579453e-03
predpred4         1.135424e-02
predpred5         3.271867e-04
predpred6         4.438445e-03
predpred7         8.425997e-03
predpred8         3.001517e-03
predpred9         2.111697e-03
predpred10       -9.264594e-04
predpred11        1.885554e-03
predpred12       -2.798959e-04
predpred13       -1.390471e-03
predpred14       -1.023795e-03
predpred15       -3.233470e-03
predpred16        5.398053e-03
predpred17        9.796533e-03
predpred18       -8.237801e-04
predpred19        4.778983e-03
predpred20        1.235484e-03
group             9.463735e-05
predpred1:group  -8.814101e-03
predpred2:group   9.013430e-03
predpred3:group   7.597494e-03
predpred4:group   1.869234e-02
predpred5:group   1.462835e-03
predpred6:group   6.928687e-03
predpred7:group   1.925111e-02
predpred8:group   3.752095e-03
predpred9:group   2.404539e-03
predpred10:group -1.288023e-03
predpred11:group  4.271393e-03
predpred12:group  6.704938e-04
predpred13:group -3.943964e-04
predpred14:group -5.468510e-04
predpred15:group -5.595737e-03
predpred16:group  1.090501e-02
predpred17:group  1.977715e-02
predpred18:group -3.013597e-04
predpred19:group  1.169534e-02
predpred20:group  3.389127e-03

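The same values are also stored in model1$coefficients (equivalently model1[[1]]). That array holds the coefficients for every number of components up to the one requested, so its 2-component slice matches the coef(model1) output above. Based on the question, I think this is the result you are looking for.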
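If the interaction model is acceptable and group is coded numerically (an assumption; the single group row in the output suggests it, but the actual coding is not shown), the slope of each predictor at a given group value is its main-effect coefficient plus the group value times the matching interaction coefficient. A minimal sketch in base R, using only the first two predictors from the output above; slopes_at_group is a hypothetical helper, not part of pls:

```r
# Coefficients copied from the coef(model1) output (first two predictors only).
b <- c("predpred1"       = -1.058426e-02,
       "predpred2"       =  2.634832e-03,
       "group"           =  9.463735e-05,
       "predpred1:group" = -8.814101e-03,
       "predpred2:group" =  9.013430e-03)

# Hypothetical helper: slope of each predictor at numeric group value g,
# computed as main effect + g * interaction.
slopes_at_group <- function(b, g) {
  inter <- grep(":group$", names(b), value = TRUE)  # interaction terms
  main  <- sub(":group$", "", inter)                # matching main effects
  setNames(b[main] + g * b[inter], main)
}

slopes_at_group(b, 0)  # slopes when group = 0 (main effects only)
slopes_at_group(b, 1)  # slopes when group = 1
```

For a full model, the same helper could be applied to the vector drop(coef(model1)), which drops the singleton dimensions of the coefficient array and keeps the row names.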
