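I am trying to perform a partial least squares (PLS) regression analysis in C#. The PLS implementation in MATLAB (plsregress) uses the SIMPLS algorithm and returns beta, the matrix of regression coefficients.

  • I don't understand why the coefficient matrices differ between the two cases. Is there a mistake in the way I am passing the inputs to the C# version?

  • The inputs are identical in both cases and are taken from the paper referenced here.

Minimal working example

MATLAB: following the small example from Hervé Abdi (Hervé Abdi, Partial Least Squares Regression). Reference: PDF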

clear all;
clc;
inputs = [7, 7, 13, 7; 4, 3, 14, 7; 10, 5, 12, 5; 16, 7, 11, 3; 13, 3, 10, 3];
outputs = [14, 7, 8; 10, 7, 6; 8, 5, 5; 2, 4,7; 6, 2, 4];
[XL,yl,XS,YS,beta,PCTVAR] = plsregress(inputs,outputs, 1);
disp 'beta'
beta
disp 'beta size'
size(beta)
yfit = [ones(size(inputs,1),1) inputs]*beta;
residuals = outputs - yfit;

% stem(residuals)
% xlabel('Observation');
% ylabel('Residual');

beta =

   1.0484e+01   6.1899e+00   6.2841e+00
  -6.3488e-01  -3.0405e-01  -7.2608e-02
   2.1949e-02   1.0512e-02   2.5102e-03
   1.9226e-01   9.2078e-02   2.1988e-02
   2.8948e-01   1.3864e-01   3.3107e-02

Accord.NET:

double[][] inputs = new double[][]
    {
        //      Wine | Price | Sugar | Alcohol | Acidity
        new double[] {   7,     7,      13,        7 },
        new double[] {   4,     3,      14,        7 },
        new double[] {  10,     5,      12,        5 },
        new double[] {  16,     7,      11,        3 },
        new double[] {  13,     3,      10,        3 },
    };

double[][] outputs = new double[][]
    {
        //             Wine | Hedonic | Goes with meat | Goes with dessert
        new double[] {           14,          7,                 8 },
        new double[] {           10,          7,                 6 },
        new double[] {            8,          5,                 5 },
        new double[] {            2,          4,                 7 },
        new double[] {            6,          2,                 4 },
    };

var pls = new PartialLeastSquaresAnalysis()
        {
            Method = AnalysisMethod.Center,
            Algorithm = PartialLeastSquaresAlgorithm.NIPALS
        };

var regression = pls.Learn(inputs, outputs);

double[][] coeffs = regression.Weights;
>>
-1.69811320754717 -0.0566037735849056   0.0707547169811322
1.27358490566038   0.29245283018868     0.571933962264151
-4                 1                    0.5
1.17924528301887   0.122641509433962    0.159198113207547
1 Answer

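I think there are at least three differences between the way the MATLAB and Accord.NET versions of PLS are being called.

  1. As you mentioned, MATLAB is using SIMPLS. Accord.NET, however, is being told to use NIPALS.

  2. The MATLAB version is called as plsregress(inputs, outputs, 1), which means the regression is computed using only 1 latent component of the PLS, but you have not instructed Accord.NET to do the same.

  3. Accord.NET returns a MultivariateLinearRegression object that holds the weight matrix and the intercept vector separately, whereas MATLAB returns the intercepts as the first row of the coefficient matrix beta.

Once all of these are taken into account, it is possible to generate exactly the same results as the MATLAB version: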

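using Accord.Math;                // Transpose, InsertColumn, ToOctave extension methods
using Accord.Statistics.Analysis; // PartialLeastSquaresAnalysis, AnalysisMethod

double[][] inputs = new double[][]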
{
    //      Wine | Price | Sugar | Alcohol | Acidity
    new double[] {   7,     7,      13,        7 },
    new double[] {   4,     3,      14,        7 },
    new double[] {  10,     5,      12,        5 },
    new double[] {  16,     7,      11,        3 },
    new double[] {  13,     3,      10,        3 },
};

double[][] outputs = new double[][]
{
    //             Wine | Hedonic | Goes with meat | Goes with dessert
    new double[] {           14,          7,                 8 },
    new double[] {           10,          7,                 6 },
    new double[] {            8,          5,                 5 },
    new double[] {            2,          4,                 7 },
    new double[] {            6,          2,                 4 },
};

// Create the Partial Least Squares Analysis
var pls = new PartialLeastSquaresAnalysis()
{
    Method = AnalysisMethod.Center,
    Algorithm = PartialLeastSquaresAlgorithm.SIMPLS, // First change: use SIMPLS
};

// Learn the analysis
pls.Learn(inputs, outputs);

// Second change: Use just 1 latent factor/component
var regression = pls.CreateRegression(factors: 1);

// Third change: present results as in MATLAB
double[][] w = regression.Weights.Transpose();
double[] b = regression.Intercepts;

// Insert the intercepts as the first column of the transposed weights,
// then transpose back so they form the first row, as MATLAB presents them
double[][] coeffs = (w.InsertColumn(b, index: 0)).Transpose();

// Show results in MATLAB format
string str = coeffs.ToOctave();

With these changes, the coeffs matrix above should become

[ 10.4844779770616    6.18986077674717    6.28413863347486    ;
  -0.634878923091644 -0.304054829845448  -0.0726082626993539  ;
   0.0219492754418065 0.0105118991463605  0.00251024045589416 ;
   0.192261724966225  0.0920775662006966  0.0219881135215502  ; 
   0.289484835410222  0.13863944631343    0.033107085796122   ]
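
To double-check that a matrix in this layout behaves like MATLAB's beta, the fitted values can be recomputed by hand, mirroring the `yfit = [ones(size(inputs,1),1) inputs]*beta` line from the question. The following is only a minimal sketch in plain C# with no Accord.NET dependency. The class name `PlsPredictionCheck` and the method `Predict` are hypothetical names introduced for illustration, and the coefficient values are copied from the matrix above rather than recomputed.

```csharp
using System;

public static class PlsPredictionCheck
{
    // Predictors (Price, Sugar, Alcohol, Acidity), copied from the example.
    public static readonly double[][] Inputs =
    {
        new double[] {  7, 7, 13, 7 },
        new double[] {  4, 3, 14, 7 },
        new double[] { 10, 5, 12, 5 },
        new double[] { 16, 7, 11, 3 },
        new double[] { 13, 3, 10, 3 },
    };

    // Responses (Hedonic, Goes with meat, Goes with dessert), copied from the example.
    public static readonly double[][] Outputs =
    {
        new double[] { 14, 7, 8 },
        new double[] { 10, 7, 6 },
        new double[] {  8, 5, 5 },
        new double[] {  2, 4, 7 },
        new double[] {  6, 2, 4 },
    };

    // Coefficients in MATLAB layout: row 0 holds the intercepts,
    // rows 1..4 hold one row of weights per predictor.
    public static readonly double[][] Beta =
    {
        new double[] { 10.4844779770616,    6.18986077674717,   6.28413863347486    },
        new double[] { -0.634878923091644, -0.304054829845448, -0.0726082626993539  },
        new double[] {  0.0219492754418065, 0.0105118991463605, 0.00251024045589416 },
        new double[] {  0.192261724966225,  0.0920775662006966, 0.0219881135215502  },
        new double[] {  0.289484835410222,  0.13863944631343,   0.033107085796122   },
    };

    // Hypothetical helper: computes [1, x] * beta for every input row x.
    public static double[][] Predict(double[][] inputs, double[][] beta)
    {
        int responses = beta[0].Length;
        var yfit = new double[inputs.Length][];
        for (int i = 0; i < inputs.Length; i++)
        {
            yfit[i] = new double[responses];
            for (int k = 0; k < responses; k++)
            {
                double sum = beta[0][k]; // intercept for response k
                for (int j = 0; j < inputs[i].Length; j++)
                    sum += inputs[i][j] * beta[j + 1][k];
                yfit[i][k] = sum;
            }
        }
        return yfit;
    }

    public static void Main()
    {
        double[][] yfit = Predict(Inputs, Beta);
        for (int i = 0; i < yfit.Length; i++)
        {
            for (int k = 0; k < yfit[i].Length; k++)
            {
                double residual = Outputs[i][k] - yfit[i][k];
                Console.Write($"{yfit[i][k],10:F4} (res {residual,8:F4})  ");
            }
            Console.WriteLine();
        }
    }
}
```

Because the model is fitted on centered data, the residuals in each output column should sum to approximately zero, which gives a quick sanity check that the intercept row and the weight rows have been combined in the right orientation.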
Answered 2017-07-09T14:00:06.567