python - How to calculate weights to minimize variance?

Question

Given several vectors:

x1 = [3 4 6]
x2 = [2 8 1]
x3 = [5 5 4]
x4 = [6 2 1]

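I want to find a weight for each item (w1, w2, w3) and compute the weighted sum of each vector: yi = w1*i1 + w2*i2 + w3*i3. For example, y1 = 3*w1 + 4*w2 + 6*w3. The weights should minimize the variance of these values (y1, y2, y3, y4).

Note: w1, w2, w3 should be > 0, and w1 + w2 + w3 = 1

I don't know what kind of problem this is... How can I solve it in Python or MATLAB?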

score 1 · Accepted Answer

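You can start by building a loss function that expresses the variance and the constraint on the w's. The mean is m = (1/4)*(y1 + y2 + y3 + y4). The variance is then (1/4)*((y1-m)^2 + (y2-m)^2 + (y3-m)^2 + (y4-m)^2), and the constraint term is a*(w1+w2+w3 - 1), where a is the Lagrange multiplier. To me this looks like a convex optimization problem with convex constraints, because the loss function is quadratic in the target variables (w1,w2,w3) and the constraint is linear. You can look into projected gradient descent algorithms that respect the given constraints. Take a look here: http://www.ifp.illinois.edu/~angelia/L5_exist_optimality.pdf. There is generally no direct analytical solution for this kind of problem.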
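As a rough illustration of the loss and constraints described above, here is a minimal sketch using the question's four vectors. It hands the problem to SciPy's general-purpose SLSQP solver rather than the projected gradient descent this answer suggests. The names `X`, `variance_of_scores`, and `w_opt` are made up for this example, and the strict condition w > 0 is relaxed to w >= 0, which is an assumption.

```python
import numpy as np
from scipy.optimize import minimize

# The four vectors from the question, one per row.
X = np.array([[3, 4, 6],
              [2, 8, 1],
              [5, 5, 4],
              [6, 2, 1]], dtype=float)


def variance_of_scores(w):
    # y_i = w1*x_i1 + w2*x_i2 + w3*x_i3; np.var divides by the number of y values (4).
    return np.var(X @ w)


n = X.shape[1]
w0 = np.full(n, 1.0 / n)  # start from equal weights

# Equality constraint w1 + w2 + w3 = 1, plus bounds 0 <= w_j <= 1 (assumed relaxation of w > 0).
constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
bounds = [(0.0, 1.0)] * n

result = minimize(variance_of_scores, w0, method="SLSQP",
                  bounds=bounds, constraints=constraints)
w_opt = result.x

print("weights:", w_opt)
print("variance:", variance_of_scores(w_opt))
```

If a weight ends up exactly at 0, the strict requirement w > 0 cannot be met at the minimum, and a small positive lower bound could be used instead.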

score 0 · Accepted Answer

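My full solution can be viewed in the PDF.

The trick is to put the vectors x_i as the columns of a matrix X.
The problem can then be written as a convex problem whose solution is constrained to the unit simplex.

I solved it using the Projected Sub Gradient Method.
I computed the gradient of the objective function and created a projection onto the Unit Simplex.

Now all that is left is to iterate them.
I validated my solution using CVX.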

% StackOverflow 44984132
% How to calculate weight to minimize variance?
% Release Notes
% - 1.0.000     08/07/2017
%   *   First release.


%% General Parameters

run('InitScript.m');

figureIdx           = 0; %<! Continue from Question 1
figureCounterSpec   = '%04d';

generateFigures = OFF;


%% Simulation Parameters

dimOrder    = 3;
numSamples = 4;

mX = randi([1, 10], [dimOrder, numSamples]);
vE = ones([dimOrder, 1]);


%% Solve Using CVX

cvx_begin('quiet')
    cvx_precision('best');
    variable vW(numSamples)
    % mX * vW has dimOrder entries, so its mean divides by dimOrder.
    minimize( (0.5 * sum_square_abs( mX * vW - (1 / dimOrder) * (vE.' * mX * vW) * vE )) )
    subject to
        sum(vW) == 1;
        vW >= 0;
cvx_end

disp([' ']);
disp(['CVX Solution -                       [ ', num2str(vW.'), ' ]']);


%% Solve Using Projected Sub Gradient

numIterations   = 20000;
stepSize        = 0.001;
simplexRadius   = 1; %<! Unit Simplex Radius
stopThr         = 1e-6;

% Centered vector: (mX * vW) minus its mean (the mean divides by dimOrder).
hKernelFun  = @(vW) ((mX * vW) - ((1 / dimOrder) * ((vE.' * mX * vW) * vE)));
hObjFun     = @(vW) 0.5 * sum(hKernelFun(vW) .^ 2);
hGradFun    = @(vW) (mX.' * hKernelFun(vW)) - ((1 / dimOrder) * vE.' * (hKernelFun(vW)) * mX.' * vE);

vW = rand([numSamples, 1]);
vW = vW(:) / sum(vW);

for ii = 1:numIterations
    vGradW = hGradFun(vW);
    vW = vW - (stepSize * vGradW);

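    % Projecting onto the Unit Simplex
    % sum(vW) == 1, vW >= 0.
    % ProjectSimplex is not defined in this snippet; it belongs to the full code linked in this answer.
    vW = ProjectSimplex(vW, simplexRadius, stopThr);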
end

disp([' ']);
disp(['Projected Sub Gradient Solution -    [ ', num2str(vW.'), ' ]']);


%% Restore Defaults

% set(0, 'DefaultFigureWindowStyle', 'normal');
% set(0, 'DefaultAxesLooseInset', defaultLoosInset);

You can view the full code at StackOverflow Q44984132 (a PDF is also provided).
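For readers following along in Python, the iteration described in this answer (a gradient step, then a projection onto the unit simplex) might look roughly like the sketch below. It is not a port of the author's code. `project_simplex` here is a sort-based projection written for this example and is only assumed to play the same role as the author's `ProjectSimplex`. The data are the four vectors from the question, stored one per row.

```python
import numpy as np


def project_simplex(v, radius=1.0):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == radius} (sort-based)."""
    u = np.sort(v)[::-1]                       # entries in descending order
    css = np.cumsum(u) - radius
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]   # last index that stays positive
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def min_variance_weights(X, num_iterations=20000, step_size=0.001):
    """Projected gradient descent on 0.5 * ||A w||^2, where A is X with column means removed."""
    n = X.shape[1]
    A = X - X.mean(axis=0, keepdims=True)      # A @ w equals (X @ w) minus its mean
    w = np.full(n, 1.0 / n)                    # start from equal weights
    for _ in range(num_iterations):
        grad = A.T @ (A @ w)                   # gradient of 0.5 * ||A w||^2
        w = project_simplex(w - step_size * grad)
    return w


X = np.array([[3, 4, 6], [2, 8, 1], [5, 5, 4], [6, 2, 1]], dtype=float)
w = min_variance_weights(X)
print("weights:", w, "variance:", np.var(X @ w))
```

The step size and iteration count are copied from the MATLAB script. For other data the step size may need to be reduced so that it stays below the reciprocal of the largest eigenvalue of `A.T @ A`.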

score 0 · Accepted Answer

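I don't know much about optimization problems, but I got the idea of gradient descent, so I tried shifting weight between the highest-scoring and lowest-scoring vectors. My script is as follows: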

# coding: utf-8
import numpy as np


def get_max(alist):
    # Return the largest value in alist and its index.
    max_score = max(alist)
    idx = alist.index(max_score)
    return max_score, idx


def get_min(alist):
    # Return the smallest value in alist and its index.
    min_score = min(alist)
    idx = alist.index(min_score)
    return min_score, idx


def get_weighted(alist, aweight):
    # Multiply each item by its weight.
    res = []
    for i in range(0, len(alist)):
        res.append(alist[i] * aweight[i])
    return res


def get_sub(list1, list2):
    # Element-wise difference list1 - list2.
    res = []
    for i in range(0, len(list1)):
        res.append(list1[i] - list2[i])
    return res


def grad_dec(w, dist, st=0.001):
    # Move st of weight from the item with the largest gap to the item
    # with the smallest gap. w is modified in place; its sum is unchanged.
    max_item, max_item_idx = get_max(dist)
    min_item, min_item_idx = get_min(dist)
    w[max_item_idx] = w[max_item_idx] - st
    w[min_item_idx] = w[min_item_idx] + st


def cal_score(w, x):
    # Weighted score of every row of x.
    score = []
    print('weight', w, x)
    for i in range(0, len(x)):
        score_i = 0
        for j in range(0, len(w)):
            score_i = w[j] * x[i][j] + score_i
        score.append(score_i)
    print('score', score)
    return score


if __name__ == "__main__":
    # One weight per item (five items), summing to 1.
    init_w = [0.2, 0.2, 0.2, 0.2, 0.2]
    x = [[7.3, 10, 8.3, 8.8, 4.2], [6.8, 8.9, 8.4, 9.7, 4.2], [6.9, 9.9, 9.7, 8.1, 6.7]]
    score = cal_score(init_w, x)
    variance = np.var(score)
    for round_idx in range(0, 100):
        # Stop once the variance is small enough.
        if variance < 0.012:
            print('ok')
            break
        max_score, idx = get_max(score)
        min_score, idx2 = get_min(score)
        weighted_1 = get_weighted(x[idx], init_w)
        weighted_2 = get_weighted(x[idx2], init_w)
        dist = get_sub(weighted_1, weighted_2)
        # print(max_score, idx, min_score, idx2, dist)
        grad_dec(init_w, dist)
        score = cal_score(init_w, x)
        variance = np.var(score)
        print('variance', variance)

    print(score)

In practice, it does reduce the variance. I'm happy with that, but I don't know whether my solution is mathematically sound.

score 0 · Accepted Answer

w = [5, 6, 7]
x1 = [3, 4, 6]
x2 = [2, 8, 1]
x3 = [5, 5, 4]
y1, y2, y3 = 0, 0, 0
for index, i in enumerate(w):
    y1 = y1 + i * x1[index]
    y2 = y2 + i * x2[index]
    y3 = y3 + i * x3[index]
print(min(y1, y2, y3))

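I think I may understand the purpose of your question. If you want to find the minimum value, I hope this helps. I just hard-coded the values; once you see that this is one way to solve the problem, you can turn it into a function (def).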
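Turning the snippet above into a function, as this answer suggests, could look like the sketch below. The name `weighted_sums` is made up for this example. The weights are the fixed values from the answer, which do not satisfy w1 + w2 + w3 = 1, so this only shows how the weighted sums are computed and does not minimize the variance.

```python
def weighted_sums(w, vectors):
    """Return y_i = sum_j w[j] * x_i[j] for every vector x_i."""
    return [sum(wj * xj for wj, xj in zip(w, x)) for x in vectors]


w = [5, 6, 7]
vectors = [[3, 4, 6], [2, 8, 1], [5, 5, 4]]

ys = weighted_sums(w, vectors)
print(ys)       # [81, 65, 83]
print(min(ys))  # 65
```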
