I'm following the paper found here and trying to implement batch gradient descent (BGD) instead of the stochastic gradient descent (SGD) the paper describes.
For SGD, what I gather is that you do this (pseudocode):
for each of the user's actual ratings {
1. calculate the error: the difference between the actual
   rating and the predicted rating, i.e. the dot product
   of the two factor vectors (user vector and item vector).
2. multiply the error from 1. by the item vector
   corresponding to that rating.
3. update the user vector by the result of 2.
   multiplied by lambda, e.g.:
   userVector = userVector + lambda * (result of 2.)
}
Repeat for every user.
Do the same for every item, except in 2. multiply by the user vector instead of the item vector.
Go back to the start and repeat until some stopping condition.
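To make sure I have the SGD steps right, here is how I would sketch them in Python/NumPy (the two-pass structure — all user updates first, then all item updates — follows the pseudocode above; `lam` is the learning rate I call lambda, and the function and variable names are just my own):

```python
import numpy as np

def sgd_epoch(ratings, U, V, lam=0.01):
    """One SGD epoch matching the pseudocode above.

    ratings: iterable of (user_index, item_index, actual_rating)
    U: user factor matrix, shape (num_users, k)
    V: item factor matrix, shape (num_items, k)
    lam: learning rate (called lambda in the pseudocode)
    """
    # First pass: update every user vector, one rating at a time.
    for u, i, r in ratings:
        err = r - U[u].dot(V[i])        # 1. prediction error
        U[u] = U[u] + lam * err * V[i]  # 2.-3. nudge the user vector
    # Second pass: the same thing for every item vector,
    # multiplying by the user vector instead.
    for u, i, r in ratings:
        err = r - U[u].dot(V[i])
        V[i] = V[i] + lam * err * U[u]
    return U, V
```

Calling this repeatedly on a small ratings list drives the squared prediction error down, which is the behaviour I'd expect from SGD here.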
For BGD, what I'm doing is:
for each user {
1. sum the gradient over all of their ratings, i.e. the sum of
   (actual rating - (user vector . item vector)) * item vector
2. update the user vector by the sum from 1. multiplied by lambda.
}
Then repeat for the items, exchanging the item vector in 2. for the user vector.
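Concretely, my per-user batch pass looks like this (a sketch in the same NumPy setup as before; the names are my own):

```python
import numpy as np

def bgd_user_pass(ratings, U, V, lam=0.01):
    """One per-user batch pass: sum the gradient over ALL of a user's
    ratings first, then apply a single update to that user's vector.

    ratings: iterable of (user_index, item_index, actual_rating)
    U: user factor matrix, shape (num_users, k)
    V: item factor matrix, shape (num_items, k)
    """
    grad = np.zeros_like(U)
    for u, i, r in ratings:
        err = r - U[u].dot(V[i])   # 1. prediction error for this rating
        grad[u] += err * V[i]      # accumulate (error * item vector)
    U += lam * grad                # 2. one update per user vector
    return U
```

The item pass is symmetric: accumulate err * U[u] into a gradient for V and update every item vector once.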
This seems to make sense, but on further reading I've become confused about BGD. It says BGD must go through the entire dataset to make one update. Does that mean the entire dataset relative to that particular user, as I've done above, or does it literally mean the entire dataset?
I made an implementation that goes through the entire dataset, summing every prediction error, and then uses that number to update every user vector (so all user vectors are updated by the same amount!). However, even with a lambda rate of 0.002, it doesn't approach a minimum and oscillates rapidly. It can go from an average error of 12,500 to 1.2, then to -539, and so on. Eventually the number approaches infinity and my program fails.
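For reference, here is my current understanding of what a literal whole-dataset batch step would look like — every user and item vector accumulating its own gradient from the ratings it appears in, rather than one shared number for all vectors. The division by the number of ratings is my own guess at keeping the step size from growing with dataset size, so treat this as a sketch, not something I'm sure is right:

```python
import numpy as np

def full_batch_step(ratings, U, V, lam=0.002):
    """One full-batch gradient step over the entire dataset.

    Each user/item vector gets its OWN accumulated gradient, and all
    errors are computed from the old factors before anything updates.
    """
    gU = np.zeros_like(U)
    gV = np.zeros_like(V)
    for u, i, r in ratings:
        err = r - U[u].dot(V[i])
        gU[u] += err * V[i]
        gV[i] += err * U[u]
    # Averaging instead of using the raw sum keeps the effective step
    # size independent of dataset size; with the raw sum, lambda would
    # have to shrink as the dataset grows or the updates overshoot.
    n = len(ratings)
    U += lam * gU / n
    V += lam * gV / n
    return U, V
```

With this version, repeated steps on a toy dataset reduce the squared error instead of blowing up the way my summed-scalar version did.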
Any help with the math behind this would be great.