python - The correct way to set up user-item interaction data in LightFM

Question

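What is the correct way to set up the data I feed to a LightFM model when I also have implicit data on other items/products? For example, I have interaction data for 100k users x 200 items, but in the actual application I want the model to recommend only 50 of those 200 items. How should I set up the data? I am considering two cases, but I am not sure which one is correct: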

Case 1: Feed the entire matrix (100k users x 200 items) directly into LightFM as the interactions parameter. This approach leans more towards collaborative learning.

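Case 2: Feed only the (100k users x 50 items) matrix to the interactions parameter and use the remaining (100k users x 150 items) matrix as user_features. This approach leans more towards content-based learning.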
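To make the two setups concrete, here is a minimal sketch that derives both from one full user x item matrix. It assumes the data is already a scipy.sparse matrix (or dense array) with one column per item and that target_items holds the column indices of the 50 recommendable items; the function name build_inputs is hypothetical. Stacking an identity block in front of the Case 2 user features is an optional design choice (it keeps a per-user latent vector, similar in spirit to the user_identity_features option of lightfm.data.Dataset), not something the question prescribes.

```python
import numpy as np
import scipy.sparse as sp


def build_inputs(full, target_items):
    """Hypothetical helper: derive Case 1 and Case 2 inputs from one matrix.

    full: [n_users, n_items] implicit-feedback matrix (dense or scipy.sparse).
    target_items: column indices of the items that may be recommended.
    Returns (case1_interactions, case2_interactions, case2_user_features).
    """
    full = sp.csr_matrix(full, dtype=np.float32)
    target_items = np.asarray(target_items)
    # Every column that is not recommendable becomes side information in Case 2.
    other_items = np.setdiff1d(np.arange(full.shape[1]), target_items)

    # Case 1: the whole matrix is the interaction matrix (COO, as fit expects).
    case1_interactions = full.tocoo()

    # Case 2: only the target columns are interactions ...
    case2_interactions = full[:, target_items].tocoo()
    # ... and the other columns become user features, with an (optional)
    # identity block in front so each user keeps its own embedding.
    identity = sp.identity(full.shape[0], dtype=np.float32, format="csr")
    case2_user_features = sp.hstack([identity, full[:, other_items]], format="csr")
    return case1_interactions, case2_interactions, case2_user_features
```

With a LightFM model, Case 1 would be trained with model.fit(case1_interactions), and Case 2 with model.fit(case2_interactions, user_features=case2_user_features); in Case 2 the same user_features matrix must also be passed to model.predict.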

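Which one is correct? Also, for Case 1, can the utility functions for model evaluation (precision, recall, etc.) be restricted to the selected items only? That is, the top-k recommendations should be drawn only from the 50 items, rather than recommending other items and computing precision, recall, etc. from those.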
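One way to get the restricted evaluation asked about here is to compute the metrics yourself over the candidate columns only. The sketch below is hand-rolled, not LightFM's lightfm.evaluation.precision_at_k (which, as far as I know, ranks against all items). It assumes a dense score matrix for all users and items, a dense 0/1 matrix of held-out test interactions, and a list of the 50 candidate item indices; the function name is hypothetical.

```python
import numpy as np


def precision_recall_at_k(scores, test, candidate_items, k):
    """Hypothetical helper: precision@k and recall@k over candidate items only.

    scores: [n_users, n_items] predicted scores.
    test: [n_users, n_items] 0/1 held-out interactions.
    candidate_items: indices of the items that may be recommended.
    """
    candidate_items = np.asarray(candidate_items)
    cand_scores = scores[:, candidate_items]
    cand_test = test[:, candidate_items] > 0

    # Rank only the candidate items per user (highest score first), keep k.
    top = np.argsort(-cand_scores, axis=1)[:, :k]
    hits = np.take_along_axis(cand_test, top, axis=1).sum(axis=1)
    n_relevant = cand_test.sum(axis=1)

    # Users with no relevant candidate in the test set are skipped.
    mask = n_relevant > 0
    precision = hits[mask] / k
    recall = hits[mask] / n_relevant[mask]
    return precision.mean(), recall.mean()
```

Because non-candidate columns are dropped before ranking, an item outside the 50 can never occupy a top-k slot, even if it has the highest score for that user.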

score 1 · Accepted Answer

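You should follow Case 1: train the model on the entire interaction data. When making predictions, you can pass the indices of the 50 items you need as an argument to model.predict.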
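A minimal sketch of that prediction step, assuming candidate_ids holds the indices of the 50 recommendable items. The helper name is hypothetical; it takes any callable with the same (user_ids, item_ids) argument order as LightFM.predict, so it can be exercised without a trained model.

```python
import numpy as np


def recommend_from_candidates(predict, user_id, candidate_ids, k):
    """Hypothetical helper: top-k item ids for one user among candidate_ids.

    predict: callable with LightFM.predict's argument order,
             predict(user_ids, item_ids) -> array of scores.
    """
    candidate_ids = np.asarray(candidate_ids, dtype=np.int32)
    # A single user id is scored against every candidate item.
    scores = np.asarray(predict(user_id, candidate_ids))
    # Highest score first; only candidate items can appear in the result.
    order = np.argsort(-scores)[:k]
    return candidate_ids[order]
```

With a trained model this would be called as recommend_from_candidates(model.predict, 42, target_items, 10). Passing the user id as a plain Python int matches the "single user id" case in the predict docstring.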

From the LightFM documentation, you can see that model.predict takes item IDs as an argument (in this case, the IDs of your 50 items).

https://making.lyst.com/lightfm/docs/_modules/lightfm/lightfm.html#LightFM.predict

    def predict(self, user_ids, item_ids, item_features=None, user_features=None, num_threads=1):
        """
        Compute the recommendation score for user-item pairs.

        Arguments
        ---------

        user_ids: integer or np.int32 array of shape [n_pairs,]
             single user id or an array containing the user ids for the
             user-item pairs for which a prediction is to be computed
        item_ids: np.int32 array of shape [n_pairs,]
             an array containing the item ids for the user-item pairs for which
             a prediction is to be computed
        user_features: np.float32 csr_matrix of shape [n_users, n_user_features], optional
             Each row contains that user's weights over features
        item_features: np.float32 csr_matrix of shape [n_items, n_item_features], optional
             Each row contains that item's weights over features
        num_threads: int, optional
             Number of parallel computation threads to use. Should
             not be higher than the number of physical cores.
        """
        # (excerpt: the rest of the docstring and the method body are omitted)
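Since user_ids and item_ids are paired element by element (both of shape [n_pairs,]), scoring many users against the 50 candidate items in one call means flattening every (user, item) combination into two pair arrays. A minimal sketch, assuming a predict callable with LightFM.predict's argument order; the helper name score_matrix is hypothetical.

```python
import numpy as np


def score_matrix(predict, user_ids, item_ids):
    """Hypothetical helper: score every (user, item) combination in one call.

    predict: callable taking flat pair arrays of shape [n_pairs,], in the
             order (user_ids, item_ids), and returning [n_pairs,] scores.
    Returns an array of shape [n_users, n_items].
    """
    user_ids = np.asarray(user_ids, dtype=np.int32)
    item_ids = np.asarray(item_ids, dtype=np.int32)
    # Pair i is (pair_users[i], pair_items[i]): each user is repeated once
    # per item, and the item list is tiled once per user.
    pair_users = np.repeat(user_ids, len(item_ids))
    pair_items = np.tile(item_ids, len(user_ids))
    scores = np.asarray(predict(pair_users, pair_items))
    return scores.reshape(len(user_ids), len(item_ids))
```

With a trained model, num_threads can be forwarded through a small wrapper, e.g. score_matrix(lambda u, i: model.predict(u, i, num_threads=4), users, target_items). For 100k users x 50 items this builds 5M pairs, so batching over users may be needed if memory is tight.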
