python - LightFM: handling user and item cold-start

Question

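I recall that one of LightFM's strengths is that the model does not suffer from the cold-start problem, for either users or items: lightfm original paper

However, I still don't understand how to use LightFM to address the cold-start problem. I trained my model on user-item interaction data. As far as I understand, I can only make predictions for profile_ids that exist in my dataset.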

def predict(self, user_ids, item_ids, item_features=None,
            user_features=None, num_threads=1):
    """
    Compute the recommendation score for user-item pairs.

    Arguments
    ---------

    user_ids: integer or np.int32 array of shape [n_pairs,]
         single user id or an array containing the user ids for the
         user-item pairs for which a prediction is to be computed
    item_ids: np.int32 array of shape [n_pairs,]
         an array containing the item ids for the user-item pairs for which
         a prediction is to be computed.
    user_features: np.float32 csr_matrix of shape [n_users, n_user_features], optional
         Each row contains that user's weights over features.
    item_features: np.float32 csr_matrix of shape [n_items, n_item_features], optional
         Each row contains that item's weights over features.
    num_threads: int, optional
         Number of parallel computation threads to use. Should
         not be higher than the number of physical cores.

    Returns
    -------

    np.float32 array of shape [n_pairs,]
        Numpy array containing the recommendation scores for pairs defined
        by the inputs.
    """

    self._check_initialized()

    if not isinstance(user_ids, np.ndarray):
        user_ids = np.repeat(np.int32(user_ids), len(item_ids))

    assert len(user_ids) == len(item_ids)

    if user_ids.dtype != np.int32:
        user_ids = user_ids.astype(np.int32)
    if item_ids.dtype != np.int32:
        item_ids = item_ids.astype(np.int32)

    n_users = user_ids.max() + 1
    n_items = item_ids.max() + 1

    (user_features,
     item_features) = self._construct_feature_matrices(n_users,
                                                       n_items,
                                                       user_features,
                                                       item_features)

    lightfm_data = self._get_lightfm_data()

    predictions = np.empty(len(user_ids), dtype=np.float64)

    predict_lightfm(CSRMatrix(item_features),
                    CSRMatrix(user_features),
                    user_ids,
                    item_ids,
                    predictions,
                    lightfm_data,
                    num_threads)

    return predictions

Any suggestions or pointers that help me understand this would be appreciated. Thank you.

score 10 · Accepted Answer

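LightFM, like any other recommender algorithm, cannot make predictions about entirely new users unless it is given additional information about them. The trick when recommending to new users is to describe them in terms of features that the algorithm has seen during training.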

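This is probably best explained with an example. Suppose your training set contains users with IDs between 0 and 10, and you want to make predictions for a new user with ID 11. If all you have is the new user's ID, the algorithm cannot make predictions: after all, it knows nothing about what user 11's preferences are. Suppose, however, that you have some features describing the users: perhaps during sign-up every user picks some of their interests (horror movies or romantic comedies, for example). If these features were present during training, the algorithm can learn which preferences are, on average, associated with them, and can produce recommendations for any new user who can be described with the same features.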
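To make the idea concrete, here is a toy sketch of the feature-based representation described in the LightFM paper: a user's latent vector is the sum of the embeddings of that user's features, so a user never seen in training still gets a usable vector as long as the features are known. The embedding values, the feature and item names, and the helper functions are all made up for illustration, and bias terms are left out; this is not LightFM's actual code.

```python
# Toy illustration only: the embeddings below are invented, not learned by LightFM.

# Pretend these 2-dimensional feature embeddings were learned during training.
user_feature_embeddings = {
    "likes_horror": [0.9, -0.2],
    "likes_romcom": [-0.3, 0.8],
}
item_embeddings = {
    "scary_movie": [1.0, 0.0],
    "date_movie": [0.0, 1.0],
}


def user_vector(features):
    """Sum the embeddings of the given features (unknown features are skipped)."""
    known = [user_feature_embeddings[f] for f in features
             if f in user_feature_embeddings]
    if not known:
        raise ValueError("no known features: the model has nothing to go on")
    return [sum(dims) for dims in zip(*known)]


def score(features, item):
    """Dot product between the user's vector and the item's vector."""
    u = user_vector(features)
    return sum(a * b for a, b in zip(u, item_embeddings[item]))


# User 11 was never seen in training, but picked "horror" at sign-up.
new_user_features = ["likes_horror"]
ranked = sorted(item_embeddings,
                key=lambda item: score(new_user_features, item),
                reverse=True)
print(ranked)
```

A user described only by features the model has never seen raises an error here, which mirrors the case of having nothing but a new user ID.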

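In the LightFM implementation, all of these features are encoded in the feature matrices, probably as a one-hot encoding. When making recommendations for user 11, you construct a new feature matrix for that user: as long as that matrix contains only features that were present during training, you can make predictions.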

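Note that it is usually useful to have features that correspond to a single user only, such as an "is user 0" feature, an "is user 1" feature, and so on. For a new user such a feature is useless, because the model saw nothing during training from which it could learn about that feature.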
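The following sketch shows what such a combined feature matrix might look like: one indicator column per training user, followed by shared metadata columns. The column layout and the two interest features are assumptions made for illustration. A new user's row has zeros in every indicator column, so only the shared features carry any signal.

```python
import numpy as np
from scipy import sparse

n_train_users = 3

# Hypothetical shared features: columns are [likes_horror, likes_romcom].
metadata = sparse.csr_matrix(np.array([[1, 0],
                                       [0, 1],
                                       [1, 1]], dtype=np.float32))

# One "is user k" indicator column per training user, then the shared columns.
identity = sparse.identity(n_train_users, dtype=np.float32, format="csr")
train_user_features = sparse.hstack([identity, metadata], format="csr")

# A new user has no indicator column of their own: those entries stay zero.
new_user_metadata = sparse.csr_matrix(np.array([[1, 0]], dtype=np.float32))
no_identity = sparse.csr_matrix((1, n_train_users), dtype=np.float32)
new_user_features = sparse.hstack([no_identity, new_user_metadata], format="csr")

print(train_user_features.toarray())
print(new_user_features.toarray())
```

Both matrices have the same number of columns, which is what allows the new user's row to be scored with embeddings learned from the training matrix.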

score 8 · Accepted Answer

This worked for me:

if user_index is not None:
    predictions = model.predict([user_index, ], np.array(target_item_indices))
else:
    predictions = model.predict(0, np.array(target_item_indices), user_features=user_features)

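Here user_features is a sparse array, carefully assembled from the set of features that was used when training the model.

For example, if I get a new user whose features look like user_feature_list = ["id-2837", "Cape Town", "Woodstock", 7700], I build the feature array as follows: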

import numpy as np
from scipy import sparse

# The feature map was persisted during the previous round of offline training.
user_feature_map = store_model.user_feature_map
num_features = len(user_feature_list)
normalised_val = 1.0 / num_features

# Look up the column index of every feature the model saw during training.
target_indices = []
for feature in user_feature_list:
    try:
        target_indices.append(user_feature_map[feature])
    except KeyError:
        # Unseen features have no learned embedding, so they are skipped.
        print("new user feature encountered '{}'".format(feature))
print("target indices: {}".format(target_indices))

# Build a single-row feature matrix as wide as the training feature set.
user_features = np.zeros(len(user_feature_map))
for i in target_indices:
    user_features[i] = normalised_val
user_features = sparse.csr_matrix(user_features)

The user_feature_map was generated earlier by calling the mapping() method of LightFM's dataset object, after fitting it on the original input data:

dataset.fit(
    unique_user_ids,
    unique_item_ids,
    item_features=item_feature_list,
    user_features=user_feature_list
)

user_id_map, user_feature_map, item_id_map, item_feature_map = dataset.mapping()
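The steps in this answer can be collected into one small helper. This is a sketch under assumptions: build_new_user_row is a hypothetical name, and toy_feature_map stands in for the mapping returned by dataset.mapping(). Unlike the snippet above, it divides by the number of known features only, so the weights in the row always sum to 1.

```python
import numpy as np
from scipy import sparse


def build_new_user_row(user_feature_map, feature_names):
    """Return a 1 x n_features CSR row for a user described by feature_names.

    Features missing from user_feature_map were never seen in training
    and are skipped, because the model has no embedding for them.
    """
    columns = [user_feature_map[name] for name in feature_names
               if name in user_feature_map]
    if not columns:
        raise ValueError("none of the features were seen during training")
    # Equal weights that sum to 1 over the known features (an assumption).
    weights = np.full(len(columns), 1.0 / len(columns), dtype=np.float32)
    rows = np.zeros(len(columns), dtype=np.int32)
    return sparse.csr_matrix((weights, (rows, columns)),
                             shape=(1, len(user_feature_map)))


# Toy mapping standing in for the one returned by dataset.mapping().
toy_feature_map = {"Cape Town": 0, "Woodstock": 1, 7700: 2, "Durban": 3}
row = build_new_user_row(toy_feature_map,
                         ["id-2837", "Cape Town", "Woodstock", 7700])
print(row.toarray())
```

The result has exactly one row, which appears to be why the predict call earlier passes 0 as the user id: judging by the predict source quoted in the question, the id is used to pick a row of the user_features matrix. With a trained model, the row would be passed as model.predict(0, np.array(target_item_indices), user_features=row).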
