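LightFM is a powerful recommender-system framework that can handle user and item cold start. Earlier posts covered data processing and the cold-start approach; this one adds notes on the underlying principles and the API.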
lightfm.LightFM(no_components=10, k=5, n=10, learning_schedule='adagrad', loss='logistic', learning_rate=0.05, rho=0.95, epsilon=1e-06, item_alpha=0.0, user_alpha=0.0, max_sampled=10, random_state=None)
Parameters:
Choosing loss:
logistic: Useful when both positive (1) and negative (-1) interactions are present.
bpr: Useful when only positive interactions are present and optimising ROC AUC is desired.
warp: Useful when only positive interactions are present and optimising the top of the recommendation list (precision@k) is desired.
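The three rules above can be written down as a small decision helper. This is only a sketch: `choose_loss` and its two arguments are hypothetical names introduced here for illustration and are not part of LightFM.

```python
def choose_loss(has_negatives, optimise_top_of_list=False):
    """Map the rules of thumb above to a value for LightFM's `loss` argument.

    Hypothetical helper, not a LightFM function.
    """
    if has_negatives:
        # Both positive (1) and negative (-1) interactions are available.
        return "logistic"
    # Only positive interactions: pick by the metric you care about.
    return "warp" if optimise_top_of_list else "bpr"


print(choose_loss(has_negatives=True))                               # logistic
print(choose_loss(has_negatives=False))                              # bpr
print(choose_loss(has_negatives=False, optimise_top_of_list=True))   # warp
```

The returned string would then be passed on as `lightfm.LightFM(loss=...)`.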
fit(interactions, user_features=None, item_features=None, sample_weight=None, epochs=1, num_threads=1, verbose=False)
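This gives the latent representation of the items; the shape should be (n_item, n_embedding) and the type is array.
This gives the latent representation of the users; the shape should be (n_user, n_embedding) and the type is array.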
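To make the role of these two arrays concrete, here is a minimal NumPy sketch of how latent vectors turn into a score when no extra user or item features are involved: a dot product plus a user bias and an item bias. The arrays below are random stand-ins with the shapes described above, not the output of a trained model, and the variable names are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_components = 3, 4, 10

# Stand-ins for learned parameters (random here, NOT trained).
user_embeddings = rng.normal(size=(n_users, n_components))
item_embeddings = rng.normal(size=(n_items, n_components))
user_biases = np.zeros(n_users)
item_biases = np.zeros(n_items)


def score(user_id, item_id):
    """Dot product of the two latent vectors plus both bias terms."""
    return (user_embeddings[user_id] @ item_embeddings[item_id]
            + user_biases[user_id] + item_biases[item_id])


# The same computation for every (user, item) pair at once.
all_scores = (user_embeddings @ item_embeddings.T
              + user_biases[:, None] + item_biases[None, :])
print(all_scores.shape)  # (3, 4)
```

Nothing bounds these values to a fixed range, which is worth keeping in mind when reading the output of predict.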
predict(user_ids, item_ids, item_features=None, user_features=None, num_threads=1)
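One thing to watch with this method: the source code asserts len(user_ids) == len(item_ids), so the two arguments must line up pair by pair.
The result is a raw user-item affinity score, not a value between 0 and 1, so it cannot be used directly as a CTR or CVR prediction.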
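Because the two id arrays must have the same length, scoring one user against every item means repeating the user id. A minimal sketch follows; `pairs_for_user` is a hypothetical helper introduced here, and the commented-out lines assume `model` is an already fitted LightFM instance.

```python
import numpy as np


def pairs_for_user(user_id, n_items):
    """Build equal-length id arrays that pair one user with every item."""
    item_ids = np.arange(n_items, dtype=np.int32)
    user_ids = np.full(n_items, user_id, dtype=np.int32)
    return user_ids, item_ids


user_ids, item_ids = pairs_for_user(user_id=7, n_items=5)
print(len(user_ids) == len(item_ids))  # True

# Assumed usage with a fitted model (not executed here):
# scores = model.predict(user_ids, item_ids)
# top_items = item_ids[np.argsort(-scores)]  # sort by score, highest first
```

Since the scores are only meaningful relative to each other, sorting them is a safer use than reading any single value as a probability.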
pred = model.predict_rank(test_interactions,
train_interactions=train_interactions)
Most of the predicted values come out as 0, which looks odd:
array([0., 0., 0., 0., 0., 0., 0., 0., 4., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
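A look at the source shows that the ranks matrix is initialized to 0. Note also that 0 is a legitimate rank: according to the LightFM documentation, ranks are computed only for the non-zero entries of the test matrix, and 0 means the top of the recommendation list.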
ranks = sp.csr_matrix(
(np.zeros_like(test_interactions.data),
test_interactions.indices,
test_interactions.indptr,
),
shape=test_interactions.shape,
)
# ranks starts out with every entry set to 0
lightfm_data = self._get_lightfm_data()
predict_ranks(
CSRMatrix(item_features),
CSRMatrix(user_features),
CSRMatrix(test_interactions),
CSRMatrix(train_interactions),
ranks.data,
lightfm_data,
num_thread
Performs best when only a handful of interactions need to be evaluated per user. If you need to compute predictions for many items for every user, use the predict method instead. In other words, for a full evaluation the official documentation recommends predict.
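To see why a rank of 0 can be a real result and not just an untouched initial value, here is a simplified sketch of a zero-based rank for one user: count how many candidate items score strictly higher than the test item, skipping items already seen in training. `rank_of_items` is a hypothetical helper; the tie handling and the exact way training items are excluded are assumptions and may differ from the library's compiled implementation.

```python
import numpy as np


def rank_of_items(scores, test_items, train_items=()):
    """Zero-based rank of each test item among one user's candidate items."""
    scores = np.asarray(scores, dtype=float)
    candidates = np.ones(scores.shape[0], dtype=bool)
    # Items the user already interacted with in training are not candidates.
    candidates[np.asarray(list(train_items), dtype=int)] = False
    ranks = []
    for item in test_items:
        # Rank = number of remaining candidates scored strictly higher.
        ranks.append(int(np.sum(scores[candidates] > scores[item])))
    return ranks


scores = [0.2, 1.5, -0.3, 0.9]
print(rank_of_items(scores, test_items=[1, 3]))                 # [0, 1]
print(rank_of_items(scores, test_items=[3], train_items=[1]))   # [0]
```

In the second call, item 3 moves up to rank 0 once the better-scored item 1 is removed as a training interaction, so a model that fits the test interactions well can produce many zeros.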
The model can be saved and loaded with pickle:
import pickle
# Save the fitted model to disk.
with open('savefile.pickle', 'wb') as fle:
    pickle.dump(model, fle, protocol=pickle.HIGHEST_PROTOCOL)

# Load it back and use it as before.
with open('savefile.pickle', 'rb') as fle:
    model_loaded = pickle.load(fle)

test_rank = model_loaded.predict_rank(
    test_interactions,
    train_interactions=interactions,
    user_features=user_features_matrix)