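This is Chapter 4 of the recommendation model reproduction series. The models are built with the torch_rechub framework, and the chapter covers two multi-task models for recommender systems, ESMM and MMOE, including an architecture walkthrough and hands-on code. It draws on other articles.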
$$\underbrace{p(y=1, z=1 \mid x)}_{pCTCVR} = \underbrace{p(y=1 \mid x)}_{pCTR} \times \underbrace{p(z=1 \mid y=1, x)}_{pCVR}$$
where $x$ denotes the impression (exposure), $y$ the click, and $z$ the conversion.
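As a quick sanity check of the factorization, the following sketch works through it with aggregate rates. The counts are hypothetical and chosen only for illustration, and the example ignores the per-impression features $x$ that a real model conditions on.

```python
import math

# Hypothetical counts, chosen only for illustration.
impressions = 1000
clicks = 50
conversions = 5  # conversions that followed a click

p_ctr = clicks / impressions         # p(y=1 | x), measured over all impressions
p_cvr = conversions / clicks         # p(z=1 | y=1, x), measured over clicked impressions only
p_ctcvr = conversions / impressions  # p(y=1, z=1 | x), measured over all impressions

# The factorization pCTCVR = pCTR * pCVR holds for these empirical rates.
assert math.isclose(p_ctcvr, p_ctr * p_cvr)

# pCVR can be recovered from two quantities that are both defined on the full impression space.
p_cvr_recovered = p_ctcvr / p_ctr
```

The last line is the point of the decomposition: pCVR is obtained from pCTCVR and pCTR, neither of which requires restricting the data to clicked samples.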
The main task (CVR) and the auxiliary tasks (CTR and CTCVR) share features, and the loss function is built from the CTCVR and CTR labels.
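The loss can be sketched as follows, in the form used in the ESMM paper. The symbols are not defined elsewhere in this article and are introduced here as assumptions: $N$ is the number of impressions, $\theta_{ctr}$ and $\theta_{cvr}$ are the parameters of the CTR and CVR networks, $f$ is the network output, and $l(\cdot)$ is the cross-entropy loss.

```latex
L(\theta_{cvr}, \theta_{ctr})
  = \sum_{i=1}^{N} l\big(y_i,\; f(x_i; \theta_{ctr})\big)
  + \sum_{i=1}^{N} l\big(y_i \,\&\, z_i,\; f(x_i; \theta_{ctr}) \times f(x_i; \theta_{cvr})\big)
```

The first sum is the CTR loss against the click label $y_i$. The second is the CTCVR loss against the label $y_i \,\&\, z_i$ (clicked and converted), with the prediction given by the product pCTR × pCVR. Both sums run over all impressions, and there is no loss term computed directly on pCVR.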
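Addressing sample selection bias: during training, the model only needs to predict pCTCVR and pCTR in order to update its parameters. Because pCTCVR and pCTR are both estimated over the entire sample space (all impressions), the pCVR obtained through the formula above is also defined over the entire space, which removes the sample selection bias of pCVR.
Addressing data sparsity: with a shared embedding layer, the CVR subtask can also learn from samples that were shown but not clicked, which alleviates the sparsity of the training data.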
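The two points above can be illustrated with a minimal sketch in plain Python. It is not the torch_rechub training loop: the function names `esmm_loss` and `cvr_logit_grad` and the logit values are hypothetical, and the gradient is estimated numerically. It shows that an impression without a click still produces a nonzero gradient for the CVR tower, because the CTCVR loss is computed on every impression.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def bce(label, prob, eps=1e-12):
    """Binary cross-entropy for a single sample."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def esmm_loss(ctr_logit, cvr_logit, click, conversion):
    """Per-impression loss: CTR loss plus CTCVR loss, with no direct CVR loss."""
    p_ctr = sigmoid(ctr_logit)
    p_cvr = sigmoid(cvr_logit)
    p_ctcvr = p_ctr * p_cvr
    ctcvr_label = click * conversion  # 1 only if the impression was clicked and converted
    return bce(click, p_ctr) + bce(ctcvr_label, p_ctcvr)

def cvr_logit_grad(ctr_logit, cvr_logit, click, conversion, h=1e-6):
    """Central-difference estimate of d(loss) / d(cvr_logit)."""
    up = esmm_loss(ctr_logit, cvr_logit + h, click, conversion)
    down = esmm_loss(ctr_logit, cvr_logit - h, click, conversion)
    return (up - down) / (2 * h)

# An impression that was shown but not clicked.
grad_unclicked = cvr_logit_grad(ctr_logit=0.3, cvr_logit=-0.5, click=0, conversion=0)
# An impression that was clicked and converted.
grad_converted = cvr_logit_grad(ctr_logit=0.3, cvr_logit=-0.5, click=1, conversion=1)
```

For the unclicked impression the gradient is positive, so a gradient step lowers the CVR logit. For the converted impression it is negative, so a gradient step raises it. In both cases the CVR tower receives a training signal, even though no loss is computed on pCVR itself.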
The ESMM model code:

    import torch
    from torch_rechub.basic.layers import MLP, EmbeddingLayer


    class ESMM(torch.nn.Module):
        def __init__(self, user_features, item_features, cvr_params, ctr_params):
            super().__init__()
            self.user_features = user_features
            self.item_features = item_features
            # One embedding layer shared by the CVR and CTR tasks
            self.embedding = EmbeddingLayer(user_features + item_features)
            # User and item fields are each sum-pooled into one vector, then concatenated
            self.tower_dims = user_features[0].embed_dim + item_features[0].embed_dim
            # Build the CVR tower and the CTR tower
            self.tower_cvr = MLP(self.tower_dims, **cvr_params)
            self.tower_ctr = MLP(self.tower_dims, **ctr_params)

        def forward(self, x):
            # Sum over the field dimension: [batch_size, n_fields, embed_dim] -> [batch_size, embed_dim]
            embed_user_features = self.embedding(x, self.user_features,
                                                 squeeze_dim=False).sum(dim=1)
            embed_item_features = self.embedding(x, self.item_features,
                                                 squeeze_dim=False).sum(dim=1)
            input_tower = torch.cat((embed_user_features, embed_item_features), dim=1)
            cvr_logit = self.tower_cvr(input_tower)
            ctr_logit = self.tower_ctr(input_tower)
            cvr_pred = torch.sigmoid(cvr_logit)
            ctr_pred = torch.sigmoid(ctr_logit)

            # pCTCVR = pCTR * pCVR
            ctcvr_pred = torch.mul(ctr_pred, cvr_pred)

            # Output columns: [pCVR, pCTR, pCTCVR]
            ys = [cvr_pred, ctr_pred, ctcvr_pred]
            return torch.cat(ys, dim=1)

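2.2.1 The MOE model (Mixture of Experts)
The outputs of a group of Experts are aggregated, and a gating network (an attention-style network) produces the weight of each Expert.
2.2.2 The MMOE model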
The Experts are shared across tasks, and each task has its own gating network, which determines that task's combination of Experts.

The MMOE model code:

    import torch
    import torch.nn as nn

    from torch_rechub.basic.layers import MLP, EmbeddingLayer, PredictionLayer


    class MMOE(torch.nn.Module):
        def __init__(self, features, task_types, n_expert, expert_params, tower_params_list):
            super().__init__()
            self.features = features
            self.task_types = task_types
            # Number of tasks
            self.n_task = len(task_types)
            self.n_expert = n_expert
            self.embedding = EmbeddingLayer(features)
            self.input_dims = sum([fea.embed_dim for fea in features])
            # Experts are shared by all tasks
            self.experts = nn.ModuleList(
                MLP(self.input_dims, output_layer=False, **expert_params) for i in range(self.n_expert))
            # One gate per task; each gate outputs softmax weights over the n_expert Experts
            self.gates = nn.ModuleList(
                MLP(self.input_dims, output_layer=False, **{
                    "dims": [self.n_expert],
                    "activation": "softmax"
                }) for i in range(self.n_task))
            # One tower per task
            self.towers = nn.ModuleList(
                MLP(expert_params["dims"][-1], **tower_params_list[i]) for i in range(self.n_task))
            self.predict_layers = nn.ModuleList(PredictionLayer(task_type) for task_type in task_types)

        def forward(self, x):
            # [batch_size, input_dims]
            embed_x = self.embedding(x, self.features, squeeze_dim=True)
            # Each expert output: [batch_size, 1, expert_dims[-1]]
            expert_outs = [expert(embed_x).unsqueeze(1) for expert in self.experts]
            # [batch_size, n_expert, expert_dims[-1]]
            expert_outs = torch.cat(expert_outs, dim=1)
            # Each gate output: [batch_size, n_expert, 1]
            gate_outs = [gate(embed_x).unsqueeze(-1) for gate in self.gates]

            ys = []
            for gate_out, tower, predict_layer in zip(gate_outs, self.towers, self.predict_layers):
                # Weight every Expert output by this task's gate
                expert_weight = torch.mul(gate_out, expert_outs)
                # Sum over Experts: [batch_size, expert_dims[-1]]
                expert_pooling = torch.sum(expert_weight, dim=1)
                # Task-specific tower
                tower_out = tower(expert_pooling)
                # logit -> proba
                y = predict_layer(tower_out)
                ys.append(y)
            return torch.cat(ys, dim=1)

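The gate-weighted pooling in `forward` can be traced by hand for a single sample. The sketch below uses plain Python lists instead of tensors; the Expert outputs, gate scores, task names, and the helper `mix_experts` are all hypothetical and exist only to show how two tasks obtain different mixtures from the same shared Experts.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix_experts(expert_outputs, gate_scores):
    """Weighted sum of Expert output vectors, with weights softmax(gate_scores)."""
    weights = softmax(gate_scores)
    dim = len(expert_outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, expert_outputs))
            for d in range(dim)]

# Hypothetical outputs of 3 shared Experts for one sample (each a 2-dimensional vector).
expert_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical raw gate scores: one list per task, one score per Expert.
gate_scores_per_task = {"ctr": [2.0, 0.0, 0.0], "cvr": [0.0, 2.0, 0.0]}

# Each task feeds its own mixture of the same Experts into its tower.
tower_inputs = {task: mix_experts(expert_outputs, scores)
                for task, scores in gate_scores_per_task.items()}
```

Here the "ctr" gate favors the first Expert and the "cvr" gate favors the second, so the two tower inputs differ even though the Expert outputs are identical. This is the per-task decoupling that the separate gates provide.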
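This task covered the principles and code of two multi-task learning models, ESMM and MMOE.
In MMOE, every task has its own gating network. The lower layer is the basic MOE model and the upper layer is the tower model (one tower per task), which decouples the tasks in their choice of Expert combination and offers flexible parameter sharing and fast training convergence.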