• [Paper Reading, CIKM 2021] Learning Multiple Intent Representations for Search Queries


    Original Paper

    Learning Multiple Intent Representations for Search Queries

    More related papers can be found in:

    Motivation

    The typical use of representation models has a major limitation: they generate only a single representation for a query, even though a query may have multiple intents or facets.

    • proposes NMIR (Neural Multiple Intent Representations) to learn multiple intent representations for each query

    Method

    Task Description and Problem Formulation

    • training query set: $Q = \{q_1, \cdots, q_n\}$
    • $D_i = \{d_{i1}, \cdots, d_{im}\}$ is the set of top $m$ retrieved documents in response to the query $q_i$
    • $F_i = \{f_{i1}, \cdots, f_{ik_i}\}$ denotes the set of all textual intent descriptions associated with the query $q_i$
      • $k_i$ is the number of query intents
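
    To make this notation concrete, here is a toy instance of one training query (the strings below are invented for illustration and are not from the paper's data):

```python
# Toy example of the notation above (all strings are invented).
# One training query q_i, its top-m retrieved documents D_i, and its
# k_i textual intent descriptions F_i.
query = "jaguar"                                   # q_i
documents = [                                      # D_i, the top m retrieved docs
    "Jaguar Cars is a British luxury vehicle brand ...",
    "The jaguar is a large cat species native to the Americas ...",
    "2022 Jaguar F-PACE review: pricing, specs ...",
    "Jaguar habitat, diet, and conservation status ...",
]
intents = [                                        # F_i, with k_i = 2
    "information about the jaguar car brand",
    "information about the jaguar animal",
]
```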

    NMIR Framework: A High-Level Overview

    • one straightforward solution:
      • using an encoder-decoder architecture
        • input: query $q_i$
        • output: multiple intent descriptions of the query, taken as the top $k_i$ most likely predictions
      • drawback: the generated descriptions are often synonyms or refer to the same concept (illustrated in the sketch after this list)
    • another straightforward solution:
      • cast the task as a sequence-to-sequence problem
        • input: query $q_i$
        • output: all the query intent descriptions concatenated with each other (as in translation)
      • drawback:
        • different intent representations are not distinguishable in the last layer of the model.
        • most existing effective text encoding models are not able to represent long sequences of tokens, such as a concatenation of the top $m$ retrieved documents
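
    As a quick illustration of the first straightforward solution (and its drawback), here is a minimal sketch that treats the top-k beam hypotheses of an off-the-shelf seq2seq model as the k "intents"; the Hugging Face model name and the bare-query prompt are assumptions for illustration, not the paper's setup:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Naive approach: take the top-k beam-search hypotheses as the k "intents".
inputs = tokenizer("jaguar", return_tensors="pt")
outputs = model.generate(**inputs, num_beams=5, num_return_sequences=5,
                         max_new_tokens=32)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
# In practice the top beams tend to be near-duplicates of one another,
# which is exactly the drawback noted above.
```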

    NMIR Framework:

    • $\phi(\cdot)$ and $\psi(\cdot)$ denote a text encoder and decoder pair

    Step 1: NMIR assigns each learned document representation to one of the query intent descriptions $f_{ij} \in F_i$ using a document-intent matching algorithm $\gamma$:

    (equation image: $C_i^*$ is the output of applying $\gamma$ to the encoded documents and the intent descriptions)

    • $C_i^*$ is a set of document clusters, and each $C_{ij}^*$ is the set of documents from $D_i$ that are assigned to $f_{ij}$ by $\gamma$.

    Step 2: NMIR then transforms the encoded general query representation into its intent representations through a query intent encoder $\zeta$.

    • the representation for the $j^{th}$ query intent is obtained using $\zeta(q_i, C_{ij}^*; \phi)$.

    Training: for a mini-batch $b$, the model is trained by gradient descent-based minimization:

    (equation images: the mini-batch objective, summing a per-intent loss over all queries $q_i \in b$ and all intents $j \in \{1, \cdots, k_i\}$)

    • $q_{ij}^*$ is a concatenation of the query string, the first $j-1$ intent descriptions, and $k_i - j$ mask tokens (see the sketch below)
      • i.e., the model generates the $j^{th}$ intent description given the associated cluster $C_{ij}^*$ and the encoded query text plus the past $j-1$ intent descriptions
      • this helps the model avoid re-generating the previous intent descriptions and encourages widely distributed representations

    where $\mathcal{L}_{CE}$ is the cross-entropy loss:

    (equation image: the token-level cross-entropy over the tokens of each intent description)

    • $f_{ijt}$ is the $t^{th}$ token in the given intent description $f_{ij}$.
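
    As a concrete illustration of how $q_{ij}^*$ is built, here is a minimal sketch; the `<mask>` string and the space separator are BART-style assumptions, not details taken from the paper:

```python
def build_training_input(query: str, intents: list[str], j: int,
                         mask_token: str = "<mask>") -> str:
    """Builds q_ij^*: the query string, the first j-1 intent descriptions,
    and k_i - j mask tokens (j is 1-indexed, matching the paper's notation).

    The mask/separator strings are illustrative assumptions; the paper uses
    BART, whose tokenizer defines its own special tokens.
    """
    k_i = len(intents)
    parts = [query] + intents[: j - 1] + [mask_token] * (k_i - j)
    return " ".join(parts)

# Example with k_i = 3: the input for the 2nd intent contains the query,
# the 1st intent description, and k_i - j = 1 mask token.
print(build_training_input("jaguar", ["the car brand", "the animal", "the OS"], j=2))
# -> "jaguar the car brand <mask>"
```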

    Inference: the $q_{ij}^*$s are constructed differently.

    • first feed "$q_i$ <mask> … <mask>" to the model and apply beam search to the decoder's output to obtain the first intent description $f'_{i1}$
    • then use the model's output to iteratively create the input for the next step, "$q_i$ $f'_{i1}$ <mask> … <mask>", and repeat this process
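
    A rough sketch of this iterative decoding loop, using Hugging Face's BART generation API (the model name and prompt format are assumptions, and the Guided Transformer intent encoder $\zeta$ is omitted, so this only shows the loop structure, not the full NMIR model):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")  # NMIR would load fine-tuned weights
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def generate_intents(query: str, k: int, num_beams: int = 4) -> list[str]:
    """At step j, feed the query, the j-1 intents generated so far, and
    k - j mask tokens; beam search over the decoder output yields the
    j-th intent description."""
    intents: list[str] = []
    for j in range(1, k + 1):
        masks = [tokenizer.mask_token] * (k - j)
        prompt = " ".join([query] + intents + masks)
        inputs = tokenizer(prompt, return_tensors="pt")
        output_ids = model.generate(**inputs, num_beams=num_beams, max_new_tokens=32)
        intents.append(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    return intents

print(generate_intents("jaguar", k=2))
```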

    Model Implementation and Training

    (Figure 1: (a) the model architecture, (b) the asynchronous training process)

    Figure 1(a) shows the model architecture.

    • use Transformer encoder and decoder architectures (pre-trained BART) for implementing $\phi$ and $\psi$, respectively

    The intent encoding component $\zeta$: an $N'$-layer Guided Transformer model

    • Guided Transformer is used for influencing an input representation under the guidance of some external information.
      • here, $\phi(q_{ij}^*)$ is the input representation and $\{\phi(d) : \forall d \in C_{ij}^*\}$ is the external information.
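
    For intuition, each Guided Transformer layer lets the query-side representation attend over the external document representations through cross-attention. Below is a minimal PyTorch sketch of one such layer; the layer structure, dimensions, and names are assumptions for illustration, not the published Guided Transformer architecture:

```python
import torch
import torch.nn as nn

class GuidedLayer(nn.Module):
    """One cross-attention layer: the query-side sequence attends over the
    external (document) token representations. Intuition-level sketch only."""
    def __init__(self, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, query_repr: torch.Tensor, doc_repr: torch.Tensor) -> torch.Tensor:
        # query_repr: (batch, q_len, d_model), roughly phi(q_ij^*)
        # doc_repr:   (batch, d_len, d_model), roughly phi(d) for d in C_ij^*, concatenated
        attended, _ = self.cross_attn(query_repr, doc_repr, doc_repr)
        x = self.norm1(query_repr + attended)    # residual connection + layer norm
        return self.norm2(x + self.ffn(x))       # feed-forward + residual + layer norm

# zeta would stack N' such layers; a quick shape check:
layer = GuidedLayer()
q = torch.randn(1, 16, 768)    # encoded query-side input
d = torch.randn(1, 128, 768)   # encoded documents of the matched cluster
print(layer(q, d).shape)       # torch.Size([1, 16, 768])
```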

    The document-intent matching component $\gamma$: a clustering-based algorithm (see the sketch after this list)

    • encodes all the top retrieved documents and creates $k_i$ clusters, using a clustering algorithm (K-Means):

      (equation image: K-Means over the encoded documents)

      • $C_i = \{C_{i1}, \cdots, C_{ik_i}\}$ denotes the set of clusters, and each $C_{ij}$ contains all the documents in the $j^{th}$ cluster associated with the query $q_i$.
      • $M_i = \{\mu_{i1}, \cdots, \mu_{ik_i}\}$ is the set of all cluster centroids, such that $\mu_{ij} = \mathrm{centroid}(C_{ij})$.
    • K-Means requires the number of clusters as input; two cases are considered at inference time:
      • assume the number of clusters is equal to a tuned hyper-parameter $k^*$ for all queries
      • replace the K-Means algorithm by a non-parametric version of K-Means
    • Issue: $\gamma$ requires a one-to-one assignment between the cluster centroids and the query intents in the training data; otherwise, all clusters may be assigned to a single most dominant query intent. So the intent identification function $I$ is used:

      • my view: the problem is how to assign centroids to query intents after clustering.

      (equation image: the definition of the intent identification function $I$)

    • output:

      (equation image: the resulting assignment $C_i^*$)
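
    A minimal sketch of the whole $\gamma$ pipeline: K-Means over document embeddings, then a one-to-one matching of centroids to intent embeddings. The paper defines $I$ in the equation above; this sketch substitutes a Hungarian assignment (scipy's `linear_sum_assignment`), so treat it as one plausible realization rather than the paper's exact algorithm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans

def match_documents_to_intents(doc_embs: np.ndarray,
                               intent_embs: np.ndarray) -> dict[int, list[int]]:
    """gamma: cluster the m document embeddings into k_i clusters, then
    assign each centroid to exactly one intent. The one-to-one step here
    minimizes total centroid-intent distance with the Hungarian algorithm,
    a stand-in for the paper's intent identification function I."""
    k = len(intent_embs)
    km = KMeans(n_clusters=k, n_init=10).fit(doc_embs)
    # cost[c, j] = distance between centroid c and intent j
    cost = np.linalg.norm(km.cluster_centers_[:, None] - intent_embs[None], axis=-1)
    cluster_idx, intent_idx = linear_sum_assignment(cost)
    # C_ij^*: indices of the documents in the cluster matched to intent j
    return {int(j): np.where(km.labels_ == c)[0].tolist()
            for c, j in zip(cluster_idx, intent_idx)}

# Toy usage: 10 documents, 2 intents, 32-dimensional embeddings.
rng = np.random.default_rng(0)
doc_embs, intent_embs = rng.normal(size=(10, 32)), rng.normal(size=(2, 32))
print(match_documents_to_intents(doc_embs, intent_embs))
```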

    $\gamma$ is not differentiable and cannot be part of the network for gradient descent-based optimization, so it is moved to an asynchronous process, as shown in Figure 1(b):

    (Figure 1(b): the asynchronous training process)

    Asynchronous training: an asynchronous training method is used to speed up training, since the clustering of document representations is an efficiency bottleneck; see Figure 1(b) and the sketch below.
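
    The gist is that the training loop never blocks on clustering: a background worker keeps re-clustering document representations with the current encoder and publishes fresh assignments, while training steps read the most recent (possibly slightly stale) ones. The paper describes this asynchronous process at a higher level, so the single-process threading scheme below is an illustrative assumption:

```python
import threading
import time

class AsyncClusterer:
    """Runs gamma (clustering + intent matching) in a background thread so
    the training loop never waits on it. Illustrative sketch only."""

    def __init__(self, gamma_fn, query_ids):
        self.gamma_fn = gamma_fn      # e.g. match_documents_to_intents above
        self.query_ids = query_ids
        self.assignments = {}         # query id -> latest C_i^*
        self.lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            for qid in self.query_ids:
                result = self.gamma_fn(qid)       # recluster with the current encoder
                with self.lock:
                    self.assignments[qid] = result
            time.sleep(0.1)                       # pacing between refresh passes

    def get(self, qid):
        with self.lock:
            return self.assignments.get(qid)      # None until the first pass completes

# Toy usage with a dummy gamma that just records when it ran.
clusterer = AsyncClusterer(lambda qid: {"qid": qid, "clustered_at": time.time()}, [0, 1, 2])
time.sleep(0.5)        # the training loop would be running here
print(clusterer.get(0))
```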

    Data

    • training data: a weak supervision solution based on the MIMICS-Click dataset, recently released by Zamani et al. (MIMICS: A Large-Scale Data Collection for Search Clarification)
    • evaluation data: Qulac dataset
  • Original post: https://blog.csdn.net/qq_52852138/article/details/128195107