• Elasticsearch Syntax Essentials: the multi_match Query


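    Contents

    Goal

    ES version

    Official documentation

    Adding test data

    Basic syntax in practice

    Basic format

    Matching multiple fields with wildcards

    Logical operators

    Setting score weights

    multi_match types in practice

    best_fields: best field (default)

    most_fields: most fields

    Cross-field matching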


    Goal

    Learn the multi_match query, including an analysis of its types and how to apply them.


    ES version

    7.17.5


    Official documentation

    Multi-match query: https://www.elastic.co/guide/en/elasticsearch/reference/7.9/query-dsl-multi-match-query.html


    Adding test data

    # Create the job index; ik_max_word (from the IK analysis plugin) is the default analyzer
    PUT /boss_db
    {
      "settings": {
        "index": {
          "analysis.analyzer.default.type": "ik_max_word"
        }
      }
    }

    # Load nine job postings
    PUT /boss_db/_bulk
    {"index":{"_id":"1"}}
    {"company":"星耀科技有限公司","min_num":0,"max_num":20,"province":"广东省","city":"深圳市","county":"南山区","post":"前端开发实习生","min_salary":10,"max_salary":16,"qualification":"本科","min_work_time":3,"max_work_time":5,"skill":["html","css","vue","js"]}
    {"index":{"_id":"2"}}
    {"company":"恒和科技有限公司","min_num":100,"max_num":500,"province":"广东省","city":"广州市","county":"天河区","post":"JAVA开发工程师","min_salary":20,"max_salary":30,"qualification":"硕士","min_work_time":1,"max_work_time":3,"skill":["k8s","springboot","mybatis","微服务"]}
    {"index":{"_id":"3"}}
    {"company":"天心科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"广州市","county":"天河区","post":"JAVA架构师","min_salary":40,"max_salary":50,"qualification":"博士","min_work_time":3,"max_work_time":5,"skill":["mybatis","spring","kafka","微服务"]}
    {"index":{"_id":"4"}}
    {"company":"黄河科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"广州市","county":"天河区","post":"JAVA","min_salary":40,"max_salary":50,"qualification":"博士","min_work_time":3,"max_work_time":5,"skill":["es","mysql","分布式","soa"]}
    {"index":{"_id":"5"}}
    {"company":"长江科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"深圳市","county":"龙岗区","post":"资深大数据开发工程师","min_salary":40,"max_salary":50,"qualification":"博士","min_work_time":0,"max_work_time":5,"skill":["redis","kafka","mq","数据结构"]}
    {"index":{"_id":"6"}}
    {"company":"黄山科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"深圳市","county":"龙岗区","post":"前端开发","min_salary":20,"max_salary":30,"qualification":"大专","min_work_time":0,"max_work_time":5,"skill":["html","css","js","vue"]}
    {"index":{"_id":"7"}}
    {"company":"黄山科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"深圳市","county":"龙岗区","post":"前端开发实习生","min_salary":10,"max_salary":13,"qualification":"不限","min_work_time":0,"max_work_time":5}
    {"index":{"_id":"8"}}
    {"company":"银河大数据科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"深圳市","county":"龙岗区","post":"大数据实习生","min_salary":10,"max_salary":13,"qualification":"不限","min_work_time":0,"max_work_time":5,"skill":["电商","spring","容器技术","微服务技术"]}
    {"index":{"_id":"9"}}
    {"company":"银河大数据科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"深圳市","county":"龙岗区","post":"JAVA实习生","min_salary":30,"max_salary":60,"qualification":"本科","min_work_time":0,"max_work_time":5,"skill":["数据结构","k8s","云原生技术","电商"]}

    # Create the blog index with the same default analyzer
    PUT /blog_db
    {
      "settings": {
        "index": {
          "analysis.analyzer.default.type": "ik_max_word"
        }
      }
    }

    # Load two blog posts
    PUT /blog_db/_bulk
    {"index":{"_id":"1"}}
    {"title":"kafka入门手册","content":"kafka命令、集群、优化"}
    {"index":{"_id":"2"}}
    {"title":"kafka命令手册","content":"命令详情、命令实战"}

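    Basic syntax in practice

    Basic format

    Requirement: on a job site, a keyword typed into the search bar should match both the position and the company. Here we search for "天心" and then for "大数据".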

    GET boss_db/_search
    {
      "query": {
        "multi_match": {
          "query": "天心",
          "fields": ["company", "post"]
        }
      }
    }

    GET boss_db/_search
    {
      "query": {
        "multi_match": {
          "query": "大数据",
          "fields": ["company", "post"]
        }
      }
    }

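    Matching multiple fields with wildcards

    Requirement: every field whose name ends with "work_time" (here min_work_time and max_work_time) should be used as a match field.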

    GET boss_db/_search
    {
      "query": {
        "multi_match": {
          "query": 5,
          "fields": ["*work_time"]
        }
      }
    }
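To get a feel for which fields a pattern such as `*work_time` picks up, the expansion can be approximated in plain Python. This is only a sketch: it uses `fnmatchcase` from the standard library as a stand-in for Elasticsearch's own field-name matching, and the field list is copied by hand from the boss_db test documents. The helper name `expand_field_pattern` is made up for this illustration.

```python
from fnmatch import fnmatchcase

# Field names copied from the boss_db test documents above.
BOSS_DB_FIELDS = [
    "company", "min_num", "max_num", "province", "city", "county", "post",
    "min_salary", "max_salary", "qualification",
    "min_work_time", "max_work_time", "skill",
]


def expand_field_pattern(pattern, field_names):
    """Return the field names matched by a simple * wildcard pattern.

    Hypothetical helper: an approximation of how a multi_match "fields"
    entry with a wildcard resolves to concrete field names.
    """
    return [name for name in field_names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    print(expand_field_pattern("*work_time", BOSS_DB_FIELDS))
    # → ['min_work_time', 'max_work_time']
```

Because the pattern starts with `*` and has no trailing wildcard, only names that end in work_time qualify; a pattern like `*_num` would select min_num and max_num in the same way.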

    Logical operators

    Requirement 1: the search text is "前端开发实习生", and every token must match.

    GET boss_db/_search
    {
      "query": {
        "multi_match": {
          "query": "前端开发实习生",
          "fields": ["post"],
          "operator": "and"
        }
      }
    }

    Requirement 2: the search text is "前端开发实习生", and a document qualifies as long as any token matches.

    GET boss_db/_search
    {
      "query": {
        "multi_match": {
          "query": "前端开发实习生",
          "fields": ["post"],
          "operator": "or"
        }
      }
    }
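The difference between the two operators can be sketched without a cluster. The token lists below are an assumption: they are a simplified guess at how the post values might be split, and the real ik_max_word output may contain more (overlapping) tokens. The `_analyze` API on the index shows the actual tokens. `field_matches` and `matching_ids` are hypothetical helper names.

```python
def field_matches(query_tokens, field_tokens, operator="or"):
    """Apply and/or semantics to one analyzed field.

    "and": every query token must be present in the field.
    "or":  at least one query token must be present.
    """
    present = set(field_tokens)
    hits = [token in present for token in query_tokens]
    return all(hits) if operator == "and" else any(hits)


# Assumed, simplified tokenization (not real analyzer output).
QUERY_TOKENS = ["前端", "开发", "实习生"]
POST_TOKENS = {
    "1": ["前端", "开发", "实习生"],  # 前端开发实习生
    "4": ["java"],                    # JAVA
    "6": ["前端", "开发"],            # 前端开发
    "8": ["大数据", "实习生"],        # 大数据实习生
}


def matching_ids(operator):
    """Return the ids of the sample documents that match under the operator."""
    return [doc_id for doc_id, tokens in POST_TOKENS.items()
            if field_matches(QUERY_TOKENS, tokens, operator)]


if __name__ == "__main__":
    print("and:", matching_ids("and"))
    print("or: ", matching_ids("or"))
```

Under these assumed tokens, "and" keeps only the posting whose post contains all three tokens, while "or" also lets through postings that share just one of them.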

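    Setting score weights

    Requirement: the keyword is "大数据", the match fields are company and post, and for scoring the post field should weigh more than the company field.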

    # No boost applied
    GET boss_db/_search
    {
      "query": {
        "multi_match": {
          "query": "大数据",
          "fields": ["company", "post"]
        }
      }
    }

    # Multiply the post field's score by 4
    GET boss_db/_search
    {
      "query": {
        "multi_match": {
          "query": "大数据",
          "fields": ["company", "post^4"]
        }
      }
    }
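What the `^4` suffix does to the ranking can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the raw per-field scores are invented numbers, not real BM25 output, and `parse_field_spec` and `boosted_scores` are hypothetical helpers that only mimic the idea of multiplying a field's score by its boost.

```python
def parse_field_spec(spec):
    """Split 'post^4' into ('post', 4.0); a bare name gets boost 1.0."""
    name, caret, boost = spec.partition("^")
    return name, float(boost) if caret else 1.0


def boosted_scores(field_specs, raw_scores):
    """Multiply each field's (invented) raw score by its boost."""
    boosted = {}
    for spec in field_specs:
        name, boost = parse_field_spec(spec)
        boosted[name] = raw_scores.get(name, 0.0) * boost
    return boosted


# Invented raw scores for one document; real scores come from the cluster.
RAW_SCORES = {"company": 1.5, "post": 1.0}

if __name__ == "__main__":
    print(boosted_scores(["company", "post"], RAW_SCORES))
    print(boosted_scores(["company", "post^4"], RAW_SCORES))
```

With these made-up numbers, company is the stronger field before boosting, and post overtakes it once its score is multiplied by 4.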

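    multi_match types in practice

    best_fields: best field (default)

    Purpose: among all the fields being searched, find the single best-matching one. For example, with the keyword "brown fox": field a contains "brown fox", field b contains only "brown", and field c contains only "fox". ES then treats field a as the best field. tie_breaker takes a value in the range [0, 1] and defaults to 0, meaning only the best field's score counts. Setting it has the following effect:

    1. tie_breaker = 0: total score = score of the best field.
    2. 0 < tie_breaker < 1: total score = score of the best field + tie_breaker × the scores of all other matching fields.
    3. tie_breaker = 1: every field's score carries the same weight, so there is effectively no best field. Total score = sum of all field scores.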
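The three cases above boil down to one formula, which can be sketched in Python. The per-field scores are invented for illustration (they are not real Elasticsearch scores), and `best_fields_score` is a hypothetical helper, not part of any client library.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Combine per-field scores the way the list above describes.

    total = best field score + tie_breaker * (sum of the other field scores)
    """
    if not 0.0 <= tie_breaker <= 1.0:
        raise ValueError("tie_breaker must be between 0 and 1")
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)


# Invented per-field scores for two documents with the same best-field score.
DOC_A = [0.25, 1.0]
DOC_B = [1.0, 0.5]

if __name__ == "__main__":
    for tie_breaker in (0.0, 0.1, 1.0):
        print(tie_breaker,
              best_fields_score(DOC_A, tie_breaker),
              best_fields_score(DOC_B, tie_breaker))
```

With tie_breaker at 0 the two documents tie, because only the best field counts. Any value above 0 lets the second-best field separate them, and at 1 the result is simply the sum of all field scores.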

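    Requirement: search for the keyword "kafka命令", matching both title and content, with the title taking priority.

    Analysis: without tie_breaker, both documents end up with the same final score for the keyword "kafka命令", so the document with the smaller id simply comes first. From a business point of view, though, the document with id=2 is clearly the better hit, so part of the other field's score should be counted as well. Here I count 0.1 times that score.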

    GET /blog_db/_search
    {
      "query": {
        "multi_match": {
          "query": "kafka命令",
          "type": "best_fields",
          "fields": [
            "title",
            "content"
          ],
          "tie_breaker": 0.1
        }
      }
    }

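    most_fields: most fields

    Behaves like the best_fields type with tie_breaker set to 1, which makes this type a better fit when the fields should carry equal scoring weight. No separate demo here; it works the same as the tie_breaker = 1 case above.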
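For readers who still want to try it, a request body with the type switched to most_fields might be built as follows. This is a sketch, not something from the original post: `multi_match_body` is a hypothetical helper that only assembles the JSON, and nothing is sent to a cluster. The printed body could be pasted under `GET /blog_db/_search` in the console.

```python
import json


def multi_match_body(query, fields, match_type="best_fields", **options):
    """Build a search body containing a single multi_match query.

    Extra keyword arguments (for example tie_breaker or operator) are
    copied into the multi_match object unchanged.
    """
    multi_match = {"query": query, "type": match_type, "fields": list(fields)}
    multi_match.update(options)
    return {"query": {"multi_match": multi_match}}


if __name__ == "__main__":
    body = multi_match_body("kafka命令", ["title", "content"],
                            match_type="most_fields")
    # ensure_ascii=False keeps the Chinese keyword readable in the output.
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Running the same search once with `match_type="most_fields"` and once with `match_type="best_fields", tie_breaker=1` is a simple way to compare the two on your own data.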


    Cross-field matching

    Requirement: find records where the company's district (county) is "天河区" and the required qualification is "硕士".

    GET boss_db/_search
    {
      "query": {
        "multi_match": {
          "query": "天河区硕士",
          "type": "cross_fields",
          "fields": [
            "county",
            "qualification"
          ],
          "operator": "and"
        }
      }
    }

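    Analysis: every token must appear in at least one of the fields for a document to match. This is similar to copy_to, but copy_to needs extra storage, whereas cross_fields needs none and also lets you set per-field weights. Personally, I find this approach best suited to searching place names and English personal names.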
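The "every token in at least one field" rule is easiest to see next to the field-by-field rule that best_fields applies with operator and. The sketch below assumes simplified tokens (one token per value, which real ik_max_word output may not match), and both function names are hypothetical.

```python
def cross_fields_and(query_tokens, doc_fields):
    """Term-centric: each query token must appear in at least one field."""
    return all(
        any(token in tokens for tokens in doc_fields.values())
        for token in query_tokens
    )


def single_field_and(query_tokens, doc_fields):
    """Field-centric: some single field must contain every query token."""
    return any(
        all(token in tokens for token in query_tokens)
        for tokens in doc_fields.values()
    )


# Assumed, simplified tokens for the query and two boss_db documents.
QUERY_TOKENS = ["天河区", "硕士"]
DOC_2 = {"county": ["天河区"], "qualification": ["硕士"]}
DOC_3 = {"county": ["天河区"], "qualification": ["博士"]}

if __name__ == "__main__":
    print("doc 2 cross_fields:", cross_fields_and(QUERY_TOKENS, DOC_2))
    print("doc 3 cross_fields:", cross_fields_and(QUERY_TOKENS, DOC_3))
    print("doc 2 single field:", single_field_and(QUERY_TOKENS, DOC_2))
```

Under these assumptions document 2 matches only in the term-centric view, because no single field holds both tokens, and document 3 fails either way since "硕士" appears nowhere in it.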

  • Original article: https://blog.csdn.net/qq_39706570/article/details/126275796