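• Elasticsearch Syntax Essentials: Match Query


    Contents

    Goal

    ES version

    Official documentation

    Terminology

    Creating the indices and documents (data for the hands-on examples)

    Creating the indices

    Indexing the documents

    Match query: common parameters in practice

    Basic syntax

    analyzer (querying with a specified analyzer)

    operator (Boolean logic used to interpret the query text)

    minimum_should_match (minimum number of matching clauses)

    fuzzy (fuzzy search)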


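    Goal

    Master the match query. This article walks through common cases that show what each match query parameter does and how to use it.


    ES version

    7.17.5


    Official documentation

    Match query: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl.html


    Terminology

    Match query

    Returns documents that match a provided text, number, date, or boolean value. The provided text is analyzed before matching. The match query is the standard query for full-text search and includes options for fuzzy matching.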


    Creating the indices and documents (data for the hands-on examples)

    Creating the indices

    Both indices use ik_max_word as the default analyzer, which requires the IK analysis plugin to be installed. In address_list, copy_to copies province, city and county into a combined fullAddress field that later examples query.

    PUT /student_db
    {
      "settings": {
        "index": {
          "analysis.analyzer.default.type": "ik_max_word"
        }
      }
    }

    PUT /address_list
    {
      "mappings": {
        "properties": {
          "province": {
            "type": "text",
            "copy_to": "fullAddress"
          },
          "city": {
            "type": "text",
            "copy_to": "fullAddress"
          },
          "county": {
            "type": "text",
            "copy_to": "fullAddress"
          }
        }
      },
      "settings": {
        "index": {
          "analysis.analyzer.default.type": "ik_max_word"
        }
      }
    }

    Indexing the documents

    PUT /student_db/_bulk
    {"index":{"_id":"1"}}
    {"province":"湖南省","city":"长沙市","county":"天心区","describe":"侠客岛服务员A。","stu_id":"10001","stu_name":"张三","age":10,"sex":true,"birthday":"2000-01-01","hobby":["唱歌","跳舞","篮球"],"examination_results":{"Math":{"value":98.5,"level":"优"},"English":{"value":87.5,"level":"良"}}}
    {"index":{"_id":"2"}}
    {"province":"湖南省","city":"长沙市","county":"芙蓉区","describe":"侠客岛服务员B。","stu_id":"10002","stu_name":"李四","age":12,"sex":true,"birthday":"1998-01-01","hobby":["唱歌","跳舞","游泳"],"examination_results":{"English":{"value":97.5,"level":"优"},"Chinese":{"value":85.5,"level":"良"}}}
    {"index":{"_id":"3"}}
    {"province":"湖北省","city":"武汉市","county":"江夏区","describe":"会九阳神功、乾坤大挪移、圣火令武功、太极拳,太极剑等武功。","stu_id":"10003","stu_name":"张无忌","age":11,"sex":false,"birthday":"1999-01-01","hobby":["乒乓球","跳舞","游泳"],"examination_results":{"Physics":{"value":77.5,"level":"一般"},"Chinese":{"value":100,"level":"优"}}}
    {"index":{"_id":"4"}}
    {"province":"湖北省","city":"黄石市","county":"铁山区","describe":"会黯然销魂掌、弹指神功、玉女剑法等武功。","stu_id":"10004","stu_name":"杨过","age":9,"sex":false,"birthday":"2001-01-01","hobby":["乒乓球","唱歌","游泳"],"examination_results":{"Chemistry":{"value":70.5,"level":"一般"},"Chinese":{"value":91.5,"level":"优"}}}
    {"index":{"_id":"5"}}
    {"province":"广东省","city":"广州市","county":"南沙区","describe":"辽国南院大王,精通降龙十八掌,真正的战神。","stu_id":"10005","stu_name":"萧峰","age":13,"sex":true,"birthday":"1997-01-01","hobby":["篮球","足球","乒乓球"],"examination_results":{"FineArts":{"value":92.5,"level":"优"},"Sports":{"value":91.5,"level":"优"}}}
    {"index":{"_id":"6"}}
    {"province":"广东省","city":"广州市","county":"南沙区","describe":"精通降龙十八掌,为国为民的侠之大者。","stu_id":"10006","stu_name":"郭靖","age":13,"sex":true,"birthday":"1997-01-01","hobby":["篮球","足球","乒乓球"],"examination_results":{"History":{"value":92.5,"level":"优"},"Chemistry":{"value":91.5,"level":"优"}}}
    {"index":{"_id":"7"}}
    {"province":"广东省","city":"广州市","county":"白云区","describe":"会降龙十八掌,逍遥派诸多武功。","stu_id":"10007","stu_name":"虚竹","age":14,"sex":false,"birthday":"1996-01-01","hobby":["篮球","足球","乒乓球"],"examination_results":{"History":{"value":90.5,"level":"优"},"Chemistry":{"value":94.5,"level":"优"}}}
    {"index":{"_id":"8"}}
    {"province":"广东省","city":"广州市","county":"白云区","describe":"会六脉神剑和北冥神功。","stu_id":"10008","stu_name":"段誉","age":14,"sex":false,"birthday":"1996-01-01","hobby":["篮球","足球","乒乓球"],"examination_results":{"History":{"value":90.5,"level":"优"},"Chemistry":{"value":94.5,"level":"优"}}}
    {"index":{"_id":"9"}}
    {"province":"广东省","city":"广州市","county":"白云区","describe":"以光复大燕国为己任,会斗转星移和参合指。","stu_id":"10009","stu_name":"慕容复","age":15,"sex":false,"birthday":"1995-01-01","hobby":["篮球","游泳","乒乓球"],"examination_results":{"History":{"value":90.5,"level":"优"},"Chemistry":{"value":94.5,"level":"优"}}}
    {"index":{"_id":"10"}}
    {"province":"广东省","city":"广州市","county":"白云区","describe":"斗转星移的创作者。","stu_id":"10010","stu_name":"慕容龙城","age":15,"sex":false,"birthday":"1995-01-01","hobby":["篮球","游泳","乒乓球"],"examination_results":{"History":{"value":90.5,"level":"优"}}}
    {"index":{"_id":"11"}}
    {"province":"北京市","city":"朝阳区","county":"三里屯街道","describe":"会少林七十二绝技,以佛法和慈悲度化慕容博和萧远山,是佛法和武功的集大成者。","stu_id":"10011","stu_name":"扫地僧","age":9,"sex":false,"birthday":"2001-01-01","hobby":["篮球","游泳","乒乓球"],"examination_results":{"History":{"value":100,"level":"优"},"Chinese":{"value":100,"level":"优"},"Chemistry":{"value":94.5,"level":"优"},"English":{"value":100,"level":"优"},"Physics":{"value":100,"level":"优"},"Math":{"value":100,"level":"优"}}}
    {"index":{"_id":"12"}}
    {"province":"湖南省","city":"长沙市","county":"天心区","describe":"九阴真经的作者,武学创作天赋真正的第一人。","stu_id":"10012","stu_name":"黄裳","age":10,"sex":true,"birthday":"2000-01-01","hobby":["唱歌","跳舞","篮球"],"examination_results":{"Math":{"value":98.5,"level":"优"},"English":{"value":87.5,"level":"良"}}}
    {"index":{"_id":"13"}}
    {"province":"湖南省","city":"长沙市","county":"天心区","describe":"根据九阴真经创作了九阳神功。","stu_id":"10013","stu_name":"斗酒僧","age":10,"sex":true,"birthday":"2000-01-01","hobby":["唱歌","跳舞","篮球"],"examination_results":{"Math":{"value":100,"level":"优"},"English":{"value":100,"level":"优"}}}
    {"index":{"_id":"14"}}
    {"province":"湖南省","city":"长沙市","county":"天心区","describe":"绝技先天功,大器晚成,第一届华山论剑夺得九阴真经。","stu_id":"10014","stu_name":"王重阳","age":10,"sex":true,"birthday":"2000-01-01","hobby":["唱歌","跳舞","篮球"],"examination_results":{"Math":{"value":100,"level":"优"},"English":{"value":100,"level":"优"}}}

    PUT /address_list/_bulk
    { "index": { "_id": "1"} }
    {"province": "湖南省","city": "长沙市","county":"天心区"}
    { "index": { "_id": "2"} }
    {"province": "湖南省","city": "长沙市","county":"芙蓉区"}
    { "index": { "_id": "3"} }
    {"province": "广东省","city": "广州市","county":"白云区"}
    { "index": { "_id": "4"} }
    {"province": "湖北省","city": "武汉市","county":"江夏区"}
    { "index": { "_id": "5"} }
    {"province": "内蒙古自治区","city": "呼和浩特","county":"玉泉区"}
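
    A bulk body like the ones above is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. The following is a minimal Python sketch of how such a body could be assembled. The helper names build_bulk_body and find_duplicate_ids are made up for illustration and are not part of any Elasticsearch client. Because an index action whose _id already exists replaces the earlier document, the sketch also reports duplicate IDs.

```python
import json
from collections import Counter


def build_bulk_body(docs):
    """Return an NDJSON bulk body: an action line plus a source line per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}, ensure_ascii=False))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def find_duplicate_ids(docs):
    """Return the _id values that occur more than once (later ones would overwrite earlier ones)."""
    counts = Counter(doc_id for doc_id, _ in docs)
    return sorted(doc_id for doc_id, n in counts.items() if n > 1)


addresses = [
    ("1", {"province": "湖南省", "city": "长沙市", "county": "天心区"}),
    ("2", {"province": "湖南省", "city": "长沙市", "county": "芙蓉区"}),
]
body = build_bulk_body(addresses)
duplicates = find_duplicate_ids(addresses + [("2", {"province": "广东省"})])
print(body)
print(duplicates)
```

    The resulting string can be sent as the request body of PUT /address_list/_bulk with any HTTP client.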

    Match query: common parameters in practice

    Basic syntax

    Requirement: run a full-text search on the describe field for the value 真经.

    Step 1: analyze 真经 with the ik analyzer. The result is a single token: "真经".

    POST _analyze
    {
      "analyzer": "ik_max_word",
      "text": "真经"
    }

    Step 2: run the match query.

    # Method 1: short form
    GET /student_db/_search
    {
      "query": {
        "match": {
          "describe": "真经"
        }
      }
    }

    # Method 2: expanded form, which accepts additional parameters
    GET /student_db/_search
    {
      "query": {
        "match": {
          "describe": {
            "query": "真经"
          }
        }
      }
    }
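
    The two methods above express the same query: the short form is shorthand for the expanded form with only query set. As a small illustration, the Python sketch below rewrites a short-form match clause into the expanded form. The function name expand_match is hypothetical and exists only for this example.

```python
def expand_match(match_clause):
    """Rewrite {"field": "text"} as {"field": {"query": "text"}}; copy expanded clauses as-is."""
    expanded = {}
    for field, value in match_clause.items():
        if isinstance(value, dict):
            expanded[field] = dict(value)
        else:
            expanded[field] = {"query": value}
    return expanded


short_form = {"describe": "真经"}
long_form = {"describe": {"query": "真经"}}
print(expand_match(short_form) == expand_match(long_form))  # prints True
```

    The expanded form is the one to use whenever parameters such as analyzer, operator or minimum_should_match are needed.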

    analyzer (querying with a specified analyzer)

    Requirement: run a full-text search on the describe field for the value 真经, using the standard analyzer.

    Step 1: analyze 真经 with the standard analyzer. The result is two tokens: "真" and "经".

    POST _analyze
    {
      "analyzer": "standard",
      "text": "真经"
    }

    Step 2: run the match query with the standard analyzer specified.

    GET /student_db/_search
    {
      "query": {
        "match": {
          "describe": {
            "query": "真经",
            "analyzer": "standard"
          }
        }
      }
    }
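
    The analyzer parameter only changes how the query text is split; the indexed terms were still produced by the index's own analyzer (ik_max_word here). The toy Python model below illustrates why that matters. It is a simplification under stated assumptions: standard_like_tokens mimics only the one-token-per-Han-character behaviour reported in step 1, and indexed_terms is a hypothetical term set for one describe value, not real ik_max_word output.

```python
def standard_like_tokens(text):
    """Toy stand-in for the standard analyzer on Han text: one token per character."""
    return [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]


def matches_any(query_tokens, terms):
    """Default OR behaviour: one shared token is enough."""
    return any(token in terms for token in query_tokens)


# Hypothetical index terms for "九阴真经的作者"; the real ik_max_word output may differ.
indexed_terms = {"九阴真经", "九阴", "真经", "的", "作者"}

ik_query_tokens = ["真经"]  # ik result reported in the basic syntax section
standard_query_tokens = standard_like_tokens("真经")

print(standard_query_tokens)
print(matches_any(ik_query_tokens, indexed_terms))
print(matches_any(standard_query_tokens, indexed_terms))
```

    Under these assumptions the single-character query tokens find nothing, because the index holds no single-character terms "真" or "经". Whether that holds for the real index depends on the ik dictionary, so treat this as an illustration of the mechanism only.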

    operator (Boolean logic used to interpret the query text)

    Requirement: run a match query against the field that combines province, city and county (fullAddress), with the query text "湖南天心区". Use AND and OR to demonstrate the parameter.

    Step 1: analyze "湖南天心区" with the ik analyzer. The resulting tokens are "湖南", "南天", "天心区", "天心", "区".

    POST _analyze
    {
      "analyzer": "ik_max_word",
      "text": "湖南天心区"
    }

    Step 2: run the query once with operator set to AND and once with OR. AND returns no documents: it requires every query token to be present, and no address in fullAddress yields all of "湖南", "南天", "天心区", "天心", "区" when analyzed with ik. OR does return documents, because a single shared token is enough for a match. Note: the default value of this parameter is OR.

    GET /address_list/_search
    {
      "query": {
        "match": {
          "fullAddress": {
            "query": "湖南天心区",
            "operator": "AND"
          }
        }
      }
    }

    GET /address_list/_search
    {
      "query": {
        "match": {
          "fullAddress": {
            "query": "湖南天心区",
            "operator": "OR"
          }
        }
      }
    }
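
    The difference between AND and OR can be modelled as "all query tokens present" versus "any query token present". The Python sketch below uses the query tokens reported in step 1. The per-document term sets in DOC_TERMS are assumptions made for illustration; the terms ik_max_word actually produces for each address may differ.

```python
QUERY_TOKENS = ["湖南", "南天", "天心区", "天心", "区"]  # tokens reported in step 1

# Hypothetical fullAddress terms per document ID; real ik_max_word output may differ.
DOC_TERMS = {
    "1": {"湖南省", "湖南", "省", "长沙市", "长沙", "市", "天心区", "天心", "区"},
    "2": {"湖南省", "湖南", "省", "长沙市", "长沙", "市", "芙蓉区", "芙蓉", "区"},
    "3": {"广东省", "广东", "省", "广州市", "广州", "市", "白云区", "白云", "区"},
}


def match(query_tokens, terms, operator="OR"):
    """AND requires every query token in the document; OR requires at least one."""
    hits = [token in terms for token in query_tokens]
    return all(hits) if operator.upper() == "AND" else any(hits)


and_hits = [doc_id for doc_id, terms in DOC_TERMS.items() if match(QUERY_TOKENS, terms, "AND")]
or_hits = [doc_id for doc_id, terms in DOC_TERMS.items() if match(QUERY_TOKENS, terms, "OR")]
print(and_hits)
print(or_hits)
```

    In this model AND finds nothing because no document contains "南天", while OR matches every document that shares even the single token "区".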

    minimum_should_match (minimum number of matching clauses)

    Requirement 1: run a match query against the combined province, city and county field, with the query text "湖南天心区". Set the minimum number of matches to 3, 2 and 1 in turn and compare the results.

    Step 1: analyze "湖南天心区" with the ik analyzer. The resulting tokens are "湖南", "南天", "天心区", "天心", "区". That is five tokens in total, or equivalently five clauses.

    POST _analyze
    {
      "analyzer": "ik_max_word",
      "text": "湖南天心区"
    }

    Step 2: run the queries. The larger the number, the more precise the results; the smaller the number, the more documents are returned. In production this value should therefore be chosen with care.

    GET /address_list/_search
    {
      "query": {
        "match": {
          "fullAddress": {
            "query": "湖南天心区",
            "minimum_should_match": 3
          }
        }
      }
    }

    GET /address_list/_search
    {
      "query": {
        "match": {
          "fullAddress": {
            "query": "湖南天心区",
            "minimum_should_match": 2
          }
        }
      }
    }

    GET /address_list/_search
    {
      "query": {
        "match": {
          "fullAddress": {
            "query": "湖南天心区",
            "minimum_should_match": 1
          }
        }
      }
    }
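
    To see why a larger value narrows the result set, count how many of the five query tokens each document contains and keep the documents that reach the threshold. The Python sketch below does exactly that. As before, the term sets in DOC_TERMS are assumed for illustration and may not equal the real ik_max_word output.

```python
QUERY_TOKENS = ["湖南", "南天", "天心区", "天心", "区"]  # the five clauses from step 1

# Hypothetical fullAddress terms per document ID; real ik_max_word output may differ.
DOC_TERMS = {
    "1": {"湖南省", "湖南", "长沙市", "长沙", "天心区", "天心", "区"},
    "2": {"湖南省", "湖南", "长沙市", "长沙", "芙蓉区", "芙蓉", "区"},
    "3": {"广东省", "广东", "广州市", "广州", "白云区", "白云", "区"},
}


def matched_clauses(terms):
    """Number of query tokens that occur in the document's terms."""
    return sum(1 for token in QUERY_TOKENS if token in terms)


def hits(minimum_should_match):
    """IDs of documents that match at least the required number of clauses."""
    return [doc_id for doc_id, terms in DOC_TERMS.items()
            if matched_clauses(terms) >= minimum_should_match]


for threshold in (3, 2, 1):
    print(threshold, hits(threshold))
```

    With these assumed terms the documents match 4, 2 and 1 clauses respectively, so each step down in the threshold admits one more document.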

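    Requirement 2: run a match query against the combined province, city and county field, with the query text "湖南天心区". Set minimum_should_match to 1<60% and then to 1<59%, and compare the results.

    Step 1: as in requirement 1, the ik analyzer splits "湖南天心区" into "湖南", "南天", "天心区", "天心", "区".

    Step 2: run the queries. minimum_should_match=1<60% returns 1 document, while minimum_should_match=1<59% returns 2. The form N<P% means: if there are N or fewer clauses, all of them are required; otherwise the percentage applies. Here the clause count is 5, which is greater than 1, so the percentage is used. 60% of 5 is 3 required matches, while 59% of 5 is 2.95, which is rounded down to 2.

    GET /address_list/_search
    {
      "query": {
        "match": {
          "fullAddress": {
            "query": "湖南天心区",
            "minimum_should_match": "1<60%"
          }
        }
      }
    }

    GET /address_list/_search
    {
      "query": {
        "match": {
          "fullAddress": {
            "query": "湖南天心区",
            "minimum_should_match": "1<59%"
          }
        }
      }
    }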

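    Appendix

    The official documentation describes several kinds of value for this parameter, such as a percentage of the number of tokens. They are listed below.

    Type | Example | Description
    Positive integer | 3 | At least 3 tokens must match.
    Negative integer | -2 | minimum_should_match = number of clauses + this negative integer. The more negative the value, the more documents are returned. If the result would fall below 1, minimum_should_match is treated as 1.
    Positive percentage | 75% | 75% of the clauses must match. With 4 clauses, at least 3 must match; with 5 clauses, 3 are still enough, because minimum_should_match = round down (number of clauses × percentage).
    Negative percentage | -25% | Up to 25% of the clauses may be missing. With 4 clauses, at least 3 must match; with 5 clauses, 4 must match, because minimum_should_match = number of clauses − round down (number of clauses × 25%), which equals round up (number of clauses × 75%).
    Combination | 1<60% | See the walkthrough in requirement 2.
    Multiple combinations | 2<60% 9<-4 | Conditions are separated by spaces. With 2 or fewer clauses, all clauses must match; with 3 to 9 clauses, 60% must match; with more than 9 clauses, minimum_should_match = number of clauses − 4.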
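
    The rules in the table can be condensed into one small function. The Python sketch below is a model written from those rules, not Elasticsearch's own code; the function name required_matches is made up for this example, and the final clamp to the range 1..clause count follows the statement that a value below 1 or above the number of clauses is never used.

```python
def required_matches(clause_count, spec):
    """Resolve a minimum_should_match spec to the number of clauses that must match."""
    spec = str(spec).strip()
    result = clause_count
    if "<" in spec:
        # Conditional specs such as "2<60% 9<-4": later conditions override earlier ones.
        for part in spec.split():
            bound, inner = part.split("<", 1)
            if clause_count <= int(bound):
                return result  # at or below the bound: keep what has been resolved so far
            result = required_matches(clause_count, inner)
        return result
    if spec.endswith("%"):
        percent = int(spec[:-1])
        share = clause_count * abs(percent) // 100  # rounded down
        result = share if percent >= 0 else clause_count - share
    else:
        number = int(spec)
        result = number if number >= 0 else clause_count + number
    # Never below 1 and never above the number of clauses.
    return max(1, min(clause_count, result))


for spec in ("1<60%", "1<59%", "75%", "-25%", "-2", "2<60% 9<-4"):
    print(spec, required_matches(5, spec))
```

    For the five clauses of "湖南天心区" this gives 3 for 1<60% and 2 for 1<59%, which agrees with the result counts observed in requirement 2.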

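    fuzzy (fuzzy search)

    Parameters

    fuzziness (edit distance): the number of operations needed to turn the search keyword into the value (more precisely, a term) indexed for the field. An operation is an insertion, a deletion, a substitution, or a transposition of two adjacent characters.

    # Substitute "制" with "治": 1 edit.
    内蒙古自制区->内蒙古自治区
    # Insert "治区": 2 edits.
    内蒙古自->内蒙古自治区
    # Delete one "区": 1 edit.
    内蒙古自治区区->内蒙古自治区
    # Transpose "治自" into "自治": 1 edit.
    内蒙古治自区->内蒙古自治区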
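
    The four edit counts above can be checked with a small dynamic-programming routine. The Python sketch below computes the optimal string alignment distance, a common variant of edit distance that counts an adjacent transposition as a single edit. It is meant as a way to verify the examples, not as a description of how Elasticsearch evaluates fuzzy queries internally.

```python
def edit_distance(source, target):
    """Optimal string alignment distance: insert, delete, substitute, adjacent transpose."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters of source
    for j in range(cols):
        dist[0][j] = j  # insert all j characters of target
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
            if (i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                # adjacent transposition
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


for keyword in ("内蒙古自制区", "内蒙古自", "内蒙古自治区区", "内蒙古治自区"):
    print(keyword, edit_distance(keyword, "内蒙古自治区"))
```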

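    A value of 0 means fuzzy search is off; in a match query, no fuzziness is applied unless the parameter is set. A value of 1 allows one edit. For example, if the indexed value is "内蒙古自治区", the search terms "内蒙古古自治区", "内蒙股自治区", "内蒙自治区" and "内蒙古治自区" all find the document, because each is exactly one edit away. Note in particular:

    1. The maximum fuzziness is 2.
    2. With "AUTO", keywords of length 0 to 2 must match exactly.
    3. With "AUTO", keywords of length 3 to 5 allow 1 edit.
    4. With "AUTO", keywords longer than 5 allow 2 edits.
    5. The official documentation recommends "AUTO", which chooses the edit distance from the keyword length as described above.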
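
    The "AUTO" rule in the list above amounts to a lookup on the keyword length. A minimal Python sketch follows; the function name auto_fuzziness is chosen for this example, and length is counted in characters.

```python
def auto_fuzziness(keyword):
    """Edit distance allowed under "AUTO": 0 for length 0-2, 1 for 3-5, 2 for longer keywords."""
    length = len(keyword)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


for keyword in ("湖南", "湖南省", "内蒙自治区", "内蒙古自治区"):
    print(keyword, len(keyword), auto_fuzziness(keyword))
```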

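    prefix_length: during fuzzy search, the leading characters of the search keyword must match exactly and are not subject to edits. This parameter sets how many leading characters that covers.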

    Requirement 1: run fuzzy searches on the province field with the keywords below, varying the edit distance and the prefix length to get familiar with both parameters.

    GET /address_list/_search
    {
      "query": {
        "fuzzy": {
          "province": {
            "value": "湖x省",
            "fuzziness": 1
          }
        }
      }
    }

    GET /address_list/_search
    {
      "query": {
        "fuzzy": {
          "province": {
            "value": "湖x省",
            "fuzziness": 1,
            "prefix_length": 2
          }
        }
      }
    }

    GET /address_list/_search
    {
      "query": {
        "fuzzy": {
          "province": {
            "value": "内蒙自治区",
            "fuzziness": 1,
            "prefix_length": 2
          }
        }
      }
    }

    GET /address_list/_search
    {
      "query": {
        "fuzzy": {
          "province": {
            "value": "内蒙古字智区",
            "fuzziness": 2,
            "prefix_length": 2
          }
        }
      }
    }

    GET /address_list/_search
    {
      "query": {
        "fuzzy": {
          "province": {
            "value": "内蒙古治自区",
            "fuzziness": 1,
            "prefix_length": 2
          }
        }
      }
    }
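
    The effect of prefix_length in the first two queries can be reasoned about separately from the edit distance: a candidate term is only considered if its first prefix_length characters equal those of the keyword. The Python sketch below models just that prefix check. The function name prefix_matches is hypothetical, and the sketch assumes the index holds the terms "湖南省" and "内蒙古自治区", which depends on what ik_max_word actually emitted.

```python
def prefix_matches(keyword, term, prefix_length):
    """True when keyword and term agree on their first prefix_length characters."""
    return keyword[:prefix_length] == term[:prefix_length]


checks = [
    ("湖x省", "湖南省", 0),            # no prefix requirement
    ("湖x省", "湖南省", 2),            # "湖x" differs from "湖南"
    ("内蒙自治区", "内蒙古自治区", 2),   # both start with "内蒙"
]
results = [prefix_matches(keyword, term, n) for keyword, term, n in checks]
print(results)
```

    A term that passes the prefix check must still be within the allowed fuzziness to match, so "湖x省" can only reach "湖南省" when prefix_length is less than 2.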
  • Original article: https://blog.csdn.net/qq_39706570/article/details/125964698