Goal: master the multi_match query, including an analysis of its types and how to apply each of them.
Elasticsearch version: 7.17.5
- PUT /boss_db
- {
- "settings": {
- "index": {
- "analysis.analyzer.default.type": "ik_max_word"
- }
- }
- }
-
- PUT /boss_db/_bulk
- {"index":{"_id":"1"}}
- {"company":"星耀科技有限公司","min_num":0,"max_num":20,"province":"广东省","city":"深圳市","county":"南山区","post":"前端开发实习生","min_salary":10,"max_salary":16,"qualification":"本科","min_work_time":3,"max_work_time":5,"skill":["html","css","vue","js"]}
- {"index":{"_id":"2"}}
- {"company":"恒和科技有限公司","min_num":100,"max_num":500,"province":"广东省","city":"广州市","county":"天河区","post":"JAVA开发工程师","min_salary":20,"max_salary":30,"qualification":"硕士","min_work_time":1,"max_work_time":3,"skill":["k8s","springboot","mybatis","微服务"]}
- {"index":{"_id":"3"}}
- {"company":"天心科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"广州市","county":"天河区","post":"JAVA架构师","min_salary":40,"max_salary":50,"qualification":"博士","min_work_time":3,"max_work_time":5,"skill":["mybatis","spring","kafka","微服务"]}
- {"index":{"_id":"4"}}
- {"company":"黄河科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"广州市","county":"天河区","post":"JAVA","min_salary":40,"max_salary":50,"qualification":"博士","min_work_time":3,"max_work_time":5,"skill":["es","mysql","分布式","soa"]}
- {"index":{"_id":"5"}}
- {"company":"长江科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"深圳市","county":"龙岗区","post":"资深大数据开发工程师","min_salary":40,"max_salary":50,"qualification":"博士","min_work_time":0,"max_work_time":5,"skill":["redis","kafka","mq","数据结构"]}
- {"index":{"_id":"6"}}
- {"company":"黄山科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"深圳市","county":"龙岗区","post":"前端开发","min_salary":20,"max_salary":30,"qualification":"大专","min_work_time":0,"max_work_time":5,"skill":["html","css","js","vue"]}
- {"index":{"_id":"7"}}
- {"company":"黄山科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"深圳市","county":"龙岗区","post":"前端开发实习生","min_salary":10,"max_salary":13,"qualification":"不限","min_work_time":0,"max_work_time":5}
- {"index":{"_id":"8"}}
- {"company":"银河大数据科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"深圳市","county":"龙岗区","post":"大数据实习生","min_salary":10,"max_salary":13,"qualification":"不限","min_work_time":0,"max_work_time":5,"skill":["电商","spring","容器技术","微服务技术"]}
- {"index":{"_id":"9"}}
- {"company":"银河大数据科技有限公司","min_num":2000,"max_num":5000,"province":"广东省","city":"深圳市","county":"龙岗区","post":"JAVA实习生","min_salary":30,"max_salary":60,"qualification":"本科","min_work_time":0,"max_work_time":5,"skill":["数据结构","k8s","云原生技术","电商"]}
-
- PUT /blog_db
- {
- "settings": {
- "index": {
- "analysis.analyzer.default.type": "ik_max_word"
- }
- }
- }
-
- PUT /blog_db/_bulk
- {"index":{"_id":"1"}}
- {"title":"kafka入门手册","content":"kafka命令、集群、优化"}
- {"index":{"_id":"2"}}
- {"title":"kafka命令手册","content":"命令详情、命令实战"}
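Requirement: a job site's search box, for example, matches the keyword against both the position and the company. Here we search for "天心" and then for "大数据" (big data).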
- GET boss_db/_search
- {
- "query": {
- "multi_match" : {
- "query": "天心",
- "fields": [ "company", "post" ]
- }
- }
- }
-
- GET boss_db/_search
- {
- "query": {
- "multi_match" : {
- "query": "大数据",
- "fields": [ "company", "post" ]
- }
- }
- }
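Requirement: use every field whose name ends with "work_time" as a match field. The pattern `*work_time` only has a leading wildcard, so here it resolves to min_work_time and max_work_time.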
- GET boss_db/_search
- {
- "query": {
- "multi_match" : {
- "query": 5,
- "fields": [ "*work_time" ]
- }
- }
- }
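To make the wildcard behaviour concrete, here is a minimal Python sketch of how a pattern such as `*work_time` selects field names. It only approximates the idea with the standard library's `fnmatchcase`; Elasticsearch resolves the pattern against the index mapping itself. The helper `resolve_fields` and the extra field name `work_time_note` are hypothetical and exist only for this illustration.

```python
from fnmatch import fnmatchcase

# Field names taken from the boss_db documents above.
BOSS_DB_FIELDS = [
    "company", "min_num", "max_num", "province", "city", "county", "post",
    "min_salary", "max_salary", "qualification",
    "min_work_time", "max_work_time", "skill",
]


def resolve_fields(pattern, field_names):
    """Return the field names that match a simple '*' wildcard pattern.

    Approximation only: Elasticsearch expands the pattern against the
    index mapping, not with fnmatch.
    """
    return [name for name in field_names if fnmatchcase(name, pattern)]


matched = resolve_fields("*work_time", BOSS_DB_FIELDS)
print(matched)  # ['min_work_time', 'max_work_time']

# A hypothetical field that merely contains "work_time" is not selected,
# because the pattern has no trailing wildcard.
not_matched = resolve_fields("*work_time", ["work_time_note"])
print(not_matched)  # []
```

To select every field whose name contains "work_time" anywhere, the pattern would need a wildcard on both sides, for example `*work_time*`.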
Requirement 1: search for "前端开发实习生" (front-end development intern) and require every analyzed term to match.
- GET boss_db/_search
- {
- "query": {
- "multi_match" : {
- "query": "前端开发实习生",
- "fields": [ "post" ],
- "operator": "and"
- }
- }
- }
Requirement 2: search for "前端开发实习生" and accept a document as soon as any analyzed term matches.
- GET boss_db/_search
- {
- "query": {
- "multi_match" : {
- "query": "前端开发实习生",
- "fields": [ "post" ],
- "operator": "or"
- }
- }
- }
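The difference between the two operators can be sketched as a toy model in Python. The token lists below are assumptions made for illustration; the real output of ik_max_word may differ and can be inspected with the `_analyze` API. The function `matches` is hypothetical and ignores scoring entirely; it only shows which documents qualify.

```python
def matches(query_tokens, field_tokens, operator="or"):
    """Toy model of the multi_match operator on a single field.

    "and": every query token must occur in the field.
    "or":  at least one query token must occur in the field.
    """
    query, field = set(query_tokens), set(field_tokens)
    if operator == "and":
        return query <= field
    return bool(query & field)


# ASSUMED tokenization of the query "前端开发实习生".
QUERY_TOKENS = ["前端", "开发", "实习生"]

# ASSUMED tokens of the post field for a few boss_db documents.
POSTS = {
    "1": ["前端", "开发", "实习生"],  # 前端开发实习生
    "6": ["前端", "开发"],            # 前端开发
    "8": ["大数据", "实习生"],        # 大数据实习生
    "4": ["java"],                    # JAVA
}

and_hits = [doc_id for doc_id, tokens in POSTS.items()
            if matches(QUERY_TOKENS, tokens, "and")]
or_hits = [doc_id for doc_id, tokens in POSTS.items()
           if matches(QUERY_TOKENS, tokens, "or")]

print(and_hits)  # ['1']
print(or_hits)   # ['1', '6', '8']
```

With "and" only the full intern posting survives, while "or" also returns postings that share a single term with the query. "or" is the default when no operator is given.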
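Requirement: with the keyword "大数据" (big data), match on the company and post fields, and give the post field a higher scoring weight than the company field.
- # No field weighting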
- GET boss_db/_search
- {
-
- "query": {
- "multi_match" : {
- "query": "大数据",
- "fields": ["company", "post" ]
- }
- }
- }
-
- # Multiply the score of the post field by 4.
- GET boss_db/_search
- {
-
- "query": {
- "multi_match" : {
- "query": "大数据",
- "fields": ["company", "post^4" ]
- }
- }
- }
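The effect of `post^4` on ranking can be sketched with a few lines of Python. This is a toy model under stated assumptions, not the Lucene scoring code: the raw per-field scores are invented numbers, and the helpers `parse_field` and `best_field_score` are hypothetical. It assumes the default best_fields behaviour, where a document's score is its highest boosted field score.

```python
def parse_field(spec):
    """Split a field spec such as "post^4" into (name, boost); boost defaults to 1."""
    name, sep, boost = spec.partition("^")
    return name, float(boost) if sep else 1.0


def best_field_score(raw_scores, field_specs):
    """Toy best_fields score: the highest boosted score among matching fields."""
    boosted = []
    for spec in field_specs:
        name, boost = parse_field(spec)
        if name in raw_scores:
            boosted.append(raw_scores[name] * boost)
    return max(boosted, default=0.0)


# INVENTED raw scores: document 5 matches only in post, document 9 only in company.
doc5 = {"post": 0.5}
doc9 = {"company": 1.0}

plain = ["company", "post"]
weighted = ["company", "post^4"]

print(best_field_score(doc5, plain), best_field_score(doc9, plain))        # 0.5 1.0
print(best_field_score(doc5, weighted), best_field_score(doc9, weighted))  # 2.0 1.0
```

In this model the company-only match ranks first without boosting, and the post match overtakes it once the post score is multiplied by 4.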
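Purpose (type best_fields): find the single most important field among all the fields being searched. For example, take the keyword "brown fox": field a contains "brown fox", field b contains only "brown", and field c contains only "fox". ES then treats field a as the best field. tie_breaker takes a value in the range [0, 1] and defaults to 0, which means only the best field's score counts. When it is set to a non-zero value, the final score is the best field's score plus tie_breaker times the score of each other matching field.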
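Requirement: search for the keyword "kafka命令" (kafka command) in both the title and the content, giving priority to the title.
Analysis: without tie_breaker, both documents end up with the same score for "kafka命令", so the one with the smaller id is listed first. From a business point of view, however, the document with id=2 is clearly the better match, so part of the other field's score needs to be counted as well. Here I count it at 0.1 times its value.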
- GET /blog_db/_search
- {
- "query": {
- "multi_match": {
- "query": "kafka命令",
- "type": "best_fields",
- "fields": [
- "title",
- "content"
- ],
- "tie_breaker": 0.1
- }
- }
- }
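The arithmetic behind tie_breaker can be sketched as follows. This is a minimal model of the combination rule only (best field score plus tie_breaker times the remaining field scores). The per-field scores are invented so that the two blog_db documents tie on their best field; they are not the scores Elasticsearch would actually return. The function `combine` is hypothetical.

```python
def combine(field_scores, tie_breaker=0.0):
    """Best field score plus tie_breaker times the sum of the other field scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)


# INVENTED per-field scores in the order [title, content].
doc1 = [0.5, 1.0]   # "kafka入门手册" / "kafka命令、集群、优化"
doc2 = [1.0, 0.75]  # "kafka命令手册" / "命令详情、命令实战"

# tie_breaker = 0 (default): only the best field counts, so the documents tie.
print(combine(doc1), combine(doc2))  # 1.0 1.0

# tie_breaker = 0.1: a tenth of the other field's score breaks the tie.
print(combine(doc2, 0.1) > combine(doc1, 0.1))  # True

# tie_breaker = 1: every field contributes in full, i.e. a plain sum.
print(combine(doc1, 1.0), combine(doc2, 1.0))  # 1.5 1.75
```

A small value such as 0.1 keeps the best field dominant and uses the remaining fields only to separate otherwise equal documents.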
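The most_fields type behaves like best_fields with tie_breaker set to 1, which makes it better suited to cases where the fields carry equal scoring weight. No demo is given here; it works the same way as the query above with tie_breaker set to 1.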
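Since this type is not demonstrated, here is a minimal sketch of what the request body could look like, assuming the type in question is most_fields and reusing the blog_db keyword from the previous example. The helper `multi_match_body` is hypothetical; it only builds and prints the JSON, and does not contact a cluster.

```python
import json


def multi_match_body(query, fields, match_type, **options):
    """Build a multi_match search body as a plain dict (hypothetical helper)."""
    clause = {"query": query, "type": match_type, "fields": list(fields)}
    clause.update(options)  # e.g. operator="and" or tie_breaker=0.1
    return {"query": {"multi_match": clause}}


body = multi_match_body("kafka命令", ["title", "content"], "most_fields")

# The printed JSON can be pasted as the body of GET /blog_db/_search.
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Comparing the scores of this query with those of the best_fields query using `"tie_breaker": 1` on the same index is a quick way to check the equivalence described above.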
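Requirement: find the records where the company's district is "天河区" (Tianhe District) and the required qualification is "硕士" (master's degree).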
- GET boss_db/_search
- {
- "query": {
- "multi_match": {
- "query": "天河区硕士",
- "type": "cross_fields",
- "fields": [
- "county",
- "qualification"
- ],
- "operator": "and"
- }
- }
- }
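Analysis: every analyzed term must appear in at least one of the fields for a document to match. This is similar to copy_to, but copy_to needs extra storage, whereas cross_fields needs none and also allows per-field weights. Personally, I think this approach is best suited to searching place names and English personal names.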
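The contrast between term-centric matching (cross_fields) and field-centric matching (best_fields or most_fields) with `"operator": "and"` can be sketched as a toy model. The token lists are assumptions for illustration, since the real ik_max_word output may differ, and both functions are hypothetical helpers that ignore scoring.

```python
def field_centric_and(query_tokens, doc_fields):
    """Field-centric + and: one single field must contain every query token."""
    query = set(query_tokens)
    return any(query <= set(tokens) for tokens in doc_fields.values())


def term_centric_and(query_tokens, doc_fields):
    """Term-centric + and: each query token must occur in at least one field."""
    all_tokens = set().union(*doc_fields.values())
    return set(query_tokens) <= all_tokens


# ASSUMED tokenization of the query "天河区硕士".
QUERY_TOKENS = ["天河区", "硕士"]

# ASSUMED tokens of county and qualification for three boss_db documents.
DOCS = {
    "1": {"county": ["南山区"], "qualification": ["本科"]},
    "2": {"county": ["天河区"], "qualification": ["硕士"]},
    "3": {"county": ["天河区"], "qualification": ["博士"]},
}

cross_hits = [i for i, fields in DOCS.items() if term_centric_and(QUERY_TOKENS, fields)]
field_hits = [i for i, fields in DOCS.items() if field_centric_and(QUERY_TOKENS, fields)]

print(cross_hits)  # ['2']
print(field_hits)  # []
```

In this model only document 2 satisfies the term-centric rule, and no document satisfies the field-centric rule because no single field holds both terms. That is why cross_fields fits queries whose terms are spread across several fields.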