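This article builds on the previous one, "Elasticsearch实战(二)—高级查询语法使用" (Elasticsearch in Practice, Part 2: Using Advanced Query Syntax), which covered basic ES usage and the advanced query syntax.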
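Now a question: how does Baidu make sure paid ads are shown first? Or, when I search for "phone" on Taobao, how does the e-commerce search decide the order of the results? A phone has many attributes that could feed into ranking: sales volume, the amount the advertiser paid, the review star rating, the unit price, the remaining stock, and so on. How are these factors combined into one ordering? This article walks through the basic mechanism: relevance scores and boost weights.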
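First, load some test data into an index named testboost. The fields are: empId (employee ID), salary (salary), deptName (department), address (address).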
POST /testboost/_bulk
{"index":{"_id": 1}}
{"empId" : "111","name" : "员工1","age" : 20,"sex" : "男","mobile" : "19000001111","salary":1333,"deptName" : "技术部","address" : "湖北省武汉市洪山区光谷大厦"}
{"index":{"_id": 2}}
{"empId" : "222","name" : "员工2","age" : 25,"sex" : "男","mobile" : "19000002222","salary":15963,"deptName" : "销售部","address" : "湖北省武汉市江汉路"}
{"index":{"_id": 3}}
{ "empId" : "333","name" : "员工3","age" : 30,"sex" : "男","mobile" : "19000003333","salary":20000,"deptName" : "技术部","address" : "湖北省武汉市经济开发区"}
{"index":{"_id": 4}}
{"empId" : "444","name" : "员工4","age" : 20,"sex" : "女","mobile" : "19000004444","salary":5600,"deptName" : "销售部","address" : "湖北省武汉市沌口开发区"}
{"index":{"_id": 5}}
{ "empId" : "555","name" : "员工5","age" : 20,"sex" : "男","mobile" : "19000005555","salary":9665,"deptName" : "测试部","address" : "湖北省武汉市东湖隧道"}
{"index":{"_id": 6}}
{"empId" : "666","name" : "员工6","age" : 30,"sex" : "女","mobile" : "19000006666","salary":30000,"deptName" : "技术部","address" : "湖北省武汉市江汉路"}
{"index":{"_id": 7}}
{"empId" : "777","name" : "员工7","age" : 60,"sex" : "女","mobile" : "19000007777","salary":52130,"deptName" : "测试部","address" : "湖北省黄冈市边城区"}
{"index":{"_id": 8}}
{"empId" : "888","name" : "员工8","age" : 19,"sex" : "女","mobile" : "19000008888","salary":60000,"deptName" : "技术部","address" : "湖北省武汉市江汉大学"}
{"index":{"_id": 9}}
{"empId" : "999","name" : "员工9","age" : 40,"sex" : "男","mobile" : "19000009999","salary":23000,"deptName" : "销售部","address" : "河南省郑州市郑州大学"}
{"index":{"_id": 10}}
{"empId" : "101010","name" : "张湖北","age" : 35,"sex" : "男","mobile" : "19000001010","salary":18000,"deptName" : "测试部","address" : "湖北省武汉市东湖高新"}
{"index":{"_id": 11}}
{"empId" : "111111","name" : "王河南","age" : 61,"sex" : "男","mobile" : "19000001011","salary":10000,"deptName" : "销售部","address" : "河南省开封市河南大学"}
{"index":{"_id": 12}}
{"empId" : "121212","name" : "张大学","age" : 26,"sex" : "女","mobile" : "19000001012","salary":1321,"deptName" : "测试部","address" : "河南省开封市河南大学"}
{"index":{"_id": 13}}
{"empId" : "131313","name" : "李江汉","age" : 36,"sex" : "男","mobile" : "19000001013","salary":1125,"deptName" : "销售部","address" : "河南省郑州市二七区"}
{"index":{"_id": 14}}
{"empId" : "141414","name" : "王技术","age" : 45,"sex" : "女","mobile" : "19000001014","salary":6222,"deptName" : "测试部","address" : "河南省郑州市金水区"}
{"index":{"_id": 15}}
{"empId" : "151515","name" : "张测试","age" : 18,"sex" : "男","mobile" : "19000001015","salary":20000,"deptName" : "技术部","address" : "河南省郑州高新开发区"}
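Now suppose we want to find employees whose address is in 湖北省 (Hubei Province) and whose department is either 技术部 (the technical department) or 销售部 (the sales department).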
GET /testboost/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "address": "湖北省"
          }
        },
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "deptName": "技术部"
                }
              },
              {
                "match_phrase": {
                  "deptName": "销售部"
                }
              }
            ]
          }
        }
      ]
    }
  }
}
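In the query results, every employee returned carries a score (_score), and by default the results are sorted by that score in descending order: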
{员工2: score: 3.2875974}
{员工8: score: 3.2417724}
{员工3: score: 3.1995492}
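The boolean logic of this query can be sketched in a few lines of Python. This is only a toy model: substring containment stands in for match_phrase, and the rows are a small subset copied from the bulk data above. It relies on the fact that a bool clause containing only should clauses (no must or filter) requires at least one of them to match.

```python
# Toy model of the bool query: must(address) AND (should: 技术部 OR 销售部).
# Substring containment is a stand-in for match_phrase, not how ES analyzes text.

# A few rows copied from the bulk data: (_id, deptName, address).
EMPLOYEES = [
    (2, "销售部", "湖北省武汉市江汉路"),
    (3, "技术部", "湖北省武汉市经济开发区"),
    (5, "测试部", "湖北省武汉市东湖隧道"),
    (8, "技术部", "湖北省武汉市江汉大学"),
    (9, "销售部", "河南省郑州市郑州大学"),
]


def matches(dept: str, address: str) -> bool:
    """True when the row satisfies the must clause and at least one should clause."""
    in_hubei = "湖北省" in address                                  # outer must clause
    in_wanted_dept = any(d in dept for d in ("技术部", "销售部"))   # nested should clauses
    return in_hubei and in_wanted_dept


matched_ids = [emp_id for emp_id, dept, address in EMPLOYEES if matches(dept, address)]
print(matched_ids)  # [2, 3, 8]
```

员工5 is dropped because 测试部 matches neither should clause, and 员工9 is dropped because the address is not in 湖北省.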
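Now suppose we run the same search (address in 湖北省, department 技术部 or 销售部), but want employees in 销售部 to score higher and be shown first, with employees in 技术部 scoring lower and shown after them.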
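This is exactly what the boost weight is for. Below, the 销售部 clause is given a boost of 5, while the 技术部 clause keeps boost 1, which is also the default. Let's see whether the scores change.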
GET /testboost/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "address": "湖北省"
          }
        },
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "deptName": {
                    "query": "技术部",
                    "boost": 1
                  }
                }
              },
              {
                "match_phrase": {
                  "deptName": {
                    "query": "销售部",
                    "boost": 5
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
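In the query results, every employee again carries a score. With the boost applied, documents matching the 销售部 clause score higher and rank first, while 技术部 documents rank after them: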
{员工2: score: 11.95731}
{员工8: score: 11.869268}
{员工3: score: 3.2875974}
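To see roughly where a number like 11.95731 comes from, here is a small worked calculation using the two scores reported for 员工2. It assumes the total score is the sum of the matching clauses' scores and that boost multiplies a clause's score linearly. The per-clause values it derives are inferred under that assumption; they are not numbers reported by Elasticsearch in this article.

```python
# Assumption: total score = sum over matching clauses of (base score * boost).

def bool_score(clauses):
    """clauses: iterable of (base_score, boost) pairs for the clauses that matched."""
    return sum(base * boost for base, boost in clauses)


# Scores reported above for 员工2 (销售部).
score_boost_1 = 3.2875974   # 销售部 clause with the default boost of 1
score_boost_5 = 11.95731    # 销售部 clause with boost 5

# Solve:  address + 1 * dept = score_boost_1
#         address + 5 * dept = score_boost_5
dept_base = (score_boost_5 - score_boost_1) / (5 - 1)
address_base = score_boost_1 - dept_base

print(round(dept_base, 4), round(address_base, 4))  # 2.1674 1.1202

# Sanity check: plugging the inferred values back in reproduces the boosted score.
assert abs(bool_score([(address_base, 1), (dept_base, 5)]) - score_boost_5) < 1e-9
```

Under these assumptions the deptName clause contributes about 2.17 and the address clause about 1.12, so raising the boost from 1 to 5 adds four more copies of the deptName contribution. Running the search with "explain": true in the request body should show the actual per-clause breakdown.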
Why can scores be inaccurate when an index has multiple shards?
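The cause is sharding itself. Relevance is computed separately on each shard, using only the documents stored on that shard, whereas the inverted-index scoring rules (TF-IDF: term frequency and inverse document frequency) are meant to weigh a term against all documents. If a term happens to be common on one shard and rare on another, the same match gets different scores depending on which shard holds the document.
With enough data, however, documents are spread across shards roughly evenly, the per-shard statistics come close to the global ones, and the problem can usually be ignored.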
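In development and test environments, where there is little data, the usual approach is to create the index with number_of_shards: 1 so that everything lives on a single shard.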
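The effect of shard-local statistics can be illustrated with a toy calculation. The sketch below assumes the IDF form used by BM25, the default similarity in recent Elasticsearch versions; the document counts are hypothetical and chosen only to make the difference visible.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF term in the BM25 form: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical: 10 documents split over two shards, the term unevenly distributed.
idf_a = bm25_idf(doc_count=5, doc_freq=4)   # term is common on shard A
idf_b = bm25_idf(doc_count=5, doc_freq=1)   # term is rare on shard B

# A single shard holding all 10 documents sees the global statistics.
idf_single = bm25_idf(doc_count=10, doc_freq=5)

print(f"shard A: {idf_a:.3f}, shard B: {idf_b:.3f}, single shard: {idf_single:.3f}")
# shard A: 0.288, shard B: 1.386, single shard: 0.693

# With many documents spread almost evenly, the per-shard values nearly coincide.
idf_big_a = bm25_idf(doc_count=100_000, doc_freq=49_900)
idf_big_b = bm25_idf(doc_count=100_000, doc_freq=50_100)
print(abs(idf_big_a - idf_big_b) < 0.01)  # True
```

In the small case the same term is weighted almost five times more heavily on shard B than on shard A, which is why a handful of test documents on several shards can produce surprising rankings, and why a single shard (or a large, evenly distributed data set) avoids the issue.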
At this point we can control the relevance ordering of search results by adjusting the weight (boost) of the query clauses on a field.