【Elasticsearch Tutorial 10】Mapping Field Types: Numbers

Elasticsearch Mapping Field Types: Numbers

1. Types
2. Experiments
3. Important parameters
- 3.1 `coerce`
- 3.2 `doc_values` and `index`

1. Types

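ES offers many numeric types. To improve performance and save storage, choose the narrowest type that is sufficient for your data; there is no need for a wider one. For example, the population of a region fits comfortably in an integer, so long is unnecessary.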

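Type	Description
`byte`	8-bit signed integer, -128 to 127
`short`	16-bit signed integer, -32768 to 32767
`integer`	32-bit signed integer, -2³¹ to 2³¹-1
`long`	64-bit signed integer, -2⁶³ to 2⁶³-1
`unsigned_long`	64-bit unsigned integer, 0 to 2⁶⁴-1
`float`	Single-precision 32-bit IEEE 754 floating-point number
`double`	Double-precision 64-bit IEEE 754 floating-point number
`half_float`	Half-precision 16-bit IEEE 754 floating-point number
`scaled_float`	Floating-point number backed by a long and scaled by a fixed scaling factor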
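
The rule of thumb above (pick the narrowest type that still covers your data) can be sketched in Python. This is only an illustration: the helper name `narrowest_integer_type` is hypothetical and not part of any Elasticsearch client, and the ranges are the ones listed in the table.

```python
# Signed integer types from the table, ordered from narrowest to widest.
SIGNED_TYPES = [("byte", 8), ("short", 16), ("integer", 32), ("long", 64)]


def narrowest_integer_type(min_value: int, max_value: int) -> str:
    """Return the narrowest integer type covering [min_value, max_value]."""
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    for name, bits in SIGNED_TYPES:
        low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        if low <= min_value and max_value <= high:
            return name
    # Non-negative ranges that exceed long may still fit unsigned_long.
    if min_value >= 0 and max_value <= 2 ** 64 - 1:
        return "unsigned_long"
    raise ValueError("range does not fit any integer type")


print(narrowest_integer_type(0, 120))            # byte
print(narrowest_integer_type(0, 1_500_000_000))  # integer
print(narrowest_integer_type(0, 2 ** 63))        # unsigned_long
```

In practice you would feed it the smallest and largest values you expect over the lifetime of the field, not just the values in today's data.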

2. Experiments

2.1 Inserting `valid` data

(1) First create a new index that contains the common numeric field types

PUT pigg_test_num
{
  "mappings": {
    "properties": {
      "num_of_byte": {
        "type": "byte"
      },
      "num_of_short": {
        "type": "short"
      },
      "num_of_integer": {
        "type": "integer"
      },
      "num_of_long": {
        "type": "long"
      },
      "num_of_float": {
        "type": "float"
      },
      "num_of_double": {
        "type": "double"
      }
    }
  }
}

(2) Insert valid data that is within the range of each field type

PUT pigg_test_num/_doc/1
{
  "num_of_byte": 127,
  "num_of_short": 32767,
  "num_of_integer": 2147483647,
  "num_of_long": 9223372036854775807,
  "num_of_float": 0.33333,
  "num_of_double": 11111111111111.11111111111111111
}

(3) View the document

GET pigg_test_num/_search

Response:
{
    "hits":[
        {
            "_index":"pigg_test_num",
            "_type":"_doc",
            "_id":"1",
            "_score":1,
            "_source":{
                "num_of_byte":127,
                "num_of_short":32767,
                "num_of_integer":2147483647,
                "num_of_long":9223372036854776000,
                "num_of_float":0.33333,
                "num_of_double":11111111111111.111
            }
        }
    ]
}
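
Note that `num_of_long` is displayed as 9223372036854776000 rather than 9223372036854775807, and `num_of_double` is shortened. A likely explanation, assuming the response was viewed in a client that parses JSON numbers as IEEE 754 doubles (JavaScript-based consoles do this), is that a 64-bit double cannot represent every integer above 2**53. The Python sketch below reproduces the rounding on its own; it does not talk to Elasticsearch.

```python
LONG_MAX = 2 ** 63 - 1  # 9223372036854775807, the value sent for num_of_long

# Assumption: the client stores the number as a double. A double has a
# 53-bit significand, so LONG_MAX rounds to the nearest representable
# value, which is 2**63.
as_double = float(LONG_MAX)
print(as_double == 2.0 ** 63)      # True
print(int(as_double) - LONG_MAX)   # 1

# Integers are exact only up to 2**53; beyond that, neighbours collapse.
print(float(2 ** 53) == float(2 ** 53 + 1))  # True

# The long decimal sent for num_of_double parses to the same double as the
# shortened value shown in the response.
print(float("11111111111111.11111111111111111") == 11111111111111.111)  # True
```

If this is the cause, the raw HTTP response should still contain the digits exactly as they were sent.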

2.2 Inserting `out-of-range` data

The maximum value of short is 32767, so 32768 is rejected.

PUT pigg_test_num/_doc/2
{
  "num_of_byte": 127,
  "num_of_short": 32768
}

Error returned:
"reason" : "Numeric value (32768) out of range of Java short..."

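2.3 Assigning a `floating-point number` to an `integer type`

Assigning a floating-point number to a long field is accepted, but precision has already been lost, so do not do this in production.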

PUT pigg_test_num/_doc/1
{
  "num_of_long": 9223372036854775807.0001
}
Response:
 "_source" : {
    "num_of_long" : 9.223372036854776E18
}

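2.4 Assigning a `numeric string` to an `integer type`

Assigning a floating-point numeric string to a long field is also accepted, but `_source` keeps the string exactly as it was sent, not a number.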

PUT pigg_test_num/_doc/1
{
  "num_of_long": "9223372036854775807.0001"
}
Response:
"_source" : {
    "num_of_long" : "9223372036854775807.0001"
}

The following verifies that what is stored is a string rather than a number:

# Expectation: add 2 to the long value
POST pigg_test_num/_update/1
{
  "script": {
    "source": "ctx._source.num_of_long += 2",
    "lang": "painless"
  }
}
# Actual result: the character "2" is appended to the string instead
"_source" : {
    "num_of_long" : "9223372036854775807.00012"
}

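Although ES accepts strings for numeric fields by default, this is not recommended. It is harmless if you only display the value, but arithmetic on it (such as the `+= 2` in the script above) will not behave as you expect. Always send numbers, not numeric strings, to numeric fields.

Summary: as the experiments above show, in practice you should always send numbers in the correct format and within the field's range.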

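3. Important parameters

3.1 `coerce`

The coerce parameter accepts true or false; the default is true.
When it is true, numeric fields accept numeric strings.
The unsigned_long type does not support coerce.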

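coerce	Pros	Cons
true	Fault tolerant; valuable customer data is not rejected	Arithmetic on the values is error-prone
false	Strictly requires the incoming value to be a number	If the upstream sends a numeric string, the write fails and the data is lost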

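The best approach is therefore to route all writes to ES through a single entry point and to validate and convert the data format there, in the Java code.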

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "number_one": {
        "type": "integer"
      },
      "number_two": {
        "type": "integer",
        "coerce": false
      }
    }
  }
}

# Succeeds: the string "10" is coerced to the integer 10
PUT my-index-000001/_doc/1
{
  "number_one": "10"
}

# Fails: coerce is disabled for number_two
PUT my-index-000001/_doc/2
{
  "number_two": "10"
}
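
The advice to validate and convert values at a single write entry point can be sketched as follows. The original suggests doing this in Java; the sketch below uses Python to keep it short, and `to_long` is a hypothetical helper, not an Elasticsearch API. It accepts integers and integral numeric strings, and rejects fractions and out-of-range values instead of letting them be stored silently.

```python
from decimal import Decimal, InvalidOperation

LONG_MIN, LONG_MAX = -(2 ** 63), 2 ** 63 - 1


def to_long(value) -> int:
    """Convert value to an int that fits a long field, or raise ValueError."""
    if isinstance(value, bool):
        raise ValueError("booleans are not accepted as numbers")
    try:
        # Decimal parses the text exactly, so no digits are lost on the way.
        number = Decimal(str(value).strip())
    except InvalidOperation:
        raise ValueError(f"not a number: {value!r}") from None
    if not number.is_finite() or number != number.to_integral_value():
        raise ValueError(f"not an integer: {value!r}")
    result = int(number)
    if not LONG_MIN <= result <= LONG_MAX:
        raise ValueError(f"out of range for long: {value!r}")
    return result


print(to_long("10"))                 # 10
print(to_long(9223372036854775807))  # 9223372036854775807
```

With a check like this in front of ES, the value "9223372036854775807.0001" from experiment 2.4 is rejected before it is written, and the caller can decide how to handle it.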

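3.2 `doc_values` and `index`

For a fuller explanation of doc_values and index, see my earlier post on ES mapping parameters.

doc_values: to speed up sorting and aggregations, ES builds an additional column-oriented structure at index time. This trades disk space for time. It is enabled by default and can be disabled for fields that will never be sorted or aggregated on.
index: when set to false, the field is not indexed, so it cannot be searched and querying it returns an error.

For numeric types, doc_values defaults to true; set it to false when the field will definitely not be used for aggregations or sorting.
For numeric types, index defaults to true; set it to false when you are sure you will not run term/terms queries on the field.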
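
As a rough illustration of these two switches, the Python sketch below builds a mapping body with one ordinary numeric field and one display-only field. The field names (`order_amount`, `internal_seq`) are made up for this example; `doc_values` and `index` are the mapping parameters described above. The printed JSON could be used as the body of a create-index request.

```python
import json

mapping = {
    "mappings": {
        "properties": {
            # Searched, sorted and aggregated: keep both defaults (true).
            "order_amount": {"type": "integer"},
            # Hypothetical field that is only read back from _source,
            # so both structures are switched off to save disk space.
            "internal_seq": {
                "type": "long",
                "index": False,       # not searchable
                "doc_values": False,  # not sortable or aggregatable
            },
        }
    }
}

print(json.dumps(mapping, indent=2))
```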


Original article: https://blog.csdn.net/winterking3/article/details/126607893
