• ES 7.x: Quick Guide to Core Elasticsearch Essentials, Part 1 (Tokenization, text and keyword)


    Tokenization

    1.1 Tokenization

    1.1.1 Viewing how text is tokenized

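    The standard analyzer splits Chinese text into individual characters.

    ik_max_word performs the finest-grained segmentation and emits every word it can find, including overlapping ones.

    ik_smart performs the coarsest-grained segmentation.

    ik_smart

    Pros: segments text quickly and coarsely, so the index is smaller and queries are faster.

    Cons: because the granularity is coarse, some important terms may be skipped, so results can be incomplete (lower recall).

    ik_max_word

    Pros: segments text into detailed fragments, so recall is high and matching data is rarely missed.

    Cons: the fine granularity also produces some useless terms, so the index takes more space and queries are slower.

    standard is the default analyzer in ES, so "analyzer": "standard" can be omitted.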
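The following toy Python function roughly illustrates how the standard analyzer treats Chinese. It emits one token per Han character, using the same fields as an `_analyze` response.

It is a simplification for illustration only. It recognises just the basic CJK Unified Ideographs block and skips every other character. The real standard tokenizer applies Unicode word-boundary rules (UAX #29) to non-Chinese text.

```python
def is_han(ch: str) -> bool:
    # Assumption for this sketch: only the basic CJK Unified Ideographs block counts
    return "\u4e00" <= ch <= "\u9fff"


def simulate_standard_cjk(text: str) -> list[dict]:
    """Toy approximation: every Han character becomes its own token.
    Non-Han characters are skipped here, which the real analyzer does NOT do."""
    tokens = []
    position = 0
    for offset, ch in enumerate(text):
        if is_han(ch):
            tokens.append({
                "token": ch,
                "start_offset": offset,       # start index in the original text
                "end_offset": offset + 1,     # end index is exclusive
                "type": "<IDEOGRAPHIC>",
                "position": position,         # consecutive positions, one per token
            })
            position += 1
    return tokens
```

The design choice is to mirror the response shape, so the output can be compared field by field with the standard analyzer response shown below.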

    1.1.2 Comparing the analyzers

    1. Using ik_max_word

    ###  Request

    GET http://localhost:9200/_analyze

    {
      "text": "中华人民共和国人民大会堂",
      "analyzer": "ik_max_word"
    }
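The same `_analyze` call can be made from code. This is a minimal standard-library sketch, assuming a local node at `http://localhost:9200` with no authentication or TLS. The helper names `build_analyze_request` and `analyze` are hypothetical.

It sends POST rather than GET. The `_analyze` API accepts both, and some HTTP clients do not send a body with GET.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no auth/TLS


def build_analyze_request(text: str, analyzer: str = "standard") -> urllib.request.Request:
    """Build (but do not send) a POST /_analyze request with a JSON body."""
    body = json.dumps({"text": text, "analyzer": analyzer}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text: str, analyzer: str = "standard") -> list[str]:
    """Send the request to a running node and return only the token strings."""
    with urllib.request.urlopen(build_analyze_request(text, analyzer)) as resp:
        result = json.load(resp)
    return [t["token"] for t in result["tokens"]]
```

With the IK plugin installed, `analyze("中华人民共和国人民大会堂", "ik_max_word")` should list the same tokens as the `tokens` array in the response that follows.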

    Response:

    {
        "tokens": [
            { "token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
            { "token": "中华人民", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1 },
            { "token": "中华", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2 },
            { "token": "华人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3 },
            { "token": "人民共和国", "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4 },
            { "token": "人民", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5 },
            { "token": "共和国", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6 },
            { "token": "共和", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7 },
            { "token": "国人", "start_offset": 6, "end_offset": 8, "type": "CN_WORD", "position": 8 },
            { "token": "人民大会堂", "start_offset": 7, "end_offset": 12, "type": "CN_WORD", "position": 9 },
            { "token": "人民大会", "start_offset": 7, "end_offset": 11, "type": "CN_WORD", "position": 10 },
            { "token": "人民", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 11 },
            { "token": "大会堂", "start_offset": 9, "end_offset": 12, "type": "CN_WORD", "position": 12 },
            { "token": "大会", "start_offset": 9, "end_offset": 11, "type": "CN_WORD", "position": 13 },
            { "token": "会堂", "start_offset": 10, "end_offset": 12, "type": "CN_WORD", "position": 14 }
        ]
    }
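The offsets in this response are positions in the original string, and the end offset is exclusive. That is why overlapping tokens such as 中华人民共和国 (0 to 7) and 国人 (6 to 8) can both appear. The helper below is a hypothetical sketch, not part of any ES client, that makes this relationship explicit by checking each token against the text it claims to cover.

```python
def checked_tokens(text: str, response: dict) -> list[tuple[str, int, int]]:
    """Return (token, start, end) for each token in an _analyze response and
    verify that text[start:end] is exactly the token.

    Assumptions: the analyzer does not alter characters (no lowercasing or
    stemming), which holds for the Chinese tokens here; and all characters are
    in the BMP, because ES offsets count UTF-16 code units while Python
    indexes code points.
    """
    result = []
    for t in response["tokens"]:
        start, end = t["start_offset"], t["end_offset"]
        if text[start:end] != t["token"]:
            raise ValueError(f"token {t['token']!r} does not match text[{start}:{end}]")
        result.append((t["token"], start, end))
    return result
```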

    2. Using the standard analyzer

    ###  Request

    GET http://localhost:9200/_analyze

    {
      "text": "中华人民共和国人民大会堂",
      "analyzer": "standard"
    }

    Response:

    {
        "tokens": [
            { "token": "中", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0 },
            { "token": "华", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1 },
            { "token": "人", "start_offset": 2, "end_offset": 3, "type": "<IDEOGRAPHIC>", "position": 2 },
            { "token": "民", "start_offset": 3, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 3 },
            { "token": "共", "start_offset": 4, "end_offset": 5, "type": "<IDEOGRAPHIC>", "position": 4 },
            { "token": "和", "start_offset": 5, "end_offset": 6, "type": "<IDEOGRAPHIC>", "position": 5 },
            { "token": "国", "start_offset": 6, "end_offset": 7, "type": "<IDEOGRAPHIC>", "position": 6 },
            { "token": "人", "start_offset": 7, "end_offset": 8, "type": "<IDEOGRAPHIC>", "position": 7 },
            { "token": "民", "start_offset": 8, "end_offset": 9, "type": "<IDEOGRAPHIC>", "position": 8 },
            { "token": "大", "start_offset": 9, "end_offset": 10, "type": "<IDEOGRAPHIC>", "position": 9 },
            { "token": "会", "start_offset": 10, "end_offset": 11, "type": "<IDEOGRAPHIC>", "position": 10 },
            { "token": "堂", "start_offset": 11, "end_offset": 12, "type": "<IDEOGRAPHIC>", "position": 11 }
        ]
    }

    3. Using ik_smart

    ###  Request

    GET http://localhost:9200/_analyze

    {
      "text": "中华人民共和国人民大会堂",
      "analyzer": "ik_smart"
    }

    Response:

    {
        "tokens": [
            { "token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
            { "token": "人民大会堂", "start_offset": 7, "end_offset": 12, "type": "CN_WORD", "position": 1 }
        ]
    }
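One way to make the granularity difference concrete is to test whether any two tokens in an `_analyze` response cover the same characters. The helper below is hypothetical and not part of any ES client. Among the three responses above, only the ik_max_word output has overlapping tokens.

```python
def token_spans(response: dict) -> list[tuple[int, int]]:
    """(start_offset, end_offset) pairs from an _analyze response, sorted by start."""
    return sorted((t["start_offset"], t["end_offset"]) for t in response["tokens"])


def has_overlapping_tokens(response: dict) -> bool:
    """True if two tokens share at least one character (offsets are half-open)."""
    spans = token_spans(response)
    # Once the spans are sorted by start offset, any overlap shows up between neighbours.
    return any(next_start < prev_end
               for (_, prev_end), (next_start, _) in zip(spans, spans[1:]))
```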

    https://www.jianshu.com/p/e8e6874799f6

    https://www.bilibili.com/read/cv17912145/

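    1.1.3 Specifying the analyzer for indexing and for searching

    1. Indexing: when a document is created or updated, its text fields are analyzed, and the analyzer can be specified.

    Specify it per field in the index mapping when creating the index. "analyzer" sets the analyzer used at index time (and, unless overridden, at search time). "search_analyzer" optionally sets a different analyzer for query strings.

    If no analyzer is specified, the default standard analyzer is used.

    Decide explicitly whether each field needs to be analyzed. Give fields that don't need analysis the type keyword; this saves space and improves write performance.

    2. Searching: at query time, the query string is analyzed as well.

    A query can set the analyzer explicitly through its "analyzer" parameter (for example, in a match query).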

    ES analyzers (analyzer), 51CTO Blog
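Both points can be sketched as Python request bodies. The index name `news` and the fields `content` and `category` are hypothetical. Indexing with `ik_max_word` and searching with `ik_smart` is a commonly used pairing chosen here for illustration, not something this article prescribes. The IK plugin must be installed.

The bodies would be sent with `PUT /news` and `GET /news/_search` respectively.

```python
import json

# Hypothetical index "news"; requires the IK analysis plugin.
create_index_body = {
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",     # applied to field values at index time
                "search_analyzer": "ik_smart",  # applied to query strings for this field
            },
            "category": {"type": "keyword"},    # not analyzed: exact values only
        }
    }
}

# A match query can override the analyzer for a single request.
search_body = {
    "query": {
        "match": {
            "content": {"query": "人民大会堂", "analyzer": "ik_smart"}
        }
    }
}


def to_json(body: dict) -> str:
    """Serialize a request body, keeping Chinese characters readable."""
    return json.dumps(body, ensure_ascii=False, indent=2)
```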

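    1.2 The text and keyword types

    1.2.1 The two types

    Starting with ES 5.0, the string type was replaced by two types: text and keyword.

    1. A text field is analyzed: the value is tokenized first, and the resulting terms are stored in the index. Text fields are not suited to exact-value filtering, and by default they cannot be used for sorting or aggregations.

    2. A keyword field is not analyzed: the whole value is indexed unchanged as a single term. Keyword fields are commonly used for filtering, sorting, and aggregations.

    3. For a string field without an explicit mapping, ES dynamically maps it as text with a keyword sub-field (ES version 7.9.0).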
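A toy inverted-index model shows the difference between analyzed and non-analyzed fields. This is a simplification, not ES code. `text_terms` stands in for the standard analyzer on English input only: it lowercases and splits into words. It would not split Chinese per character. `term_query_matches` models a term query, which looks up its input without analyzing it.

```python
import re


def text_terms(value: str) -> set[str]:
    """Toy stand-in for the standard analyzer on ENGLISH text only:
    lowercase the value and split it into words."""
    return {word.lower() for word in re.findall(r"\w+", value)}


def keyword_terms(value: str) -> set[str]:
    """A keyword field indexes the whole value, unchanged, as a single term."""
    return {value}


def term_query_matches(indexed_terms: set[str], query_value: str) -> bool:
    """A term query does not analyze its input; it looks up the exact term."""
    return query_value in indexed_terms
```

For "New York City", the text field holds the terms new, york and city, so an exact lookup of the full string finds nothing there. The keyword field holds the one term "New York City", so the same lookup succeeds. This is why exact-value filters, sorting and terms aggregations normally target keyword fields.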

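    1.2.2 Fields that are both text and keyword

    When a document containing a string field is indexed without a predefined mapping, ES automatically maps the field as text plus a keyword sub-field (ES version 7.9.0):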

    {
        "city_info": {
            "mappings": {
                "properties": {
                    "address": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "cityName": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    }
                }
            }
        }
    }
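With a mapping like this, full-text queries target the text field, while exact filters, aggregations and sorting target the `.keyword` sub-field. Because of `ignore_above: 256`, values longer than 256 characters are not indexed in the keyword sub-field.

The request bodies below are hypothetical examples for `GET /city_info/_search`. The values 北京 and 东城区 are made up for illustration.

```python
import json

# Full-text search on the analyzed text field. With the standard analyzer the query
# string is split into single characters, and match ORs them by default, so any
# document whose address contains 东, 城 or 区 can match (ranked by relevance).
full_text_query = {"query": {"match": {"address": "东城区"}}}

# Exact, whole-value filter on the keyword sub-field (term queries are not analyzed).
exact_filter = {"query": {"bool": {"filter": [{"term": {"cityName.keyword": "北京"}}]}}}

# Aggregations and sorting use the keyword sub-field, not the text field.
city_buckets = {
    "size": 0,
    "aggs": {"by_city": {"terms": {"field": "cityName.keyword"}}},
}
sorted_by_city = {"query": {"match_all": {}}, "sort": [{"cityName.keyword": "asc"}]}


def to_body(query: dict) -> bytes:
    """Serialize a request body as UTF-8 JSON, keeping Chinese characters readable."""
    return json.dumps(query, ensure_ascii=False).encode("utf-8")
```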

    1. Create the index.

    2. Add a document directly, without defining a mapping first.

    3. View the mapping.
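The three steps correspond to three REST calls. The Python sketch below only builds the requests using the standard library. The node address and the document values are hypothetical. Each request would be sent against a running node with `urllib.request.urlopen(req)`.

```python
import json
import urllib.request
from typing import Optional

ES_URL = "http://localhost:9200"  # assumption: local node without security enabled


def es_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request; JSON bodies are encoded as UTF-8."""
    data = None if body is None else json.dumps(body, ensure_ascii=False).encode("utf-8")
    headers = {} if data is None else {"Content-Type": "application/json"}
    return urllib.request.Request(ES_URL + path, data=data, headers=headers, method=method)


steps = [
    # 1. Create the index with no mappings at all
    es_request("PUT", "/city_info"),
    # 2. Index a document directly; dynamic mapping infers the field types (hypothetical values)
    es_request("POST", "/city_info/_doc", {"cityName": "北京", "address": "北京市东城区"}),
    # 3. Read back the mapping ES generated
    es_request("GET", "/city_info/_mapping"),
]
```

Step 3 should return a mapping shaped like the one shown above, with both string fields mapped as text plus a keyword sub-field.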

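    1.2.3 Adding a keyword sub-field to a text field

    Suppose a field was given the type text when the index was created, and you later want to add a keyword sub-field so it can be searched by its complete string value. You can do this with a PUT request to the index's _mapping endpoint.

    For example: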
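Here is a minimal sketch of such a request, built with the Python standard library. It assumes a hypothetical index `city_info` whose `cityName` field is currently plain text with no sub-fields.

The update must repeat the field's existing type. If the field was created with a custom analyzer, that must be repeated unchanged as well.

Documents indexed before the change have no values in the new sub-field. An update by query without a script rewrites them in place so the sub-field gets populated.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node without security enabled

# Hypothetical: city_info currently maps cityName as text with no sub-fields.
add_keyword_body = {
    "properties": {
        "cityName": {
            "type": "text",  # repeat the existing type; it cannot be changed
            # If the field was created with a custom analyzer, repeat it here unchanged too.
            "fields": {
                "keyword": {"type": "keyword", "ignore_above": 256}
            },
        }
    }
}


def put_mapping_request(index: str, body: dict) -> urllib.request.Request:
    """Build PUT /<index>/_mapping, which adds new fields or sub-fields to a mapping."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/_mapping",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Backfill: re-index existing documents in place so cityName.keyword gets values.
backfill_request = urllib.request.Request(
    f"{ES_URL}/city_info/_update_by_query?conflicts=proceed", method="POST"
)
```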

  • Original article: https://blog.csdn.net/u011066470/article/details/130898153