IK Chinese Analyzer
GitHub: https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v8.2.3
unzip elasticsearch-analysis-ik-8.2.3.zip -d /usr/local/elasticsearch-8.2.3/plugins/ik
Restart ES.
If ES fails to start with an error, try the following workaround:
cd /usr/local/elasticsearch-8.4.0/plugins/ik
vim plugin-descriptor.properties
Edit plugin-descriptor.properties and set
elasticsearch.version=<your ES version>
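The manual edit above can also be scripted. This is a minimal sketch, not part of the original notes: the function name set_es_version is hypothetical, and it assumes the descriptor holds a single line starting with elasticsearch.version=. Note that the author reports further down that this workaround led to other errors.

```shell
#!/bin/sh
# Hypothetical helper: point a plugin descriptor at a given ES version.
set_es_version() {
    descriptor="$1"   # path to plugin-descriptor.properties
    version="$2"      # ES version to write, e.g. 8.4.0
    # Rewrite only the elasticsearch.version line; the original file
    # is kept next to it with a .bak suffix.
    sed -i.bak "s/^elasticsearch\.version=.*/elasticsearch.version=${version}/" "$descriptor"
}
```

Usage, with the path from the steps above: set_es_version /usr/local/elasticsearch-8.4.0/plugins/ik/plugin-descriptor.properties 8.4.0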
I originally had ES 8.4.0 installed. The workaround above cleared the startup error but led to other errors,
so I downgraded ES to 8.2.3, the same version as the analysis-ik plugin.
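Since matching versions turned out to be what worked, it may help to check the descriptor before restarting ES. The sketch below is an illustration only: descriptor_matches is a hypothetical name, and the ES version is passed in by hand rather than detected.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the plugin descriptor targets
# the given ES version.
descriptor_matches() {
    descriptor="$1"   # path to plugin-descriptor.properties
    es_version="$2"   # installed ES version, supplied by the caller
    # Print the value that follows "elasticsearch.version=".
    plugin_target=$(sed -n 's/^elasticsearch\.version=//p' "$descriptor")
    [ "$plugin_target" = "$es_version" ]
}
```

Usage: descriptor_matches /usr/local/elasticsearch-8.2.3/plugins/ik/plugin-descriptor.properties 8.2.3 && echo "versions match"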
Test Chinese tokenization
POST /_analyze
{
  "analyzer": "ik_max_word",
  "text": "上下班车流量很大"
}
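Outside a console such as Kibana Dev Tools, the same request can be sent with curl. This is a rough sketch under assumptions: the names analyze_body and ik_analyze are hypothetical, the default URL http://localhost:9200 is a guess at the setup, and the text must not contain double quotes or backslashes because no JSON escaping is done.

```shell
#!/bin/sh
# Build the JSON body for an _analyze request.
# Assumption: neither argument contains double quotes or backslashes.
analyze_body() {
    printf '{"analyzer": "%s", "text": "%s"}' "$1" "$2"
}

# Hypothetical wrapper: POST the body to the node at $ES_URL.
# The fallback URL is an assumption about the local setup.
ik_analyze() {
    curl -s -X POST "${ES_URL:-http://localhost:9200}/_analyze" \
        -H 'Content-Type: application/json' \
        -d "$(analyze_body "$1" "$2")"
}
```

Usage: ik_analyze ik_max_word "上下班车流量很大". Passing ik_smart, the plugin's coarser-grained analyzer, allows a comparison. If security is enabled on the node, the curl call would also need credentials (for example -u) and an https URL.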
Result:
{
  "tokens": [
    {
      "token": "上下班",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "上下",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "下班",
      "start_offset": 1,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "班车",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "车流量",
      "start_offset": 3,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 4
    },
    {
      "token": "车流",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 5
    },
    {
      "token": "流量",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 6
    },
    {
      "token": "很大",
      "start_offset": 6,
      "end_offset": 8,
      "type": "CN_WORD",
      "position": 7
    }
  ]
}
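To compare runs at a glance, the token strings can be pulled out of a response like the one above. The sketch below uses only grep and sed; list_tokens is a hypothetical name, and it assumes token values contain no double quotes. A JSON-aware tool would be more robust.

```shell
#!/bin/sh
# Print one token per line from an _analyze response read on stdin.
# Assumption: token values contain no double quotes.
list_tokens() {
    # Keep each "token": "..." pair, then strip the key and the quotes.
    grep -o '"token": *"[^"]*"' | sed 's/^"token": *"//; s/"$//'
}
```

Usage, assuming the response was saved to a file named response.json (a hypothetical name): list_tokens < response.json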