Document: the primary data carrier during indexing and search; it consists of one or more fields that contain data.
Field: a part of a document, consisting of a name and a value.
Term: a unit of search, representing a single word from the text.
Token: an occurrence of a term in a field, consisting of the term's text, its start and end offsets, and a type.
JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write. ES wraps input documents, the (potentially complex) query syntax, and query results in XContent, so the data can be represented in readable form as either XML or JSON.
A RESTful API hides Lucene's complexity.
Lucene is an open-source full-text search engine library written in Java. Elasticsearch is essentially Lucene wrapped as a service (via Netty) and accessed with JSON; the underlying engine is Lucene.
Management of distributed clusters and distributed indices is built in, so unlike Solr there is no need to install ZooKeeper separately, which makes distributed deployment easier.
Overall architecture of the search system

Master (master): if the master node goes down, a new master is elected automatically, so there is no single point of failure.
Shard (shard): every shard can have 0 or more replicas (replicas); each replica is a complete copy of its shard, which improves query throughput.
Gateway: manages cluster recovery; it can be configured so that recovery only starts once a given number of nodes have joined the cluster. The gateway is used to recover any failed index: when a node crashes and restarts, Elasticsearch reads all indices and their metadata back through the gateway.
Transport: how nodes, or the cluster and its clients, communicate. TCP is used by default; HTTP (JSON), Thrift, Servlet, Memcached, ZeroMQ and other transport protocols are also supported.
Index (index)
Comparable to a database in an RDBMS. An index is divided into shards (shards), and each shard can have multiple replicas (replica).
Document (document)
Comparable to a row in an RDBMS. A document consists of multiple fields, and a field can occur several times in one document (multivalued).
Document type
One index can store objects serving many different purposes.
Mapping
Nodes and cluster
Shard
Shards are stored on different nodes, and each shard is an independent index. ES sends a query to every relevant shard and merges the results.
Replica (replica)
A replica is an exact copy of a shard; each shard has zero or more replicas. One copy is elected primary (primary) and receives index-changing operations first; the remaining copies are replica shards (replica shard).
Query DSL
Gateway

Elasticsearch uses the document's unique identifier to compute which shard the document should be stored in.
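Conceptually, the default routing rule is a sketch like the following (where `_routing` defaults to the document `_id`):

shard_num = hash(_routing) % number_of_primary_shards

This is also why the number of primary shards cannot be changed after index creation without reindexing.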
Executing a search request

Comparison with a relational database (RDBMS):

| Elasticsearch | RDBMS |
|---|---|
| Cluster | Database |
| Shard | Shard |
| Index | Table |
| Field | Column |
| Document | Row |

Analysis splits a piece of text into individual terms and normalizes them.
An analyzer consists of three parts:
character filter: pre-processing before tokenization, such as stripping HTML tags and converting special characters
tokenizer: splits the text into tokens
token filter: post-processes the tokens (lowercasing, stop words, and so on)
Built-in analyzers
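As a quick illustration, the _analyze API can exercise these stages one by one (assuming a node reachable at node1:9200):

curl -XPOST 'http://node1:9200/_analyze?pretty' -H 'Content-Type: application/json' -d '
{
  "char_filter": ["html_strip"],
  "tokenizer": "standard",
  "text": "<p>Hello <b>Elasticsearch</b>!</p>"
}'
# the html_strip character filter removes the tags before the standard
# tokenizer splits the remainder into the terms [Hello, Elasticsearch]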


Download URL

# version 7.8.1
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.8.1-linux-x86_64.tar.gz
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.8.1-linux-x86_64.tar.gz.sha512
shasum -a 512 -c elasticsearch-7.8.1-linux-x86_64.tar.gz.sha512
tar -xzf elasticsearch-7.8.1-linux-x86_64.tar.gz
cd elasticsearch-7.8.1/
ll
# Cluster name; must be identical on every node in the cluster.
cluster.name: es
# Node name; must be unique within the cluster.
node.name: es-node1
# Whether this node is master-eligible: true means it can become master, false means it cannot
node.master: true
# Whether this node stores data: true/false
node.data: true
# Where index data is stored
path.data: /opt/elasticsearch-7.8.1/data
# Where log files are stored
path.logs: /opt/elasticsearch-7.8.1/logs
# Whether to lock physical memory: true/false
bootstrap.memory_lock: true
# Listen address used to reach this es node
network.host: node1
# HTTP port exposed by es, default 9200
http.port: 9200
# Default TCP transport port, default 9300
transport.tcp.port: 9300
# Guarantees that nodes in the cluster can see N master-eligible nodes. Default is 1; for large clusters use a larger value (2-4)
discovery.zen.minimum_master_nodes: 2
# New in es7.x: addresses of master-eligible seed nodes; these can be elected master after startup
discovery.seed_hosts: ["node1:9300", "node2:9300", "node3:9300"]
discovery.zen.fd.ping_timeout: 1m
discovery.zen.fd.ping_retries: 5
# New in es7.x: required when bootstrapping a brand-new cluster, used to elect the first master
cluster.initial_master_nodes: ["es-node1", "es-node2", "es-node3"]
# Whether to allow cross-origin requests: true is required when using the head plugin
http.cors.enabled: true
# "*" allows all origins
http.cors.allow-origin: "*"
action.destructive_requires_name: true
action.auto_create_index: .security,.monitoring*,.watches,.triggered_watches,.watcher-history*
xpack.security.enabled: false
xpack.monitoring.enabled: true
xpack.graph.enabled: false
xpack.watcher.enabled: false
xpack.ml.enabled: false
[2020-08-17T05:53:23,496][WARN ][o.e.c.c.ClusterFormationFailureHelper] [es-node1] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [node1] to bootstrap a cluster: have discovered [{es-node1}{D73-QBgpTp-Q7RgailBDiQ}{o83R1VLyQZWcs8lAG9o1Ug}{node1}{192.168.199.137:9300}{dimrt}{xpack.installed=true, transform.node=true}, {es-node2}{LghG7C8pRoamT1h9GYBtRA}{CskX0v3wTwGvUNNSp3Rg2A}{node2}{192.168.199.138:9300}{dimrt}{xpack.installed=true, transform.node=true}, {es-node3}{BPDEwYozS4OWGkQGED-P2w}{toxtMJHXT1SOSmDUYz_tjg}{node3}{192.168.199.139:9300}{dimrt}{xpack.installed=true, transform.node=true}]; discovery will continue using [192.168.199.138:9300, 192.168.199.139:9300] from hosts providers and [{es-node1}{D73-QBgpTp-Q7RgailBDiQ}{o83R1VLyQZWcs8lAG9o1Ug}{node1}{192.168.199.137:9300}{dimrt}{xpack.installed=true, transform.node=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
cluster.initial_master_nodes must contain the es node.name values, not the server hostnames.
java.lang.IllegalStateException: transport not ready yet to handle incoming requests
No action needed.
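Once the nodes discover each other, cluster formation can be verified with the _cat APIs (hostnames as configured above):

curl 'http://node1:9200/_cat/health?v'
curl 'http://node1:9200/_cat/nodes?v'
# health should report cluster name "es" with status green and 3 nodes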
[root@es1 ~]# adduser es
Initialize a password for this user; Linux checks password complexity, but the warning can be overridden:
[root@es1 ~]# passwd es
Changing password for user es.
New password: 1q23lyc$%
BAD PASSWORD: the password fails the dictionary check - it is too simplistic/systematic
Retype new password:
passwd: all authentication tokens updated successfully.
Grant the user privileges
As root:
vi /etc/sudoers
Add: USERNAME ALL=(ALL) ALL
The following grants passwordless sudo instead:
Add: USERNAME ALL=(ALL) NOPASSWD:ALL
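Because Elasticsearch refuses to run as root (see the startup error further below), the installation directory must also belong to the new user — a typical follow-up step, assuming the paths used earlier:

chown -R es:es /opt/elasticsearch-7.8.1
su - es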
Directory layout

Configuration
elasticsearch.yml
The key settings are cluster.name and node.name; logging is configured in logging.yml.
File-descriptor limits are changed in /etc/security/limits.conf (the current value can be checked with ulimit). If the logs contain OutOfMemoryError entries, set the ES_HEAP_SIZE variable to more than 1024 MB.
Edit the configuration file:
$ vim /opt/elasticsearch/elasticsearch5.6.5/config/elasticsearch.yml
cluster.name: es (cluster name; must match across the whole cluster)
node.name: es-node1 (node name; must be unique within the cluster)
http.port: 9200 # HTTP port
network.host: node1 # bind address; use the host's static IP, never 127.0.0.1 here
path.data: /opt/elasticsearch/data # data directory
path.logs: /opt/elasticsearch/logs # log directory
discovery.zen.ping.unicast.hosts: ["node1","node2","node3"] # initial list of master-eligible nodes, used to discover nodes newly joining the cluster
bootstrap.memory_lock: true
bootstrap.system_call_filter: false # CentOS 6 does not support SecComp, and bootstrap.system_call_filter defaults to true, so it must be set to false. Note: SecComp is short for secure computing mode
http.cors.enabled: true # allow cross-origin requests, default false
http.cors.allow-origin: "*" # when CORS is enabled, "*" (the default) allows all origins
discovery.zen.minimum_master_nodes: 2 # guarantees nodes can see N master-eligible nodes. Default 1; for large clusters use a larger value (2-4)
Backup location for elasticsearch; the path must be created by hand
path.repo: ["/opt/elasticsearch/elasticseaarch-5.6.3/data/backups"]
If the elasticsearch-head plugin is installed, add the following options
http.cors.enabled: true
http.cors.allow-origin: "*"
If the x-pack plugin is installed and its basic auth should be disabled, add the following option
xpack.security.enabled: false
Adjust the JVM heap [this setting is important: size it generously in production, but never above 32 GB]
vim config/jvm.options
-Xms2g ---> -Xms512m
-Xmx2g ---> -Xmx512m
Each machine must be started individually.
Start in the foreground:
$ ./elasticsearch
Start in the background (-d runs it as a daemon):
$ ./elasticsearch -d
$ ./elasticsearch & # this way the logs are still printed to the foreground
Check the process:
$ jps
2369 Elasticsearch
Verify that the installation succeeded
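A minimal check, assuming a node listening on node1:9200:

curl 'http://node1:9200/?pretty'
# a healthy node answers with its name, cluster_name, version and the
# "You Know, for Search" tagline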

[2020-07-25T00:44:08,878][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [es-node1] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.RuntimeException: can not run elasticsearch as root
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:125) ~[elasticsearch-6.1.1.jar:6.1.1]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) ~[elasticsearch-6.1.1.jar:6.1.1]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.1.1.jar:6.1.1]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.1.1.jar:6.1.1]
at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.1.1.jar:6.1.1]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-6.1.1.jar:6.1.1]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:85) ~[elasticsearch-6.1.1.jar:6.1.1]
Caused by: java.lang.RuntimeException: can not run elasticsearch as root
at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:104) ~[elasticsearch-6.1.1.jar:6.1.1]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:171) ~[elasticsearch-6.1.1.jar:6.1.1]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:322) ~[elasticsearch-6.1.1.jar:6.1.1]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) ~[elasticsearch-6.1.1.jar:6.1.1]
... 6 more
ERROR: [2] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
Fix:
vim /etc/security/limits.conf
# append at the end:
* soft nofile 65536
* hard nofile 131072
vi /etc/sysctl.conf
vm.max_map_count=655360
sysctl -p
org.elasticsearch.bootstrap.StartupException: BindTransportException[Failed to bind to [9300-9400]]; nested: BindException[Cannot assign requested address];
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:125) ~[elasticsearch-6.1.1.jar:6.1.1]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) ~[elasticsearch-6.1.1.jar:6.1.1]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.1.1.jar:6.1.1]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.1.1.jar:6.1.1]
at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.1.1.jar:6.1.1]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-6.1.1.jar:6.1.1]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:85) ~[elasticsearch-6.1.1.jar:6.1.1]
Check the configured host name (network.host).
[2020-07-25T02:00:20,423][INFO ][o.e.d.z.ZenDiscovery ] [es-node2] failed to send join request to master [{es-node1}{Jqt6xka3Q6e_HM7HP7eazQ}{h6izGMXsQWWW8bOWfr-3_g}{node1}{192.168.199.137:9300}], reason [RemoteTransportException[[es-node1][192.168.199.137:9300][internal:discovery/zen/join]]; nested: IllegalArgumentException[can't add node {es-node2}{Jqt6xka3Q6e_HM7HP7eazQ}{G5EKFO-WR0yZY5gbIKnDDA}{node2}{192.168.199.138:9300}, found existing node {es-node1}{Jqt6xka3Q6e_HM7HP7eazQ}{h6izGMXsQWWW8bOWfr-3_g}{node1}{192.168.199.137:9300} with the same id but is a different node instance]; ]
Cause: the copied elasticsearch directory still contains node data from instance one in its data folder; the data folder of instance two must be emptied.
Fix: delete every file under the es cluster data directory.
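For example, with the data path configured earlier (adjust to your own path.data before running — this deletes all local index data):

rm -rf /opt/elasticsearch-7.8.1/data/*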

Install Node.js
mkdir /opt/nodejs
wget https://nodejs.org/dist/v10.15.2/node-v10.15.2-linux-x64.tar.xz
# unpack
tar -xf node-v10.15.2-linux-x64.tar.xz
# create a symlink
/opt/nodejs/node-v10.15.2-linux-x64
# add to the PATH
vi ~/.bash_profile
export NODE_HOME=/opt/nodejs/node-v10.15.2-linux-x64
export PATH=$PATH:$NODE_HOME/bin
source ~/.bash_profile
# run node -v to verify the installation
yum install git npm
Clone the git project, switch into the head directory and start the head plugin
git clone https://github.com/mobz/elasticsearch-head.git
cd elasticsearch-head/
ls
npm install
# if npm install fails, run
npm install phantomjs-prebuilt@2.1.16 --ignore-scripts
nohup npm run start &
Installed successfully
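The head plugin serves its UI on port 9100 by default, so a quick check is:

curl 'http://localhost:9100'
# or open http://localhost:9100 in a browser and connect it to http://node1:9200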

Usage tips
https://blog.csdn.net/bsh_csn/article/details/53908406
# download link:
https://github.com/NLPchina/elasticsearch-sql/releases/download/5.4.1.0/es-sql-site-standalone.zip
# switch to site-server under the unpacked directory
npm install express --save
# change the service port in site-server/site_configuration.json
# restart es, then start the es-sql front end
node node-server.js &
# update npm
npm install -g npm
# docker install
docker run -d --name elasticsearch-sql -p 9680:8080 -p 9700:9300 -p 9600:9200 851279676/es-sql:6.6.2
# cerebro can be installed from an rpm or built from source:
# the rpm is used here because it is quickest
wget https://github.com/lmenezes/cerebro/releases/download/v0.8.5/cerebro-0.8.5-1.noarch.rpm
# install:
rpm -ivh cerebro-0.8.5-1.noarch.rpm
# list the package contents
rpm -ql cerebro-0.8.5-1
# which shows the config file
/usr/share/cerebro/conf/application.conf
# log files:
/var/log/cerebro
# configuration:
# start with explicit parameters:
bin/cerebro -Dhttp.port=1234 -Dhttp.address=127.0.0.1
# or start with an alternate config file:
bin/cerebro -Dconfig.file=/some/other/dir/alternate.conf
Edit the configuration:
# vim /usr/share/cerebro/conf/application.conf
# A list of known hosts
hosts = [
{
host = "http://192.168.8.102:9200"
name = "ES Cluster"
# headers-whitelist = [ "x-proxy-user", "x-proxy-roles", "X-Forwarded-For" ]
#}
# Example of host with authentication
#{
# host = "http://some-authenticated-host:9200"
# name = "Secured Cluster"
# auth = {
# username = "username"
# password = "secret-password"
# }
}
]
Starting, checking, and stopping cerebro:
# systemctl stop cerebro
# systemctl start cerebro
# systemctl status cerebro
● cerebro.service - Elasticsearch web admin tool
Loaded: loaded (/usr/lib/systemd/system/cerebro.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2019-12-12 14:36:39 CST; 6s ago
Process: 11484 ExecStartPre=/bin/chmod 755 /run/cerebro (code=exited, status=0/SUCCESS)
For easier troubleshooting, cerebro can also be run directly from the command line:
# /usr/bin/cerebro
Default startup output:
[info] play.api.Play - Application started (Prod) (no global state)
[info] p.c.s.AkkaHttpServer - Listening for HTTP on /0:0:0:0:0:0:0:0:9000
Any host on the network is allowed to log in and access it:
Log in at:
node1:9000
vi /kibana/config/kibana.yaml
server.host: "node3"
elasticsearch.url: "http://node3:9200"
# open port 5601 in the firewall
# start
./kibana -d
curl -O https://artifacts.elastic.co/downloads/kibana/kibana-7.8.1-linux-x86_64.tar.gz
curl https://artifacts.elastic.co/downloads/kibana/kibana-7.8.1-linux-x86_64.tar.gz.sha512 | shasum -a 512 -c -
tar -xzf kibana-7.8.1-linux-x86_64.tar.gz
cd kibana-7.8.1-linux-x86_64/
Commonly used analyzers include standard, keyword, whitespace, and pattern.
The standard analyzer splits a string into individual words and removes most punctuation. The keyword analyzer outputs exactly the string it receives, without any tokenization. The whitespace analyzer splits text on whitespace only. The pattern analyzer splits text using a regular expression. standard is the most commonly used.
See the official documentation for more analyzers: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/analysis-standard-tokenizer.html
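A simple way to compare them is the _analyze API (assuming a local node):

curl -XPOST 'http://localhost:9200/_analyze?pretty' -H 'Content-Type: application/json' -d '
{
  "analyzer": "standard",
  "text": "The QUICK Brown-Foxes."
}'
# standard yields [the, quick, brown, foxes]; with "analyzer": "whitespace"
# the same text yields [The, QUICK, Brown-Foxes.]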
Download the Chinese (IK) analyzer: https://github.com/medcl/elasticsearch-analysis-ik
Unpack it
Enter elasticsearch-analysis-ik-master/ and build from source:
mvn clean install -Dmaven.test.skip=true
Create a directory named ik under the es plugins folder
Move the built /src/elasticsearch-analysis-ik-master/target/releases/elasticsearch-analysis-ik-7.4.0.zip into /opt/elasticsearch/plugins/
If ./elasticsearch then starts normally, the installation succeeded
git clone https://github.com/medcl/elasticsearch-analysis-ik
cd elasticsearch-analysis-ik
#git checkout tags/{version}
git checkout tags/v6.1.1
mvn clean
mvn compile
mvn package
../bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.2/elasticsearch-analysis-ik-6.6.2.zip
docker restart container-id
Create an index
curl -XPUT http://localhost:9200/chinese
create a mapping
curl -XPOST http://192.168.199.137:9200/chinese/fulltext/_mapping -H 'Content-Type:application/json' -d'
{
"properties": {
"content": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
}
}
}'
index some docs
curl -XPOST localhost:9200/chinese/fulltext/1 -H 'Content-Type:application/json' -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'
curl -XPOST http://localhost:9200/chinese/fulltext/2 -H 'Content-Type:application/json' -d'
{"content":"公安部:各地校车将享最高路权"}
'
curl -XPOST http://localhost:9200/chinese/fulltext/3 -H 'Content-Type:application/json' -d'
{"content":"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}
'
curl -XPOST http://localhost:9200/chinese/fulltext/4 -H 'Content-Type:application/json' -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'
ik_max_word: the most fine-grained segmentation
curl -XPOST node2:9200/chinese/_analyze -H 'Content-Type:application/json' -d'
{
"text": ["中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"],
"tokenizer": "ik_max_word"
}
'
ik_smart: performs the most coarse-grained segmentation
query with highlighting
curl -XPOST node1:9200/chinese/_search -H 'Content-Type:application/json' -d'
{
"query" : { "match" : { "content" : "中国" }},
"highlight" : {
"pre_tags" : ["<tag1>", "<tag2>"],
"post_tags" : ["</tag1>", "</tag2>"],
"fields" : {
"content" : {}
}
}
}
'
curl -XPUT http://192.168.199.137:9200/ik_test
curl -XPUT http://192.168.199.137:9200/ik_test_1
curl -XPOST http://192.168.199.137:9200/ik_test/fulltext/_mapping -H 'Content-Type:application/json' -d'
{
"fulltext": {
"_all": {
"analyzer": "ik_max_word",
"search_analyzer": "ik_max_word",
"term_vector": "no",
"store": "false"
},
"properties": {
"content": {
"type": "string",
"store": "no",
"term_vector": "with_positions_offsets",
"analyzer": "ik_max_word",
"search_analyzer": "ik_max_word",
"include_in_all": "true",
"boost": 8
}
}
}
}'
curl -XPOST http://192.168.199.137:9200/ik_test/fulltext/1 -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'
curl -XPOST http://192.168.199.137:9200/ik_test/fulltext/2 -d'
{"content":"公安部:各地校车将享最高路权"}
'
curl -XPOST http://192.168.199.137:9200/ik_test/fulltext/3 -d'
{"content":"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}
'
curl -XPOST http://192.168.199.137:9200/ik_test/fulltext/4 -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'
curl -XPOST http://192.168.199.137:9200/ik_test_1/fulltext/1 -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'
curl -XPOST http://192.168.199.137:9200/ik_test_1/fulltext/2 -d'
{"content":"公安部:各地校车将享最高路权"}
'
curl -XPOST http://192.168.199.137:9200/ik_test_1/fulltext/3 -d'
{"content":"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}
'
curl -XPOST http://192.168.199.137:9200/ik_test_1/fulltext/4 -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'
curl -XPOST http://192.168.199.137:9200/ik_test/fulltext/_search?pretty -d'{
"query" : { "match" : { "content" : "洛杉矶领事馆" }},
"highlight" : {
"pre_tags" : ["<tag1>", "<tag2>"],
"post_tags" : ["</tag1>", "</tag2>"],
"fields" : {
"content" : {}
}
}
}'
curl -XPOST http://192.168.199.137:9200/ik_test_1/fulltext/_search?pretty -d'{
"query" : { "match" : { "content" : "洛杉矶领事馆" }},
"highlight" : {
"pre_tags" : ["<tag1>", "<tag2>"],
"post_tags" : ["</tag1>", "</tag2>"],
"fields" : {
"content" : {}
}
}
}'
# error:
# "analyzer [ik_max_word] not found for field [content]"
Install the ik plugin on all nodes and restart them
Reference: [超详细的Elasticsearch高性能优化实践 - 不言不语技术 - 博客园 (cnblogs.com)](https://www.cnblogs.com/hzcya1995/p/13312071.html)
Swapping is the graveyard of performance
For better ES performance, disabling swap is strongly recommended:
sudo swapoff -a
# alternatively, flush swap once (move its contents back into memory and empty it)
swapoff -a && swapon -a
sysctl -p (applies the change without a reboot)

# /etc/sysctl.conf
vm.swappiness = 1 // 0-100; higher values make the kernel swap more eagerly
# Note: a swappiness of 1 is better than 0, because on some kernel versions swappiness=0 can trigger the OOM killer
# elasticsearch.yml
bootstrap.mlockall: true
| Role | Description | Storage | Memory | CPU | Network |
|---|---|---|---|---|---|
| Data node | Stores and retrieves data | Very high | High | High | Medium |
| Master node | Manages cluster state | Low | Low | Low | Low |
| Ingest node | Transforms incoming data | Low | Medium | High | Medium |
| Machine learning node | Machine learning | Low | Very high | Very high | Medium |
| Coordinating node | Routes requests and merges search results | Low | Medium | Medium | Medium |
Role isolation
# dedicated master node
node.master=true
node.data=false
# dedicated data node
node.master=false
node.data=true
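A coordinating-only node simply disables both roles (and ingest), using the same pre-7.x settings:

node.master=false
node.data=false
node.ingest=false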

# reference links
https://www.elastic.co/cn/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
https://blog.csdn.net/laoyang360/article/details/103545432
Elasticsearch indexing;
configuring index structure mappings and knowing which field types are available;
using bulk indexing to speed up the indexing process;
extending the index structure with additional internal information;
understanding, configuring and controlling segment merging;
understanding how routing works and configuring it as needed.
When an index is created with the defaults from elasticsearch.yml, it ends up with 5 shards and 1 replica: 10 Lucene indices spread across the cluster, since every shard has its own copy — effectively 5 shards plus 5 corresponding shard copies.
Creating a document without creating the index first:
curl -H "Content-Type: application/json" -XPUT http://node1:9200/blog/article/1 -d '{"title": "New
version of Elasticsearch released!", "content": "...", "tags":
["announce", "elasticsearch", "release"] }'
# index named blog, type article, custom id "1": add a new document
curl -H 'Content-Type:application/json' -XPUT http://localhost:9600/blog/article/1 -d '
{
"id": "1",
"title": "New version of Elasticsearch released!",
"content": "Version 1.0 released today!",
"priority": 10,
"tags": ["announce", "elasticsearch", "release"]
}'
curl -H 'Content-Type:application/json' -XPOST http://localhost:9600/blog/_search -d '{
"from": 0,
"size": 0,
"_source": {
"includes": [
"id",
"title",
"COUNT"
],
"excludes": []
},
"stored_fields": [
"id",
"title"
],
"aggregations": {
"num": {
"value_count": {
"field": "_index"
}
}
}
}'
curl -H 'Content-Type:application/json' -XPUT http://node1:9200/blog/article/2 -d '
{
"id": "2",
"title": "Create Index Test",
"content": "Success",
"priority": 100,
"tags": ["Test", "elasticsearch", "curl"]
}'

Create an index
curl -XPUT http://192.168.199.136:9200/blog/

Changing automatic index creation
#vim elasticsearch.yml
action.auto_create_index: false
Settings for a new index
curl -H "Content-Type: application/json" -XPUT http://node1:9200/test/ -d '{
"settings" : {
"number_of_shards" : 1,
"number_of_replicas" : 2
}
}'

Delete an index
curl -XDELETE http://node:9200/posts

Elasticsearch guesses the document structure from its JSON:
{
"field1": 10, # field1 is detected as a number (long)
"field2": "10" # field2 is detected as a string
}
curl -XPUT http://192.168.199.136:9200/blog/?pretty -d '{
"mappings" : {
"article": {
"numeric_detection" : true
}
}
}'
# define the date formats that can be recognized
curl -XPUT 'http://192.168.199.136:9200/blog/' -d '{
"mappings" : {
"article" : {
"dynamic_date_formats" : ["yyyy-MM-dd hh:mm"]
}
}
}'
Disabling field type guessing
curl -XPUT 'http://192.168.199.136:9200/blog/' -d '{
"mappings" : {
"article" : {
"dynamic" : "false",
"properties" : {
"id" : { "type" : "string" },
"content" : { "type" : "string" },
"author" : { "type" : "string" }
}
}
}
}'
Schema mapping (or simply: mapping) is used to define the index structure.
A post has the following structure:
a unique identifier;
a name;
a publication date;
contents.
post.json:
{
"mappings": {
"post": {
"properties": {
"id": {"type":"long", "store":"yes",
"precision_step":"0" },
"name": {"type":"string", "store":"yes",
"index":"analyzed" },
"published": {"type":"date", "store":"yes",
"precision_step":"0" },
"contents": {"type":"string", "store":"no",
"index":"analyzed" }
}
}
}
}
curl -XPOST 'http://192.168.199.136:9200/posts' -d @posts.json
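The result can be checked by fetching the mapping back (same host as above):

curl -XGET 'http://192.168.199.136:9200/posts/_mapping?pretty'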
(1) settings: index configuration such as shard count, replica count, translog sync conditions, refresh policy, and so on;
(2) mappings: the internal structure of the index, mainly:
①
_all: the All Field. When enabled, the _all field aggregates the contents of every field, so searches do not have to name a field — multiple fields are searched at once. Enable it with "_all": {"enabled": true}. From ES 6.0 on,
the _all field is disabled; as a replacement, an all-field can be built with copy_to. ②
_source: the Source Field. ES keeps a copy of the source data for every document. If disabled ("_source": {"enabled": false}), queries return only document IDs and the content has to be fetched from the index again via the Fields mechanism, which is very inefficient. When enabled the index is larger; it can be compressed (Compress), and includes/excludes can restrict which fields are stored in _source and which are not; ③
properties: the most important part — the definition of the index structure and document fields.
{
"order": 0, // template priority
"template": "sample_info*", // index-name pattern the template matches
"settings": {...}, // index settings
"mappings": {...}, // mapping definitions for the index's fields
"aliases": {...} // index aliases
}
Index templates are typically used with time-series indices.
— In other words, if a new index has to be created at regular intervals, the template only needs to be configured once; later indices pick up its settings and mappings automatically instead of being configured each time.
Index templates are usually combined with index aliases. Notes on aliases will be added later.
Example: creating an index template for a shop:
(1) Versions before ES 6.0:
PUT _template/shop_template
{
"template": "shop*", // matched via "shop*"
"order": 0, // template weight; when several templates match, the higher value wins
"settings": {
"number_of_shards": 1 // shard count; other settings can be defined here too
},
"aliases": {
"alias_1": {} // alias for the index
},
"mappings": {
"_default_": { // default settings, no longer supported from ES 6.0
"_source": { "enabled": false }, // whether to keep the original field values
"_all": { "enabled": false }, // disable the _all field
"dynamic": "strict" // only allow defined fields; disable automatic type inference
},
"type1": { // default document type set to type1; from ES 6.0 only one type is supported, so this is unnecessary
"_source": {"enabled": false},
"properties": { // field mappings
"@timestamp": { // mapping for a specific field
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"@version": {
"doc_values": true,
"index": "not_analyzed", // not analyzed
"type": "string" // string type
},
"logLevel": {
"type": "long"
}
}
}
}
}
(2) ES 6.0 and later:
PUT _template/shop_template
{
"index_patterns": ["shop*", "bar*"], // matches "shop*" and "bar*"; the template field is deprecated
"order": 0, // template weight; when several templates match, the higher value wins
"settings": {
"number_of_shards": 1 // shard count; other settings can be defined here too
},
"aliases": {
"alias_1": {} // alias for the index
},
"mappings": {
// from ES 6.0 only one type is supported, named "_doc"
"_doc": {
"_source": { // whether to keep the original field values
"enabled": false
},
"properties": { // field mappings
"@timestamp": { // mapping for a specific field
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"@version": {
"doc_values": true,
"index": "false", // false: not indexed
"type": "text" // text type
},
"logLevel": {
"type": "long"
}
}
}
}
}
curl -XPUT http://114.115.200.44:9600/_template/alarm -H "Content-Type: application/json" -d '
{
"order": 0,
"template": "alarm*",
"settings": {
"index": {
"max_result_window": "10100",
"number_of_shards": "1",
"number_of_replicas": "0"
}
},
"mappings": {
"_doc": {
"_source": {},
"_all": {
"enabled": false
},
"properties": {
"alarm_name": {
"type": "text",
"fielddata": true
},
"alarm_type": {
"type": "text",
"fielddata": true
},
"id": {
"type": "text",
"fielddata": true
},
"alarm_content": {
"type": "text",
"fielddata": true
}
}
}
},
"aliases": {}
}'
# keyword
curl -XPUT http://114.115.200.44:9600/_template/log -H "Content-Type: application/json" -d '
{
"order": 0,
"template": "log*",
"settings": {
"index": {
"max_result_window": "10100",
"number_of_shards": "1",
"number_of_replicas": "0"
}
},
"mappings": {
"_doc": {
"_source": {},
"_all": {
"enabled": false
},
"properties": {
"log_name": {
"type": "keyword",
"fielddata": true
},
"log_type": {
"type": "keyword",
"fielddata": true
},
"id": {
"type": "keyword",
"fielddata": true
},
"log_content": {
"type": "keyword",
"fielddata": true
}
}
}
},
"aliases": {}
}'
put /test
{
"settings":{
"number_of_shards":3,
"number_of_replicas":2
},
"mappings":{
"properties":{
"id":{"type":"long"},
"name":{"type":"text","analyzer":"ik_smart"},
"text":{"type":"text","analyzer":"ik_max_word"}
}
}
}
Notes:
Mapping changes made directly on an index take priority over settings from an index template;
when an index matches multiple templates and their settings conflict, the template weight (the value of the order property) decides — higher values win; order defaults to 0.
The API changed substantially after ES 6.0, so pay close attention to version differences.
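A minimal sketch of the order rule, using two hypothetical templates that both match shop*:

PUT _template/shop_base
{ "template": "shop*", "order": 0, "settings": { "number_of_shards": 2 } }

PUT _template/shop_override
{ "template": "shop*", "order": 1, "settings": { "number_of_shards": 5 } }

# a new index named shop_2020 gets 5 shards: order 1 overrides order 0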
curl -XPUT http://114.115.200.44:9600/alarm/_doc/1 -H "Content-Type: application/json" -d '
{
"id": "6",
"alarm_name": "sql injection",
"alarm_type": "exploit"
}'
(1) Viewing templates:
GET _template // list all templates
GET _template/temp* // templates matching a wildcard
GET _template/temp1,temp2 // several templates at once
GET _template/shop_template // one specific template
(2) Checking whether a template exists:
HEAD _template/shop_tem
Result:
a) if it exists, the response is:
200 - OK
b) if it does not exist, the response is:
404 - Not Found
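With curl, the HEAD request is sent with -I (template name from the example above):

curl -I 'http://node1:9200/_template/shop_template'
# HTTP/1.1 200 OK if the template exists, 404 Not Found otherwise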
Deleting a template:
DELETE _template/shop_template // removes the template created above
If the template does not exist, the following error is thrown:
{
"error" : {
"root_cause" : [
{
"type" : "index_template_missing_exception",
"reason" : "index_template [shop_temp] missing"
}
],
"type" : "index_template_missing_exception",
"reason" : "index_template [shop_temp] missing"
},
"status" : 404
}
If only exact matching matters, define the field as test_field: {"type": "keyword"}.
— The keyword type performs better than text and also saves disk space.
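A minimal mapping sketch (ES 7.x-style, hypothetical index name):

PUT /exact_match_test
{
  "mappings": {
    "properties": {
      "test_field": { "type": "keyword" }
    }
  }
}
# a term query on test_field now matches the stored value exactly, with no analysis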
An inverted index improves query efficiency. It is built by:

tokenizing the text into terms

recording term frequencies, which are used to rank the search results

recording term position information
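A tiny illustration of the resulting postings, assuming two documents doc1 = "quick brown fox" and doc2 = "brown dog":

| Term | Postings (doc: positions) | Frequency |
|---|---|---|
| brown | doc1: 1; doc2: 0 | 2 |
| quick | doc1: 0 | 1 |
| fox | doc1: 2 | 1 |
| dog | doc2: 1 | 1 |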

curl -X POST "http://192.168.16.65:9211/blog/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"match_phrase": {
"title": "小明今晚真的不加班"
}
}
}
'
curl -X POST "http://192.168.16.65:9211/blog/_delete_by_query" -H 'Content-Type: application/json' -d'
{
"query":{
"match":{
"title":"小明今晚真的不加班"
}
}
}
'
curl -X POST "http://node1:9200/_search?format=json&pretty" -H 'Content-Type: application/json' -d'
{
"query":{
"match":{
"message": "test"
}
}
}
'
Multi-field queries
{
"dis_max": {
"queries": [
{
"match": {
"title": {
"query": "Quick brown fox",
"minimum_should_match": "30%"
}
}
},
{
"match": {
"body": {
"query": "Quick brown fox",
"minimum_should_match": "30%"
}
}
}
],
"tie_breaker": 0.3
}
}
{
"multi_match": {
"query": "Quick brown fox",
"type": "best_fields",
"fields": [ "title", "body" ],
"tie_breaker": 0.3,
"minimum_should_match": "30%"
}
}
curl -X POST "localhost:9200/suricata/_delete_by_query?pretty" -H 'Content-Type:application/json' -d '{
"query": {
"range": {
"timestamp": {
"gte": "now-5d",
"lte": "now-1d",
"format": "epoch_millis"
}
}
}
}'
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices.html
curl
curl -X<VERB> '<PROTOCOL>://<HOST>/<PATH>?<QUERY_STRING>' -d '<BODY>'
GET retrieves a resource; POST creates one (and can also update); PUT updates; HEAD checks existence; DELETE removes a resource.
PROTOCOL: http or https
HOST: hostname
PORT: default 9200
QUERY_STRING: optional request parameters; e.g. pretty makes the returned JSON easier to read
BODY: a JSON-formatted request body
curl -i shows the response headers
curl -v shows the full HTTP exchange
Count the documents in the cluster
curl -H "Content-Type: application/json" -XGET 'http://node1:9200/_count?pretty' -d '
{
"query": {
"match_all": {}
}
}'
{
"count" : 0,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
}
}

View the nodes
curl 'http://node1:9200/_cat/nodes'

View the shards
curl 'http://node1:9200/_cat/shards'

List all indices
curl 'http://node1:9200/_cat/indices'

View index information
curl -XGET 'http://node1:9200/blog/_search'

Turn on verbose output:
# show the master node
GET /_cat/master?v
curl -XGET 'http://node1:9200/_cat/master?v'

GET /_cat/master?help
curl -XGET 'http://node1:9200/_cat/master?help'
#output
id | | node id
host | h | host name
ip | | ip address
node | n | node name

curl -XGET 'http://node1:9200/_cat/nodes?help'
id | id,nodeId | unique node id
pid | p | process id
ip | i | ip address
port | po | bound transport port
http_address | http | bound http address
version | v | es version
build | b | es build hash
jdk | j | jdk version
disk.total | dt,diskTotal | total disk space
disk.used | du,diskUsed | used disk space
disk.avail | d,da,disk,diskAvail | available disk space
disk.used_percent | dup,diskUsedPercent | used disk space percentage
heap.current | hc,heapCurrent | used heap
heap.percent | hp,heapPercent | used heap ratio
heap.max | hm,heapMax | max configured heap
ram.current | rc,ramCurrent | used machine memory
ram.percent | rp,ramPercent | used machine memory ratio
ram.max | rm,ramMax | total machine memory
file_desc.current | fdc,fileDescriptorCurrent | used file descriptors
file_desc.percent | fdp,fileDescriptorPercent | used file descriptor ratio
file_desc.max | fdm,fileDescriptorMax | max file descriptors
cpu | cpu | recent cpu usage
load_1m | l | 1m load avg
load_5m | l | 5m load avg
load_15m | l | 15m load avg
uptime | u | node uptime
node.role | r,role,nodeRole | m:master eligible node, d:data node, i:ingest node, -:coordinating node only
master | m | *:current master
name | n | node name
completion.size | cs,completionSize | size of completion
fielddata.memory_size | fm,fielddataMemory | used fielddata cache
fielddata.evictions | fe,fielddataEvictions | fielddata evictions
query_cache.memory_size | qcm,queryCacheMemory | used query cache
query_cache.evictions | qce,queryCacheEvictions | query cache evictions
request_cache.memory_size | rcm,requestCacheMemory | used request cache
request_cache.evictions | rce,requestCacheEvictions | request cache evictions
request_cache.hit_count | rchc,requestCacheHitCount | request cache hit counts
request_cache.miss_count | rcmc,requestCacheMissCount | request cache miss counts
flush.total | ft,flushTotal | number of flushes
flush.total_time | ftt,flushTotalTime | time spent in flush
get.current | gc,getCurrent | number of current get ops
get.time | gti,getTime | time spent in get
get.total | gto,getTotal | number of get ops
get.exists_time | geti,getExistsTime | time spent in successful gets
get.exists_total | geto,getExistsTotal | number of successful gets
get.missing_time | gmti,getMissingTime | time spent in failed gets
get.missing_total | gmto,getMissingTotal | number of failed gets
indexing.delete_current | idc,indexingDeleteCurrent | number of current deletions
indexing.delete_time | idti,indexingDeleteTime | time spent in deletions
indexing.delete_total | idto,indexingDeleteTotal | number of delete ops
indexing.index_current | iic,indexingIndexCurrent | number of current indexing ops
indexing.index_time | iiti,indexingIndexTime | time spent in indexing
indexing.index_total | iito,indexingIndexTotal | number of indexing ops
indexing.index_failed | iif,indexingIndexFailed | number of failed indexing ops
merges.current | mc,mergesCurrent | number of current merges
merges.current_docs | mcd,mergesCurrentDocs | number of current merging docs
merges.current_size | mcs,mergesCurrentSize | size of current merges
merges.total | mt,mergesTotal | number of completed merge ops
merges.total_docs | mtd,mergesTotalDocs | docs merged
merges.total_size | mts,mergesTotalSize | size merged
merges.total_time | mtt,mergesTotalTime | time spent in merges
refresh.total | rto,refreshTotal | total refreshes
refresh.time | rti,refreshTime | time spent in refreshes
refresh.listeners | rli,refreshListeners | number of pending refresh listeners
script.compilations | scrcc,scriptCompilations | script compilations
script.cache_evictions | scrce,scriptCacheEvictions | script cache evictions
search.fetch_current | sfc,searchFetchCurrent | current fetch phase ops
search.fetch_time | sfti,searchFetchTime | time spent in fetch phase
search.fetch_total | sfto,searchFetchTotal | total fetch ops
search.open_contexts | so,searchOpenContexts | open search contexts
search.query_current | sqc,searchQueryCurrent | current query phase ops
search.query_time | sqti,searchQueryTime | time spent in query phase
search.query_total | sqto,searchQueryTotal | total query phase ops
search.scroll_current | scc,searchScrollCurrent | open scroll contexts
search.scroll_time | scti,searchScrollTime | time scroll contexts held open
search.scroll_total | scto,searchScrollTotal | completed scroll contexts
segments.count | sc,segmentsCount | number of segments
segments.memory | sm,segmentsMemory | memory used by segments
segments.index_writer_memory | siwm,segmentsIndexWriterMemory | memory used by index writer
segments.version_map_memory | svmm,segmentsVersionMapMemory | memory used by version map
segments.fixed_bitset_memory | sfbm,fixedBitsetMemory | memory used by fixed bit sets for nested object field types and type filters for types referred in _parent fields
suggest.current | suc,suggestCurrent | number of current suggest ops
suggest.time | suti,suggestTime | time spend in suggest
suggest.total | suto,suggestTotal | number of suggest ops
Selecting output columns (headers):
GET /_cat/nodes?h=ip,port,heapPercent,name
curl -XGET 'http://node1:9200/_cat/nodes?h=ip,port,heapPercent,name'

b: bytes
s: sort
v: verbose
GET /_cat/indices?bytes=b&s=store.size:desc&v
curl -XGET 'http://node1:9200/_cat/indices?bytes=b&s=store.size:desc&v'
#output
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open blog pMV25OdfSxmRTO9lUWv9Gw 5 1 2 0 25122 12561
green open cq OqLBJsIuT8aXmVTYBe-Ygg 5 1 0 0 1864 699
curl 'node1:9200/_cat/indices?format=json&pretty'
curl 'node1:9200/_cat/indices?pretty' -H "Accept: application/json"
curl 'node1:9200/_cat/indices?pretty'

yaml
[es@node1 root]$ curl 'node1:9200/_cat/indices?pretty' -H "Accept: application/yaml"
---
- health: "green"
status: "open"
index: "cq"
uuid: "OqLBJsIuT8aXmVTYBe-Ygg"
pri: "5"
rep: "1"
docs.count: "0"
docs.deleted: "0"
store.size: "2.2kb"
pri.store.size: "1.1kb"
- health: "green"
status: "open"
index: "blog"
uuid: "pMV25OdfSxmRTO9lUWv9Gw"
pri: "5"
rep: "1"
docs.count: "2"
docs.deleted: "0"
store.size: "24.5kb"
pri.store.size: "12.2kb"
smile
curl 'node1:9200/_cat/indices?pretty' -H "Accept: application/smile"
:)
▒▒healthDgreen▒statusCopen▒indexAcq▒uuidUOqLBJsIuT8aXmVTYBe-Ygg▒pri@5▒rep@1▒docs.count@0▒docs.deleted@0▒store.sizeD2.2kb▒pri.store.sizeD1.1kb▒▒@DgreenACopenBCblogCUpMV25OdfSxmRTO9lUWv9GwD@5E@1F@2G@0HE24.5kbIE12.2kb▒▒[es@node1 root]$
cbor
[es@node1 root]$ curl 'node1:9200/_cat/indices?pretty' -H "Accept: application/cbor"
▒▒fhealthegreenfstatusdopeneindexbcqduuidvOqLBJsIuT8aXmVTYBe-Yggcpria5crepa1jdocs.counta0ldocs.deleteda0jstore.sizee2.2kbnpri.store.sizee1.1kb▒▒fhealthegreenfstatusdopeneindexdblogduuidvpMV25OdfSxmRTO9lUWv9Gwcpria5crepa1jdocs.counta2ldocs.deleteda0jstore.sizef24.5kbnpri.store.sizef12.2kb▒▒[es@node1 root]$
GET _cat/templates?v&s=order:desc,index_patterns
curl 'node1:9200/_cat/templates?v&s=order:desc,index_patterns'
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/tasks
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates
PUT /twitter
curl -XPUT 'http://node1:9200/twitter'
path parameter: the index name
Naming rules: lowercase only; must not contain \, /, *, ?, ", <, >, |, a space, a comma, or #; colons (:) were once allowed but are deprecated and unsupported in 7.0+; names must not start with -, _, or +, and must not be . or .. — names starting with a dot are deprecated, except for hidden indices and internal indices managed by plugins.
query parameters:
include_type_name: boolean; if true, a mapping type is expected in the mapping body
wait_for_active_shards: the number of shard copies that must be active before the operation continues
PUT /test?wait_for_active_shards=2
PUT /test
{
"settings": {
"index.write.wait_for_active_shards": "2"
}
}
master_timeout: how long to wait for the master node; default 30s
timeout: how long to wait for a response
query body:
aliases: index aliases
PUT /test
{
"aliases": {
"alias_1": {},
"alias_2": {
"filter": {
"term": { "user": "kimchy" }
},
"routing": "kimchy"
}
}
}
mappings: mapping definitions
PUT /test
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"properties": {
"field1": { "type": "text" }
}
}
}
settings: index configuration options
curl -H 'Content-Type:application/json' -XPUT http://node1:9200/twitter -d '
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
}
}
}'
curl -H 'Content-Type:application/json' -XPUT http://node1:9200/blog/article/2 -d '
{
"id": "2",
"title": "Create Index Test",
"content": "Success",
"priority": 100,
"tags": ["Test", "elasticsearch", "curl"]
}'
curl -H 'Content-Type:application/json' -XPUT http://node1:9200/twitter/article/1 -d '
{
"id": "1",
"title": "Create Index Test",
"content": "Success",
"priority": 100,
"tags": ["Test", "elasticsearch", "curl"]
}'
curl -XDELETE 'http://node1:9200/twitter'
query parameters
allow_no_indices: if a wildcard expression or _all resolves only to missing or closed indices, the request does not return an error
expand_wildcards: which index states wildcards expand to
all: expand to open and closed indices, including hidden ones
open: expand to open indices only
close: expand to closed indices only
hidden: include hidden indices in the expansion; must be combined with open, close, or both
none: wildcard expressions are not accepted
GET /<index>
curl node2:9200/_all/_search?pretty
HEAD /twitter
POST /twitter/_close
[es@node2 bin]$ curl -XPOST 'http://node1:9200/blog/_close'
{"acknowledged":true}
# close all indices
curl -XPOST node2:9200/_all/_close
{"acknowledged":true}
POST /twitter/_open
# open
curl -XPOST 'http://node2:9200/blog/_open'
{"acknowledged":true,"shards_acknowledged":true}
# open all indices
curl -XPOST 'http://node2:9200/_all/_open'
POST /twitter/_shrink/shrunk-twitter-index
curl -H 'Content-Type:application/json' -XPUT node2:9200/blog/_settings -d '
{
"settings": {
"index.number_of_replicas": 0,
"index.routing.allocation.require._name": "shrink_blog_index",
"index.blocks.write": true
}
}'
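Once the shards have relocated and the index is write-blocked, the shrink itself is a sketch like the following (hypothetical target index name):

curl -H 'Content-Type:application/json' -XPOST node2:9200/blog/_shrink/shrunk-blog-index -d '
{
  "settings": {
    "index.number_of_replicas": 1,
    "index.number_of_shards": 1,
    "index.blocks.write": null
  }
}'
# the target shard count must be a factor of the source index's shard count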
# If no filters are given, the default is to select all nodes
GET /_nodes
# Explicitly select all nodes
GET /_nodes/_all
# Select just the local node
GET /_nodes/_local
# Select the elected master node
GET /_nodes/_master
# Select nodes by name, which can include wildcards
GET /_nodes/node_name_goes_here
GET /_nodes/node_name_goes_*
# Select nodes by address, which can include wildcards
GET /_nodes/10.0.0.3,10.0.0.4
GET /_nodes/10.0.0.*
# Select nodes by role
GET /_nodes/_all,master:false
GET /_nodes/data:true,ingest:true
GET /_nodes/coordinating_only:true
GET /_nodes/master:true,voting_only:false
# Select nodes by custom attribute (e.g. with something like `node.attr.rack: 2` in the configuration file)
GET /_nodes/rack:2
GET /_nodes/ra*:2
GET /_nodes/ra*:2*
GET /_cluster/state/<metrics>/<index>
(Optional, string) A comma-separated list of the following options:
_all
Shows all metrics.
blocks
Shows the blocks part of the response.
[root@node1 ~]# curl node2:9200/_cluster/state/blocks?pretty
{
"cluster_name" : "es",
"compressed_size_in_bytes" : 1691,
"blocks" : {
"indices" : {
"blog" : {
"8" : {
"description" : "index write (api)",
"retryable" : false,
"levels" : [
"write"
]
}
}
}
}
}
master_node
Shows the elected master_node part of the response.
[root@node1 ~]# curl node2:9200/_cluster/state/master_node?pretty
{
"cluster_name" : "es",
"compressed_size_in_bytes" : 1691,
"master_node" : "Jqt6xka3Q6e_HM7HP7eazQ"
}
metadata
Shows the metadata part of the response. If you supply a comma separated list of indices, the returned output will only contain metadata for these indices.
nodes
Shows the nodes part of the response.
[root@node1 ~]# curl node2:9200/_cluster/state/nodes?pretty
{
"cluster_name" : "es",
"compressed_size_in_bytes" : 1691,
"nodes" : {
"4hD7E_O7RyeaqjwUNBuxDQ" : {
"name" : "es-node3",
"ephemeral_id" : "5zuQOwjhQ1iD-yUVnNmOGQ",
"transport_address" : "192.168.199.139:9300",
"attributes" : { }
},
"zMljaDu_Te-x5PIhFA-qEg" : {
"name" : "es-node2",
"ephemeral_id" : "jIiW4u0JTHuc2iEKTQ3hvw",
"transport_address" : "192.168.199.138:9300",
"attributes" : { }
},
"Jqt6xka3Q6e_HM7HP7eazQ" : {
"name" : "es-node1",
"ephemeral_id" : "0wcDvzymSRefsPfwBuo6LQ",
"transport_address" : "192.168.199.137:9300",
"attributes" : { }
}
}
}
routing_nodes
Shows the routing_nodes part of the response.
routing_table
Shows the routing_table part of the response. If you supply a comma separated list of indices, the returned output will only contain the routing table for these indices.
version
Shows the cluster state version.
curl node2:9200/_cluster/state/version?pretty
{
"cluster_name" : "es",
"compressed_size_in_bytes" : 1691,
"version" : 70,
"state_uuid" : "6OFUpDwAQamUugC6UDoOtw"
}
<index>
(Optional, string) Comma-separated list or wildcard expression of index names used to limit the request.
GET /_cluster/state/metadata,routing_table/foo,bar
GET /_cluster/state/_all/foo,bar
PUT /_cluster/settings
examples
# presistent update
PUT /_cluster/settings
{
"persistent" : {
"indices.recovery.max_bytes_per_sec" : "50mb"
}
}
# transient update
PUT /_cluster/settings?flat_settings=true
{
"transient" : {
"indices.recovery.max_bytes_per_sec" : "20mb"
}
}
# response
{
...
"persistent" : { },
"transient" : {
"indices.recovery.max_bytes_per_sec" : "20mb"
}
}
# reset a setting
PUT /_cluster/settings
{
"transient" : {
"indices.recovery.max_bytes_per_sec" : null
}
}
# response
{
...
"persistent" : {},
"transient" : {}
}
### dynamic indices.recovery settings
PUT /_cluster/settings
{
"transient" : {
"indices.recovery.*" : null
}
}
Set a disk-usage watermark for write protection
curl -Ss -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_cluster/settings' -d '{"transient": {"cluster.routing.allocation.disk.watermark.low": "88%"}}'
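The related thresholds can also be set together — a sketch with illustrative percentages:

curl -Ss -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%"
  }
}'
# low: stop allocating new shards to the node; high: start relocating shards away;
# flood_stage: indices with a shard on the node become read-only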
GET /_cluster/pending_tasks?pretty

Cancel a running task
curl -XPOST localhost:9200/_tasks/ID/_cancel
GET /_tasks/<task_id>
GET /_tasks
GET _tasks
GET _tasks?nodes=nodeId1,nodeId2
GET _tasks?nodes=nodeId1,nodeId2&actions=cluster:*

Query task details
curl -X GET 'localhost:9200/_tasks?pretty&detailed&actions=*reindex,*byquery'

curl -XPOST localhost:9200/_tasks/ID/_cancel
GET /_nodes/stats
GET /_nodes/<node_id>/stats
GET /_nodes/stats/<metric>
GET /_nodes/<node_id>/stats/<metric>
GET /_nodes/stats/<metric>/<index_metric>
GET /_nodes/<node_id>/stats/<metric>/<index_metric>
<metric>
adaptive_selection
Statistics about adaptive replica selection.
breaker
Statistics about the field data circuit breaker.
discovery
Statistics about the discovery.
fs
File system information, data path, free disk space, read/write stats.
http
HTTP connection information.
indices
Indices stats about size, document count, indexing and deletion times, search times, field cache size, merges and flushes.
ingest
Statistics about ingest preprocessing.
jvm
JVM stats, memory pool information, garbage collection, buffer pools, number of loaded/unloaded classes.

os
Operating system stats, load average, mem, swap.

process
Process statistics, memory consumption, cpu usage, open file descriptors.
thread_pool
Statistics about each thread pool, including current size, queue and rejected tasks.
transport
Transport statistics about sent and received bytes in cluster communication.
<index_metric>
completion
docs

fielddata
flush
get
indexing
merge
query_cache
recovery
refresh
request_cache
search
curl 10.218.80.41:9200/_nodes/stats/indices/search?pretty

segments
store
translog
warmer
Routing
curl -H 'Content-Type:application/json' -XPOST node2:9200/blog/_doc?routing=Test -d'
{
"title": "Create Index Test",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
curl -H 'Content-Type:application/json' -XPOST node2:9200/blog/_search?routing=kimchy -d '
{
"query": {
"bool": {
"must": {
"query_string": {
"query": "some query string here"
}
},
"filter": {
"term": { "user": "kimchy" }
}
}
}
}'
docs: the docs section of the response describes the indexed documents
"docs" : {
"count" : 4, # number of documents
"deleted" : 0
}
store: information about storage
"store" : {
"size_in_bytes" : 6003,
"throttle_time_in_millis" : 0
}
indexing, get, and search: indexing and deletion operations, real-time get, and search
"indexing" : {
"index_total" : 1,
"index_time_in_millis" : 32,
"index_current" : 0,
"index_failed" : 0,
"delete_total" : 0,
"delete_time_in_millis" : 0,
"delete_current" : 0,
"noop_update_total" : 0,
"is_throttled" : false,
"throttle_time_in_millis" : 0
},
"get" : {
"total" : 0,
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"open_contexts" : 0,
"query_total" : 12,
"query_time_in_millis" : 39,
"query_current" : 0,
"fetch_total" : 2,
"fetch_time_in_millis" : 12,
"fetch_current" : 0,
"scroll_total" : 0,
"scroll_time_in_millis" : 0,
"scroll_current" : 0,
"suggest_total" : 0,
"suggest_time_in_millis" : 0,
"suggest_current" : 0
},
Additional sections:
merges: information about Lucene segment merges.
refresh: information about refresh operations.
flush: information about flushes.
warmer: information about warmers and how long they ran.
filter_cache: filter cache statistics.
id_cache: identifier cache statistics.
fielddata: field data cache statistics.
percolate: information about percolator usage.
completion: information about the completion suggester.
segments: information about Lucene segments.
translog: transaction log counts and sizes.
curl node1:9200/blog/_stats?pretty
{
"_shards" : {
"total" : 10,
"successful" : 10,
"failed" : 0
},
"_all" : {
"primaries" : {
"docs" : {
"count" : 1,
"deleted" : 0
},
"store" : {
"size_in_bytes" : 7009
},
"indexing" : {
"index_total" : 1,
"index_time_in_millis" : 32,
"index_current" : 0,
"index_failed" : 0,
"delete_total" : 0,
"delete_time_in_millis" : 0,
"delete_current" : 0,
"noop_update_total" : 0,
"is_throttled" : false,
"throttle_time_in_millis" : 0
},
"get" : {
"total" : 0,
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"open_contexts" : 0,
"query_total" : 12,
"query_time_in_millis" : 39,
"query_current" : 0,
"fetch_total" : 2,
"fetch_time_in_millis" : 12,
"fetch_current" : 0,
"scroll_total" : 0,
"scroll_time_in_millis" : 0,
"scroll_current" : 0,
"suggest_total" : 0,
"suggest_time_in_millis" : 0,
"suggest_current" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size_in_bytes" : 0,
"total" : 0,
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size_in_bytes" : 0,
"total_stopped_time_in_millis" : 0,
"total_throttled_time_in_millis" : 0,
"total_auto_throttle_in_bytes" : 104857600
},
"refresh" : {
"total" : 19,
"total_time_in_millis" : 75,
"listeners" : 0
},
"flush" : {
"total" : 1,
"total_time_in_millis" : 9
},
"warmer" : {
"current" : 0,
"total" : 8,
"total_time_in_millis" : 6
},
"query_cache" : {
"memory_size_in_bytes" : 0,
"total_count" : 0,
"hit_count" : 0,
"miss_count" : 0,
"cache_size" : 0,
"cache_count" : 0,
"evictions" : 0
},
"fielddata" : {
"memory_size_in_bytes" : 0,
"evictions" : 0
},
"completion" : {
"size_in_bytes" : 0
},
"segments" : {
"count" : 1,
"memory_in_bytes" : 2863,
"terms_memory_in_bytes" : 2089,
"stored_fields_memory_in_bytes" : 312,
"term_vectors_memory_in_bytes" : 0,
"norms_memory_in_bytes" : 256,
"points_memory_in_bytes" : 2,
"doc_values_memory_in_bytes" : 204,
"index_writer_memory_in_bytes" : 0,
"version_map_memory_in_bytes" : 0,
"fixed_bit_set_memory_in_bytes" : 0,
"max_unsafe_auto_id_timestamp" : -1,
"file_sizes" : { }
},
"translog" : {
"operations" : 1,
"size_in_bytes" : 488,
"uncommitted_operations" : 0,
"uncommitted_size_in_bytes" : 215
},
"request_cache" : {
"memory_size_in_bytes" : 703,
"evictions" : 0,
"hit_count" : 0,
"miss_count" : 2
},
"recovery" : {
"current_as_source" : 0,
"current_as_target" : 0,
"throttle_time_in_millis" : 0
}
},
"total" : {
"docs" : {
"count" : 2,
"deleted" : 0
},
"store" : {
"size_in_bytes" : 14018
},
"indexing" : {
"index_total" : 2,
"index_time_in_millis" : 70,
"index_current" : 0,
"index_failed" : 0,
"delete_total" : 0,
"delete_time_in_millis" : 0,
"delete_current" : 0,
"noop_update_total" : 0,
"is_throttled" : false,
"throttle_time_in_millis" : 0
},
"get" : {
"total" : 0,
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"open_contexts" : 0,
"query_total" : 25,
"query_time_in_millis" : 76,
"query_current" : 0,
"fetch_total" : 4,
"fetch_time_in_millis" : 19,
"fetch_current" : 0,
"scroll_total" : 0,
"scroll_time_in_millis" : 0,
"scroll_current" : 0,
"suggest_total" : 0,
"suggest_time_in_millis" : 0,
"suggest_current" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size_in_bytes" : 0,
"total" : 0,
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size_in_bytes" : 0,
"total_stopped_time_in_millis" : 0,
"total_throttled_time_in_millis" : 0,
"total_auto_throttle_in_bytes" : 209715200
},
"refresh" : {
"total" : 38,
"total_time_in_millis" : 158,
"listeners" : 0
},
"flush" : {
"total" : 2,
"total_time_in_millis" : 22
},
"warmer" : {
"current" : 0,
"total" : 16,
"total_time_in_millis" : 6
},
"query_cache" : {
"memory_size_in_bytes" : 0,
"total_count" : 0,
"hit_count" : 0,
"miss_count" : 0,
"cache_size" : 0,
"cache_count" : 0,
"evictions" : 0
},
"fielddata" : {
"memory_size_in_bytes" : 0,
"evictions" : 0
},
"completion" : {
"size_in_bytes" : 0
},
"segments" : {
"count" : 2,
"memory_in_bytes" : 5726,
"terms_memory_in_bytes" : 4178,
"stored_fields_memory_in_bytes" : 624,
"term_vectors_memory_in_bytes" : 0,
"norms_memory_in_bytes" : 512,
"points_memory_in_bytes" : 4,
"doc_values_memory_in_bytes" : 408,
"index_writer_memory_in_bytes" : 0,
"version_map_memory_in_bytes" : 0,
"fixed_bit_set_memory_in_bytes" : 0,
"max_unsafe_auto_id_timestamp" : -1,
"file_sizes" : { }
},
"translog" : {
"operations" : 2,
"size_in_bytes" : 976,
"uncommitted_operations" : 0,
"uncommitted_size_in_bytes" : 430
},
"request_cache" : {
"memory_size_in_bytes" : 2812,
"evictions" : 0,
"hit_count" : 0,
"miss_count" : 5
},
"recovery" : {
"current_as_source" : 0,
"current_as_target" : 0,
"throttle_time_in_millis" : 0
}
}
},
"indices" : {
"blog" : {
"primaries" : {
"docs" : {
"count" : 1,
"deleted" : 0
},
"store" : {
"size_in_bytes" : 7009
},
"indexing" : {
"index_total" : 1,
"index_time_in_millis" : 32,
"index_current" : 0,
"index_failed" : 0,
"delete_total" : 0,
"delete_time_in_millis" : 0,
"delete_current" : 0,
"noop_update_total" : 0,
"is_throttled" : false,
"throttle_time_in_millis" : 0
},
"get" : {
"total" : 0,
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"open_contexts" : 0,
"query_total" : 12,
"query_time_in_millis" : 39,
"query_current" : 0,
"fetch_total" : 2,
"fetch_time_in_millis" : 12,
"fetch_current" : 0,
"scroll_total" : 0,
"scroll_time_in_millis" : 0,
"scroll_current" : 0,
"suggest_total" : 0,
"suggest_time_in_millis" : 0,
"suggest_current" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size_in_bytes" : 0,
"total" : 0,
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size_in_bytes" : 0,
"total_stopped_time_in_millis" : 0,
"total_throttled_time_in_millis" : 0,
"total_auto_throttle_in_bytes" : 104857600
},
"refresh" : {
"total" : 19,
"total_time_in_millis" : 75,
"listeners" : 0
},
"flush" : {
"total" : 1,
"total_time_in_millis" : 9
},
"warmer" : {
"current" : 0,
"total" : 8,
"total_time_in_millis" : 6
},
"query_cache" : {
"memory_size_in_bytes" : 0,
"total_count" : 0,
"hit_count" : 0,
"miss_count" : 0,
"cache_size" : 0,
"cache_count" : 0,
"evictions" : 0
},
"fielddata" : {
"memory_size_in_bytes" : 0,
"evictions" : 0
},
"completion" : {
"size_in_bytes" : 0
},
"segments" : {
"count" : 1,
"memory_in_bytes" : 2863,
"terms_memory_in_bytes" : 2089,
"stored_fields_memory_in_bytes" : 312,
"term_vectors_memory_in_bytes" : 0,
"norms_memory_in_bytes" : 256,
"points_memory_in_bytes" : 2,
"doc_values_memory_in_bytes" : 204,
"index_writer_memory_in_bytes" : 0,
"version_map_memory_in_bytes" : 0,
"fixed_bit_set_memory_in_bytes" : 0,
"max_unsafe_auto_id_timestamp" : -1,
"file_sizes" : { }
},
"translog" : {
"operations" : 1,
"size_in_bytes" : 488,
"uncommitted_operations" : 0,
"uncommitted_size_in_bytes" : 215
},
"request_cache" : {
"memory_size_in_bytes" : 703,
"evictions" : 0,
"hit_count" : 0,
"miss_count" : 2
},
"recovery" : {
"current_as_source" : 0,
"current_as_target" : 0,
"throttle_time_in_millis" : 0
}
},
"total" : {
"docs" : {
"count" : 2,
"deleted" : 0
},
"store" : {
"size_in_bytes" : 14018
},
"indexing" : {
"index_total" : 2,
"index_time_in_millis" : 70,
"index_current" : 0,
"index_failed" : 0,
"delete_total" : 0,
"delete_time_in_millis" : 0,
"delete_current" : 0,
"noop_update_total" : 0,
"is_throttled" : false,
"throttle_time_in_millis" : 0
},
"get" : {
"total" : 0,
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"open_contexts" : 0,
"query_total" : 25,
"query_time_in_millis" : 76,
"query_current" : 0,
"fetch_total" : 4,
"fetch_time_in_millis" : 19,
"fetch_current" : 0,
"scroll_total" : 0,
"scroll_time_in_millis" : 0,
"scroll_current" : 0,
"suggest_total" : 0,
"suggest_time_in_millis" : 0,
"suggest_current" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size_in_bytes" : 0,
"total" : 0,
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size_in_bytes" : 0,
"total_stopped_time_in_millis" : 0,
"total_throttled_time_in_millis" : 0,
"total_auto_throttle_in_bytes" : 209715200
},
"refresh" : {
"total" : 38,
"total_time_in_millis" : 158,
"listeners" : 0
},
"flush" : {
"total" : 2,
"total_time_in_millis" : 22
},
"warmer" : {
"current" : 0,
"total" : 16,
"total_time_in_millis" : 6
},
"query_cache" : {
"memory_size_in_bytes" : 0,
"total_count" : 0,
"hit_count" : 0,
"miss_count" : 0,
"cache_size" : 0,
"cache_count" : 0,
"evictions" : 0
},
"fielddata" : {
"memory_size_in_bytes" : 0,
"evictions" : 0
},
"completion" : {
"size_in_bytes" : 0
},
"segments" : {
"count" : 2,
"memory_in_bytes" : 5726,
"terms_memory_in_bytes" : 4178,
"stored_fields_memory_in_bytes" : 624,
"term_vectors_memory_in_bytes" : 0,
"norms_memory_in_bytes" : 512,
"points_memory_in_bytes" : 4,
"doc_values_memory_in_bytes" : 408,
"index_writer_memory_in_bytes" : 0,
"version_map_memory_in_bytes" : 0,
"fixed_bit_set_memory_in_bytes" : 0,
"max_unsafe_auto_id_timestamp" : -1,
"file_sizes" : { }
},
"translog" : {
"operations" : 2,
"size_in_bytes" : 976,
"uncommitted_operations" : 0,
"uncommitted_size_in_bytes" : 430
},
"request_cache" : {
"memory_size_in_bytes" : 2812,
"evictions" : 0,
"hit_count" : 0,
"miss_count" : 5
},
"recovery" : {
"current_as_source" : 0,
"current_as_target" : 0,
"throttle_time_in_millis" : 0
}
}
}
}
}
Cluster status:
green: all shards are allocated properly
yellow: all primary shards are allocated, but some or all replicas are not
red: at least one primary shard is unallocated; the cluster is not ready, and queries may return errors or incomplete results
Health of the library and map indices:
curl '192.168.199.136:9200/_cluster/health/library,map/?pretty'
{
"cluster_name" : "es",
"status" : "red",
"timed_out" : true,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
level: cluster (the default), indices, or shards — controls how detailed the health API response is
timeout: the maximum time the call may take, default 30s
wait_for_status: green, yellow, or red; e.g. with green the health API call returns once the status is green (or the timeout is reached)
wait_for_nodes: how many nodes must be available before the response is returned (or until the timeout). It accepts an integer such as 3, or a simple expression such as >=3 (at least three nodes) or <=3 (at most three nodes)
wait_for_relocating_shards: unset by default; tells Elasticsearch how many relocating shards to wait for (or until the timeout). Setting it to 0 means waiting for all relocating shards.
curl '192.168.199.136:9200/_cluster/health?pretty&wait_for_status=green&wait_for_nodes=>=3&timeout=10s'
{
"cluster_name" : "es",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
Sample data to search against
populate.sh:
#!/usr/bin/env bash
ADDRESS=$1
if [ -z $ADDRESS ]; then
ADDRESS="localhost:9200"
fi
# Check that Elasticsearch is running
curl -s "http://$ADDRESS" 2>&1 > /dev/null
if [ $? != 0 ]; then
echo "Unable to contact Elasticsearch at $ADDRESS"
echo "Please ensure Elasticsearch is running and can be reached at http://$ADDRESS/"
exit -1
fi
echo "WARNING, this script will delete the 'get-together' and the 'myindex' indices and re-index all data!"
echo "Press Control-C to cancel this operation."
echo
echo "Press [Enter] to continue."
read
# Delete the old index, swallow failures if it doesn't exist
curl -s -XDELETE "$ADDRESS/get-together" > /dev/null
# Create the next index using mapping.json
echo "Creating 'get-together' index..."
curl -s -XPUT -H'Content-Type: application/json' "$ADDRESS/get-together" -d@$(dirname $0)/mapping.json
# Wait for index to become yellow
curl -s "$ADDRESS/get-together/_health?wait_for_status=yellow&timeout=10s" > /dev/null
echo
echo "Done creating 'get-together' index."
echo
echo "Indexing data..."
echo "Indexing groups..."
curl -s -XPOST "$ADDRESS/get-together/_doc/1" -H'Content-Type: application/json' -d'{
"relationship_type": "group",
"name": "Denver Clojure",
"organizer": ["Daniel", "Lee"],
"description": "Group of Clojure enthusiasts from Denver who want to hack on code together and learn more about Clojure",
"created_on": "2012-06-15",
"tags": ["clojure", "denver", "functional programming", "jvm", "java"],
"members": ["Lee", "Daniel", "Mike"],
"location_group": "Denver, Colorado, USA"
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/2" -H'Content-Type: application/json' -d'{
"relationship_type": "group",
"name": "Elasticsearch Denver",
"organizer": "Lee",
"description": "Get together to learn more about using Elasticsearch, the applications and neat things you can do with ES!",
"created_on": "2013-03-15",
"tags": ["denver", "elasticsearch", "big data", "lucene", "solr"],
"members": ["Lee", "Mike"],
"location_group": "Denver, Colorado, USA"
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/3" -H'Content-Type: application/json' -d'{
"relationship_type": "group",
"name": "Elasticsearch San Francisco",
"organizer": "Mik",
"description": "Elasticsearch group for ES users of all knowledge levels",
"created_on": "2012-08-07",
"tags": ["elasticsearch", "big data", "lucene", "open source"],
"members": ["Lee", "Igor"],
"location_group": "San Francisco, California, USA"
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/4" -H'Content-Type: application/json' -d'{
"relationship_type": "group",
"name": "Boulder/Denver big data get-together",
"organizer": "Andy",
"description": "Come learn and share your experience with nosql & big data technologies, no experience required",
"created_on": "2010-04-02",
"tags": ["big data", "data visualization", "open source", "cloud computing", "hadoop"],
"members": ["Greg", "Bill"],
"location_group": "Boulder, Colorado, USA"
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/5" -H'Content-Type: application/json' -d'{
"relationship_type": "group",
"name": "Enterprise search London get-together",
"organizer": "Tyler",
"description": "Enterprise search get-togethers are an opportunity to get together with other people doing search.",
"created_on": "2009-11-25",
"tags": ["enterprise search", "apache lucene", "solr", "open source", "text analytics"],
"members": ["Clint", "James"],
"location_group": "London, England, UK"
}'
echo
echo "Done indexing groups."
echo "Indexing events..."
curl -s -XPOST "$ADDRESS/get-together/_doc/100?routing=1" -H'Content-Type: application/json' -d'{
"relationship_type": {
"name": "event",
"parent": "1"
},
"host": ["Lee", "Troy"],
"title": "Liberator and Immutant",
"description": "We will discuss two different frameworks in Clojure for doing different things. Liberator is a ring-compatible web framework based on Erlang Webmachine. Immutant is an all-in-one enterprise application based on JBoss.",
"attendees": ["Lee", "Troy", "Daniel", "Tom"],
"date": "2013-09-05T18:00",
"location_event": {
"name": "Stoneys Full Steam Tavern",
"geolocation": "39.752337,-105.00083"
},
"reviews": 4
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/101?routing=1" -H'Content-Type: application/json' -d'{
"relationship_type": {
"name": "event",
"parent": "1"
},
"host": "Sean",
"title": "Sunday, Surly Sunday",
"description": "Sort out any setup issues and work on Surlybird issues. We can use the EC2 node as a bounce point for pairing.",
"attendees": ["Daniel", "Michael", "Sean"],
"date": "2013-07-21T18:30",
"location_event": {
"name": "IRC, #denofclojure"
},
"reviews": 2
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/102?routing=1" -H'Content-Type: application/json' -d'{
"relationship_type": {
"name": "event",
"parent": "1"
},
"host": "Daniel",
"title": "10 Clojure coding techniques you should know, and project openbike",
"description": "What are ten Clojure coding techniques that you wish everyone knew? We will also check on the status of Project Openbike.",
"attendees": ["Lee", "Tyler", "Daniel", "Stuart", "Lance"],
"date": "2013-07-11T18:00",
"location_event": {
"name": "Stoneys Full Steam Tavern",
"geolocation": "39.752337,-105.00083"
},
"reviews": 3
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/103?routing=2" -H'Content-Type: application/json' -d'{
"relationship_type": {
"name": "event",
"parent": "2"
},
"host": "Lee",
"title": "Introduction to Elasticsearch",
"description": "An introduction to ES and each other. We can meet and greet and I will present on some Elasticsearch basics and how we use it.",
"attendees": ["Lee", "Martin", "Greg", "Mike"],
"date": "2013-04-17T19:00",
"location_event": {
"name": "Stoneys Full Steam Tavern",
"geolocation": "39.752337,-105.00083"
},
"reviews": 5
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/104?routing=2" -H'Content-Type: application/json' -d'{
"relationship_type": {
"name": "event",
"parent": "2"
},
"host": "Lee",
"title": "Queries and Filters",
"description": "A get together to talk about different ways to query Elasticsearch, what works best for different kinds of applications.",
"attendees": ["Lee", "Greg", "Richard"],
"date": "2013-06-17T18:00",
"location_event": {
"name": "Stoneys Full Steam Tavern",
"geolocation": "39.752337,-105.00083"
},
"reviews": 1
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/105?routing=2" -H'Content-Type: application/json' -d'{
"relationship_type": {
"name": "event",
"parent": "2"
},
"host": "Lee",
"title": "Elasticsearch and Logstash",
"description": "We can get together and talk about Logstash - http://logstash.net with a sneak peek at Kibana",
"attendees": ["Lee", "Greg", "Mike", "Delilah"],
"date": "2013-07-17T18:30",
"location_event": {
"name": "Stoneys Full Steam Tavern",
"geolocation": "39.752337,-105.00083"
},
"reviews": null
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/106?routing=3" -H'Content-Type: application/json' -d'{
"relationship_type": {
"name": "event",
"parent": "3"
},
"host": "Mik",
"title": "Social management and monitoring tools",
"description": "Shay Banon will be there to answer questions and we can talk about management tools.",
"attendees": ["Shay", "Mik", "John", "Chris"],
"date": "2013-03-06T18:00",
"location_event": {
"name": "Quid Inc",
"geolocation": "37.798442,-122.399801"
},
"reviews": 5
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/107?routing=3" -H'Content-Type: application/json' -d'{
"relationship_type": {
"name": "event",
"parent": "3"
},
"host": "Mik",
"title": "Logging and Elasticsearch",
"description": "Get a deep dive for what Elasticsearch is and how it can be used for logging with Logstash as well as Kibana!",
"attendees": ["Shay", "Rashid", "Erik", "Grant", "Mik"],
"date": "2013-04-08T18:00",
"location_event": {
"name": "Salesforce headquarters",
"geolocation": "37.793592,-122.397033"
},
"reviews": 3
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/108?routing=3" -H'Content-Type: application/json' -d'{
"relationship_type": {
"name": "event",
"parent": "3"
},
"host": "Elyse",
"title": "Piggyback on Elasticsearch training in San Francisco",
"description": "We can piggyback on training by Elasticsearch to have some Q&A time with the ES devs",
"attendees": ["Shay", "Igor", "Uri", "Elyse"],
"date": "2013-05-23T19:00",
"location_event": {
"name": "NoSQL Roadshow",
"geolocation": "37.787742,-122.398964"
},
"reviews": 5
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/109?routing=4" -H'Content-Type: application/json' -d'{
"relationship_type": {
"name": "event",
"parent": "4"
},
"host": "Andy",
"title": "Hortonworks, the future of Hadoop and big data",
"description": "Presentation on the work that hortonworks is doing on Hadoop",
"attendees": ["Andy", "Simon", "David", "Sam"],
"date": "2013-06-19T18:00",
"location_event": {
"name": "SendGrid Denver office",
"geolocation": "39.748477,-104.998852"
},
"reviews": 2
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/110?routing=4" -H'Content-Type: application/json' -d'{
"relationship_type": {
"name": "event",
"parent": "4"
},
"host": "Andy",
"title": "Big Data and the cloud at Microsoft",
"description": "Discussion about the Microsoft Azure cloud and HDInsight.",
"attendees": ["Andy", "Michael", "Ben", "David"],
"date": "2013-07-31T18:00",
"location_event": {
"name": "Bing Boulder office",
"geolocation": "40.018528,-105.275806"
},
"reviews": 1
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/111?routing=4" -H'Content-Type: application/json' -d'{
"relationship_type": {
"name": "event",
"parent": "4"
},
"host": "Andy",
"title": "Moving Hadoop to the mainstream",
"description": "Come hear about how Hadoop is moving to the main stream",
"attendees": ["Andy", "Matt", "Bill"],
"date": "2013-07-21T18:00",
"location_event": {
"name": "Courtyard Boulder Louisville",
"geolocation": "39.959409,-105.163497"
},
"reviews": 4
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/112?routing=5" -H'Content-Type: application/json' -d'{
"relationship_type": {
"name": "event",
"parent": "5"
},
"host": "Dave Nolan",
"title": "real-time Elasticsearch",
"description": "We will discuss using Elasticsearch to index data in real time",
"attendees": ["Dave", "Shay", "John", "Harry"],
"date": "2013-02-18T18:30",
"location_event": {
"name": "SkillsMatter Exchange",
"geolocation": "51.524806,-0.099095"
},
"reviews": 3
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/113?routing=5" -H'Content-Type: application/json' -d'{
"relationship_type": {
"name": "event",
"parent": "5"
},
"host": "Dave",
"title": "Elasticsearch at Rangespan and Exonar",
"description": "Representatives from Rangespan and Exonar will come and discuss how they use Elasticsearch",
"attendees": ["Dave", "Andrew", "David", "Clint"],
"date": "2013-06-24T18:30",
"location_event": {
"name": "Alumni Theatre",
"geolocation": "51.51558,-0.117699"
},
"reviews": 3
}'
echo
curl -s -XPOST "$ADDRESS/get-together/_doc/114?routing=5" -H'Content-Type: application/json' -d'{
"relationship_type": {
"name": "event",
"parent": "5"
},
"host": "Yann",
"title": "Using Hadoop with Elasticsearch",
"description": "We will walk through using Hadoop with Elasticsearch for big data crunching!",
"attendees": ["Yann", "Bill", "James"],
"date": "2013-09-09T18:30",
"location_event": {
"name": "SkillsMatter Exchange",
"geolocation": "51.524806,-0.099095"
},
"reviews": 2
}'
echo
echo "Done indexing events."
# Refresh so data is available
curl -s -XPOST "$ADDRESS/get-together/_refresh"
echo
echo "Done indexing data."
echo
echo
echo "Creating Templates."
curl -s -XPUT "http://$ADDRESS/_template/logging_index_all" -H'Content-Type: application/json' -d'{
"template" : "logstash-09-*",
"order" : 1,
"settings" : {
"number_of_shards" : 2,
"number_of_replicas" : 1
},
"aliases" : { "november" : {} }
}'
echo
curl -s -XPUT "http://$ADDRESS/_template/logging_index" -H'Content-Type: application/json' -d '{
"template" : "logstash-*",
"order" : 0,
"settings" : {
"number_of_shards" : 2,
"number_of_replicas" : 1
}
}'
echo
echo "Done Creating Templates."
echo
echo "Adding Dynamic Mapping"
curl -s -XDELETE "http://$ADDRESS/myindex" > /dev/null
curl -s -XPUT "http://$ADDRESS/myindex" -H'Content-Type: application/json' -d'
{
"mappings" : {
"my_type" : {
"dynamic_templates" : [{
"UUID" : {
"match" : "*_guid",
"match_mapping_type" : "string",
"mapping" : {
"type" : "keyword"
}
}
}]
}
}
}'
echo
echo "Done Adding Dynamic Mapping"
echo
echo "Adding Aliases"
curl -s -XDELETE "http://$ADDRESS/november_2014_invoices" > /dev/null
curl -s -XDELETE "http://$ADDRESS/december_2014_invoices" > /dev/null
curl -s -XPUT "http://$ADDRESS/november_2014_invoices"
echo
curl -s -XPUT "http://$ADDRESS/december_2014_invoices" -H'Content-Type: application/json' -d'
{
"mappings" :
{
"invoice" :
{
"properties" :
{
"revenue" : { "type" : "integer" }
}
}
}
}'
echo
curl -s -XPOST "http://$ADDRESS/_aliases" -H'Content-Type: application/json' -d'
{
"actions" : [
{"add" : {"index" : "november_2014_invoices", "alias" : "2014_invoices"}},
{"add" : {"index" : "december_2014_invoices", "alias" : "2014_invoices"}},
{"remove" : {"index" : "myindex", "alias" : "december_2014_invoices"}}
]
}'
echo
echo "Done Adding Aliases"
echo "Adding Filter Alias"
curl -s -XPOST "http://$ADDRESS/_aliases" -H'Content-Type: application/json' -d '
{
"actions" : [
{
"add" : {
"index" : "december_2014_invoices",
"alias" : "bigmoney",
"filter" :
{
"range" :
{
"revenue" :
{
"gt" : 1000
}
}
}
}
}
]
}'
echo
echo "Done Adding Filter Alias"
echo
echo "Adding Routing Alias"
curl -s -XPOST "http://$ADDRESS/_aliases" -H'Content-Type: application/json' -d '
{
"actions" : [
{
"add" : {
"index" : "december_2014_invoices",
"alias" : "2014_invoices",
"search_routing" : "en,es",
"index_routing" : "en"
}
}
]
}'
echo
echo "Done Adding Routing Alias"
Running it fails with:
mapper_parsing_exception","reason":"failed to parse field [relationship_type] of type [text] in document with id '100'
Cloning the 7.x branch from git fixes this:
git clone https://github.com/dakrone/elasticsearch-in-action.git -b 7.x

Limiting scope with index or type names
% curl 'localhost:9200/_search' -d '……' ←—— search the whole cluster
% curl 'localhost:9200/get-together/_search' -d '……' ←—— search the get-together index
% curl 'localhost:9200/get-together/event/_search' -d '……' ←—— search the event type within the get-together index
% curl 'localhost:9200/_all/event/_search' -d '……' ←—— search the event type across all indices
% curl 'localhost:9200/*/event/_search' -d '……'
% curl 'localhost:9200/get-together,other/event,group/_search' -d '……' ←—— search the event and group types in the get-together and other indices
% curl 'localhost:9200/+get-toge*,-get-together/_search' -d '……' ←—— search all indices whose names start with get-toge, excluding get-together itself
query: configured with the query DSL and filter DSL
size: the number of documents to return
from: used together with size for pagination. Note that to return results for page 2 of 10, Elasticsearch must compute the first 20 results; as the result set grows, fetching deep pages becomes an increasingly expensive operation
_source: controls how the _source field is returned; if indexed documents are large and you don't need all of their content, use this to trim the response
% curl 'localhost:9200/get-together/_search?from=10&size=10' ←—— matches all documents; from and size are sent as URL parameters
% curl 'localhost:9200/get-together/_search?sort=date:asc' ←—— matches all documents, returns the default first 10 results, sorted by date ascending
% curl 'node1:9200/get-together/_search?sort=date:asc&_source=title,date'
% curl 'node1:9200/get-together/_search?sort=date:asc&q=title:elasticsearch' ←—— matches all events with "elasticsearch" in the title
curl 'localhost:9200/get-together/_search' -d '{
"query":{
"match_all":{}
},
"from":10, ←——返回从第10 项开始的结果
"size":10 ←——总共返回最多10个结果
}'
curl -X POST "node1:9200/get-together/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"match_all": { }
},
"from":10,
"size":10
}'
% curl 'localhost:9200/_search?q=title:elasticsearch&_source=title,date'
{
"took":2, ←—— time the query took, in milliseconds
"timed_out":false, ←—— whether any shard timed out, i.e., whether only partial results came back
"_shards":{
"total":2, ←—— number of shards that responded successfully or failed
"successful":2,
"failed":0
},
"hits":{ ←—— the reply contains a hits key whose value holds the array of matching documents
"total":7, ←—— total number of matches for this search request
"max_score":0.9904146, ←—— highest score among these results
"hits":[ ←—— the array of hit documents inside the hits element
{
"_index":"get-together", ←—— index of the result document
"_type":"event", ←—— Elasticsearch type of the result document
"_id":"103", ←—— ID of the result document
"_score":0.9904146, ←—— relevance score of the result
"_source":{
"date":"2013-04-17T19:00", ←—— the requested _source fields (here, title and date)
"title":"Introduction to Elasticsearch"
}
},
{
"_index":"get-together",
"_type":"event",
"_id":"105",
"_score":0.9904146,
"_source":{
"date":"2013-07-17T18:30",
"title":"Elasticsearch and Logstash"
}
},
… ←—— remaining hits omitted for brevity
]
}
}
A basic search request using the request body
% curl 'localhost:9200/get-together/_search' -d '{
"query":{ ←——搜索API 中的查询模块
"match_all":{} ←——查询 API 的基本样例
}
}'
% curl 'localhost:9200/get-together/event/_search' -d '{
"query":{
"match":{ ←——match 查询展示了如何搜索标题中有“hadoop”字样的活动
"title":"hadoop" ←——注意查询单词“Hadoop”是以小写的h 开头
}
}
}'
A bool query combines must, must_not, and should clauses:
{
"bool": {
"must": { "match": { "title": "how to make millions" }},
"must_not": { "match": { "tag": "spam" }},
"should": [
{ "match": { "tag": "starred" }},
{ "range": { "date": { "gte": "2014-01-01" }}}
]
}
}
# Match all documents, then filter to those whose salary is 6666 or 7777
GET 51jobs/job/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"terms": {
"salary": [6666,7777]
}
}
}
}
}
# salary equals 6666 or title equals python, and salary is neither 7777 nor 8888
GET 51jobs/job/_search
{
"query": {
"bool": {
"should": [
{"term": {
"salary": {
"value": 6666
}
}},
{"term": {
"title": {
"value": "python"
}
}}
],
"must_not": [
{"term": {
"salary": {
"value": 7777
}
}},
{"term": {
"salary": {
"value": 8888
}
}}
]
}
}
}
# Exclude documents whose title is empty
GET 51jobs/job/_search
{
"query": {
"bool": {
"must_not": [
{"term": {"title": ""}}
]
}
}
}
# Count distinct uid values with a cardinality aggregation (approximate, HyperLogLog-based)
POST user_onoffline_log/_search
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"uid_aggs": {
"cardinality": {
"field": "uid"
}
}
}
}
# Top document per uid: a terms aggregation with a top_hits sub-aggregation
POST /user_onoffline_log/_search
{
"query": {
"match_all": {}
},
"aggs": {
"uid_aggs": {
"terms": {
"field": "uid",
"size": 1
},
"aggs": {
"uid_top": {
"top_hits": {
"sort": [
{
"uid": {
"order": "desc"
}
}
],
"size": 1
}
}
}
}
},
"size": 0
}
# Sum end_time and start_time across matching documents
{
"query": {
"bool": {
"must": [
{
"term": {
"name": "自动封禁_安全事件_大于阈值_挡板"
}
},
{
"range": {
"start_time": {
"gte": 1609387208000
}
}
},
{
"exists": {
"field": "end_time"
}
}
]
}
},
"aggs": {
"sum_end_time": {
"sum": {
"field": "end_time"
}
},
"sum_start_time": {
"sum": {
"field": "start_time"
}
}
}
}
# A generic request-body skeleton
{
"query": {
"bool": {
"must": [
{
"match_all": { }
}
],
"must_not": [ ],
"should": [ ]
}
},
"from": 0,
"size": 10,
"sort": [ ],
"aggs": { }
}
# Count distinct name values
{
"query": {
"match_all": {}
},
"aggs": {
"name_aggs": {
"cardinality": {
"field": "name"
}
}
}
}
# Latest document per name via terms + top_hits
{
"query": {
"match_all": {}
},
"aggs": {
"name_aggs": {
"terms": {
"field": "name"
},
"aggs": {
"name_top": {
"top_hits": {
"sort": [
{
"name": {
"order": "desc"
}
}
]
}
}
}
}
}
}
# Return only selected _source fields
POST https://215.9.167.42/es/soar_case_202012/_search
{
"query": {
"match_all": {}
},
"_source": [
"name",
"duration",
"plain_id"
]
}
{
"query": {
"match_all": {}
},
"_source": [
"inputs.value.attacker_array",
"inputs.value.alarm_source",
"inputs.value.victim_array",
"plain_id"
],
"size": 100
}
curl -XPOST 'abdi.node49:9200/das_logger-v6-2021_09_08d,das_logger-v6-2021_09_07d,das_logger-v6-2021_09_13d,das_logger-v6-2021_09_12d/_search' -H 'Content-Type: application/json' -d '
{
"from": 0,
"size": 0,
"_source": {
"includes": [
"devSubType",
"deviceName"
],
"excludes": []
},
"stored_fields": [
"devSubType",
"deviceName"
],
"aggregations": {
"devSubType": {
"terms": {
"field": "devSubType",
"size": 1000,
"shard_size": 20000,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"deviceName": {
"terms": {
"field": "deviceName",
"size": 1000,
"shard_size": 20000,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
}
}
}
}
}
}'
Drawbacks of from-size pagination
Every shard must compute and sort its top from + size hits for each page, so deep pages get increasingly expensive, and by default Elasticsearch rejects from + size beyond max_result_window (10,000).
GET /{index_name}/_search
{
"from":0,
"size":10
}
// Java High Level REST Client (org.elasticsearch.search.builder.SearchSourceBuilder)
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.from((page.getPageNum() - 1) * page.getPageSize()); // offset = (page - 1) * pageSize
searchSourceBuilder.size(page.getPageSize());
scroll
You can pass scroll=5m with the initial query; Elasticsearch returns a _scroll_id, a long base64-encoded string, which you pass in on the next request. 5m means the _scroll_id is cached for 5 minutes and then expires automatically; tune it as needed. size sets how many documents each scroll pulls, though with a sharded index a batch may come back larger than the requested size. An example follows.
For example, the first query:
GET /sms/_search?scroll=5m
{
"size": 20,
"query": {
"bool": {
"must": [
{
"match": {
"userId": "9d995c0b90fe4128896a1a84eca213bf"
}
}
]
}
}
}
The response:
{
"_scroll_id": "DnF1ZXJ5VGhlbkZldGNoBgAAAAAATJH1FlFTYzlSZ0VNVGdlM2o0T0dTX2tVUncAAAAAAE0-zBZQUVp6Sy04X1J1NjJCaVZfQUhHWjFnAAAAAABMkfYWUVNjOVJnRU1UZ2UzajRPR1Nfa1VSdwAAAAAATXVxFk83UWRhNGg3UmxTQnpXTEUzd0dreXcAAAAAAEyR9xZRU2M5UmdFTVRnZTNqNE9HU19rVVJ3AAAAAABNPs0WUFFaekstOF9SdTYyQmlWX0FIR1oxZw==",
"took": 6,
......
}
Then pass the _scroll_id from the previous response as shown below to get the next batch of results.
GET /_search/scroll/
{
"scroll":"1m",
"scroll_id":"DnF1ZXJ5VGhlbkZldGNoBgAAAAAATJH1FlFTYzlSZ0VNVGdlM2o0T0dTX2tVUncAAAAAAE0-zBZQUVp6Sy04X1J1NjJCaVZfQUhHWjFnAAAAAABMkfYWUVNjOVJnRU1UZ2UzajRPR1Nfa1VSdwAAAAAATXVxFk83UWRhNGg3UmxTQnpXTEUzd0dreXcAAAAAAEyR9xZRU2M5UmdFTVRnZTNqNE9HU19rVVJ3AAAAAABNPs0WUFFaekstOF9SdTYyQmlWX0FIR1oxZw=="
}
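A minimal bash sketch that pages through every batch (assumes jq is installed; host, index, and query mirror the example above):
ES=localhost:9200
# Open the scroll context with the first request
resp=$(curl -s "$ES/sms/_search?scroll=5m" -H 'Content-Type: application/json' \
  -d '{"size":20,"query":{"match_all":{}}}')
scroll_id=$(echo "$resp" | jq -r '._scroll_id')
hits=$(echo "$resp" | jq '.hits.hits | length')
while [ "$hits" -gt 0 ]; do
  # ... process the current batch in $resp here ...
  resp=$(curl -s "$ES/_search/scroll" -H 'Content-Type: application/json' \
    -d "{\"scroll\":\"5m\",\"scroll_id\":\"$scroll_id\"}")
  scroll_id=$(echo "$resp" | jq -r '._scroll_id')
  hits=$(echo "$resp" | jq '.hits.hits | length')
done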
Besides letting the scroll_id expire, you can also delete it manually:
// deleting a scroll_id by hand
DELETE /_search/scroll
{
"scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBgAAAAAATJH1FlFTYzlSZ0VNVGdlM2o0T0dTX2tVUncAAAAAAE0-zBZQUVp6..."
}
Modifying max_result_window
curl -XPUT http://es-ip:9200/_settings -H 'Content-Type: application/json' -d '{ "index" : { "max_result_window" : 100000}}'
# Set it for a single index
curl -XPUT http://ip:port/{index_name}/_settings -H 'Content-Type: application/json' -d '{"index" : {"max_result_window" : <window size>}}'
# Since 7.0, hits.total is capped at 10,000 by default; set track_total_hits to get the exact count
GET {index_name}/_search
{
"query": {
"match_all": {}
},
"track_total_hits":true
}
2020-07-30 00:12:48,712 main ERROR Unable to locate appender "rolling" for logger config "root"
Fix: edit the log4j2.properties file in the config directory and change logger.deprecation.level = warn to error.
java.io.FileNotFoundException: /opt/elasticsearch/logs/es.log (Permission denied)
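The usual cause is that the log directory is not writable by the user running ES; a sketch of the fix, assuming ES runs as the elasticsearch user (adjust to your install):
# Hand the log directory to the ES process user (user/group names are assumptions)
chown -R elasticsearch:elasticsearch /opt/elasticsearch/logs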

Reference
Watch fielddata usage: _cat/fielddata?v
Reasons a shard can be left unassigned:
1) INDEX_CREATED: unassigned because the index was just created via the create-index API.
2) CLUSTER_RECOVERED: unassigned as a result of a full cluster recovery.
3) INDEX_REOPENED: unassigned as a result of opening or closing an index.
4) DANGLING_INDEX_IMPORTED: unassigned as a result of importing a dangling index.
5) NEW_INDEX_RESTORED: unassigned as a result of restoring into a new index.
6) EXISTING_INDEX_RESTORED: unassigned as a result of restoring into a closed index.
7) REPLICA_ADDED: unassigned because a replica shard was explicitly added.
8) ALLOCATION_FAILED: unassigned because shard allocation failed.
9) NODE_LEFT: unassigned because the node hosting the shard left the cluster.
10) REINITIALIZED: unassigned because the shard moved back from started to initializing (for example, with shadow replicas).
11) REROUTE_CANCELLED: allocation was cancelled as a result of an explicit cancel-reroute command.
12) REALLOCATED_REPLICA: a better replica location was identified, so the existing replica allocation was cancelled, leaving it unassigned.
Using the Cluster Allocation Explain API
GET /_cluster/allocation/explain
{
"index": "myindex",
"shard": 0,
"primary": true
}
#You may also specify an optional current_node request parameter to only explain a shard that is currently located on current_node. The current_node can be specified as either the node id or node name.
GET /_cluster/allocation/explain
{
"index": "myindex",
"shard": 0,
"primary": false,
"current_node": "nodeA"
}
The API response for an unassigned shard:
{
"index" : "idx",
"shard" : 0,
"primary" : true,
"current_state" : "unassigned", #分片的当前状态
"unassigned_info" : {
"reason" : "INDEX_CREATED", # 未分片原因
"at" : "2017-01-04T18:08:16.600Z",
"last_allocation_status" : "no"
},
"can_allocate" : "no", # 是否分配分片
"allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
"node_allocation_decisions" : [
{
"node_id" : "8qt2rY-pT6KNZB3-hGfLnw",
"node_name" : "node-0",
"transport_address" : "127.0.0.1:9401",
"node_attributes" : {},
"node_decision" : "no", # 是否分配这个分片到确切的节点
"weight_ranking" : 1,
"deciders" : [
{
"decider" : "filter", #导致节点无决策的决策者
"decision" : "NO",
"explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"non_existent_node\"]"
}
]
}
]
}
Viewing unassigned shards

curl -H "Content-Type: application/json" node1:9200/_cluster/allocation/explain?pretty -d '{"index":
"{index}","shard": {shard},"primary": false}'
curl -XGET node1:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason | grep UNASSIGNED
# One fix: delete the affected index (this destroys its data)
curl -XDELETE '192.168.199.136:9200/index_name/'
If the reason is ALLOCATION_FAILED, retry the failed allocations:
POST _cluster/reroute?retry_failed
curl -XPOST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
"commands" : [ {
"allocate" : {
"index" : "rs_wx_test",
"shard" : 1,
"node" : "AfUyuXmGTESHXpwi4OExxx",
"allow_primary" : true
}
}
] }'
for index in $(curl -s -XGET 'http://localhost:9200/_cat/shards' | grep UNASSIGNED | awk '{print $1}' | sort | uniq); do
for shard in $(curl -s -XGET 'http://localhost:9200/_cat/shards' | grep UNASSIGNED | grep $index | awk '{print $2}' | sort | uniq); do
curl -XPOST 'http://localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
"commands":[
{
"allocate":{
"index":"'$index'",
"shard":'$shard',
"node":"ali-k-ops-elk1",
"allow_primary":"true"
}
}
]
}'
done
done
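Note that from ES 5.x onward the bare allocate command was split into allocate_replica, allocate_stale_primary, and allocate_empty_primary. A sketch of forcing an empty primary on a newer cluster (this discards whatever data the shard copy held; index and node names follow the example above):
curl -XPOST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
"commands" : [ {
"allocate_empty_primary" : {
"index" : "rs_wx_test",
"shard" : 1,
"node" : "AfUyuXmGTESHXpwi4OExxx",
"accept_data_loss" : true
}
} ] }'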
_doc"Document mapping type name can't start with '_', found: [_doc]"
Substituting fulltext or article for the type name then fails to parse the relationship_type field: {"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse [relationship_type]"}],"type":"mapper_parsing_exception","reason":"failed to parse [relationship_type]","caused_by":{"type":"illegal_state_exception","reason":"Can't get text on a START_OBJECT at 2:24"}},"status":400}
curl -XPUT localhost:9200/*/_settings -H 'Content-Type: application/json' -d '{"index" : {"number_of_replicas" : 0 }}'
Resetting replicas on all shards (a fragment that rewrites exported settings files in place):
grep -rl '"number_of_replicas": 0' . | xargs sed -i 's/"number_of_replicas": 0/"number_of_replicas": 1/'
curl -H "Content-Type:application/json" -X PUT "http://node2:9200/_all/_settings" -d '{"index":{"blocks":{"read_only_allow_delete": "false"}}}'
https://my.oschina.net/u/4277979/blog/4719417
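To check the per-node disk usage that triggers this block in the first place, the _cat allocation API helps:
curl 'node2:9200/_cat/allocation?v'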
Tools: elasticdump (npm install elasticdump -g) and esm (https://github.com/medcl/esm).
1) Migrate the shard data of the 7 ES nodes below onto the other 5 physical servers. Servers to be decommissioned:
10.253.79.149
10.253.79.150
10.253.79.155
10.253.79.156
10.253.79.157
10.253.79.158
10.253.79.159
Of these, 10.253.79.149 and 10.253.79.150 are 512 GB / 80-core machines with higher performance, to be used for SAE and ICE.
a) On the ES master node, stop writes to 10.253.79.149, 10.253.79.150, 10.253.79.155, 10.253.79.156, 10.253.79.157, 10.253.79.158 and 10.253.79.159, and move the index shards of these 7 nodes onto the 5 nodes that will stay.
Command:
curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{"transient" : {"cluster.routing.allocation.exclude._ip" : "10.253.79.149,10.253.79.150,10.253.79.155,10.253.79.156,10.253.79.157,10.253.79.158,10.253.79.159"}}'
b) Check the cluster's shard distribution with cerebro:
Relocation is still in progress at this point; shards remain on the nodes being removed.

The migration can take several hours, depending on disk volume.
c) When relocation finishes, check the node shard info again:
Verify that no shard data remains on the migrated nodes.

Also keep an eye on the shards of closed indices during this period:
In cerebro, click closed.

d) Once all data-node shards have finished migrating, shut down the 7 ES servers being decommissioned:
supervisorctl stop elasticSearch
e) With the 7 servers down, check the ES cluster state:
All node shards have been migrated.

f) Update the configuration on the 5 remaining ES servers (all 5; one is shown below) so that a rebooted node does not come back with the old 12-node cluster config:
Config file path: /opt/hansight/enterprise/elasticsearch/config/
In elasticsearch.yml, set:
discovery.zen.ping.unicast.hosts: [ "10.253.79.145:9300","10.253.79.151:9300","10.253.79.152:9300","10.253.79.153:9300","10.253.79.154:9300" ] // IPs and ports of the 5 remaining nodes
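Once the migration is confirmed, it is also worth clearing the transient exclusion so the remaining nodes allocate shards normally again; a sketch, run against any surviving node:
curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{"transient" : {"cluster.routing.allocation.exclude._ip" : null}}'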
The target ES cluster's major version (e.g., the 5 in 5.6.4) must be greater than or equal to the source cluster's major version;
snapshots created on a 1.x cluster cannot be restored on 5.x.
Create a repository on the source ES cluster
A repository must exist before you create snapshots, and one repository can hold multiple snapshot files. The main repository types are:
fs: a shared filesystem; snapshot files are stored on the filesystem
url: a filesystem URL; supported protocols: http, https, ftp, file, jar
s3: AWS S3 object storage; supported via a plugin
hdfs: snapshots stored in HDFS; supported via a plugin
cos: snapshots stored in Tencent Cloud COS object storage; supported via a plugin
To migrate from a self-managed ES cluster to Tencent Cloud ES, an fs repository works directly; note the repository path must be registered in elasticsearch.yml:
path.repo: ["/usr/local/services/test"]
Then call the snapshot API to create the repository:
curl -XPUT http://172.16.0.39:9200/_snapshot/my_backup -H 'Content-Type: application/json' -d '{
"type": "fs",
"settings": {
"location": "/usr/local/services/test"
"compress": true
}
}'
To migrate from another cloud vendor's ES cluster to Tencent Cloud ES, or between Tencent Cloud clusters, use the repository type that vendor provides, e.g., AWS S3, Alibaba Cloud OSS, Tencent Cloud COS:
curl -XPUT http://172.16.0.39:9200/_snapshot/my_s3_repository -H 'Content-Type: application/json' -d '{
{
"type": "s3",
"settings": {
"bucket": "my_bucket_name",
"region": "us-west"
}
}'
Create a snapshot on the source ES cluster
Call the snapshot API to create a snapshot in the repository you just created:
curl -XPUT http://172.16.0.39:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true
A snapshot can be limited to specific indices, and you can control what it includes; see the official docs for the full set of API parameters.
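For example, a sketch that snapshots only selected indices (the index names here are illustrative):
curl -XPUT 'http://172.16.0.39:9200/_snapshot/my_backup/snapshot_2?wait_for_completion=true' -H 'Content-Type: application/json' -d '{
"indices": "get-together,myindex",
"ignore_unavailable": true,
"include_global_state": false
}'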
Create a repository on the target ES cluster
This works the same as on the source cluster; on Tencent Cloud you can create a COS bucket and point the repository at it.
Move the source cluster's snapshot into the target cluster's repository
Upload the snapshot created on the source cluster into the repository created on the target cluster.
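For fs repositories this amounts to copying the repository directory itself; a sketch assuming the target path is reachable over SSH (host and user are illustrative):
rsync -av /usr/local/services/test/ user@target-es-node:/usr/local/services/test/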
Restore from the snapshot
curl -XPUT http://172.16.0.20:9200/_snapshot/my_backup/snapshot_1/_restore
Check the restore status
curl http://172.16.0.20:9200/_snapshot/_status
select event_type, event_name, count(*) as num from event group by event_name,event_type order by num desc limit 3
{
"from": 0,
"size": 0,
"_source": {
"includes": [
"event_type",
"event_name",
"COUNT"
],
"excludes": []
},
"stored_fields": [
"event_type",
"event_name"
],
"aggregations": {
"event_name": {
"terms": {
"field": "event_name",
"size": 3,
"shard_size": 5000,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"event_type": {
"terms": {
"field": "event_type",
"size": 3,
"shard_size": 5000,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"num": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"num": {
"value_count": {
"field": "_index"
}
}
}
}
}
}
}
}
select devSubType,deviceName from das_logger group by devSubType,deviceName
{
"from": 0,
"size": 0,
"_source": {
"includes": [
"devSubType",
"deviceName"
],
"excludes": []
},
"stored_fields": [
"devSubType",
"deviceName"
],
"aggregations": {
"devSubType": {
"terms": {
"field": "devSubType",
"size": 1000,
"shard_size": 20000,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"deviceName": {
"terms": {
"field": "deviceName",
"size": 1000,
"shard_size": 20000,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
}
}
}
}
}
}
#!/bin/bash
# Remove Elasticsearch indices older than time_ago days (set below)
CMD_ECHO='echo'
SCRIPT_NAME=`basename $0`
LOG_PRINT="eval $CMD_ECHO \"[$SCRIPT_NAME]\" @$(date +"%Y%m%d %T") [INFO] :"
time_ago=7
es_cluster_ip=10.26.22.130
function delete_index(){
comp_date=`date -d "${time_ago} day ago" +"%Y-%m-%d"`
date1="${1} 00:00:00"
date2="${comp_date} 00:00:00"
index_date=`date -d "${date1}" +%s`
limit_date=`date -d "${date2}" +%s`
if [ $index_date -le $limit_date ];then
$LOG_PRINT "$1 will perform the delete task earlier than ${time_ago} days ago" >> tmp.txt
del_date=`echo $1 | awk -F "-" '{print $1"."$2"."$3}'`
echo "=========开始删除========="
curl -XDELETE http://${es_cluster_ip}:9200/devlog-$del_date >> tmp.txt
curl -XDELETE http://${es_cluster_ip}:9200/devbacklog-$del_date >> tmp.txt
curl -XDELETE http://${es_cluster_ip}:9200/testlog-$del_date >> tmp.txt
curl -XDELETE http://${es_cluster_ip}:9200/testbacklog-$del_date >> tmp.txt
curl -XDELETE http://${es_cluster_ip}:9200/uatbacklog-$del_date >> tmp.txt
curl -XDELETE http://${es_cluster_ip}:9200/uatlog-$del_date >> tmp.txt
curl -XDELETE http://${es_cluster_ip}:9200/prodlog-$del_date >> tmp.txt
curl -XDELETE http://${es_cluster_ip}:9200/prodbacklog-$del_date >> tmp.txt
curl -XDELETE http://${es_cluster_ip}:9200/alllogback-$del_date >> tmp.txt
fi
}
# get the date in all index
curl -XGET http://${es_cluster_ip}:9200/_cat/indices|awk -F " " '{print $3}' | egrep "[0-9]*\.[0-9]*\.[0-9]*" |awk -F "-" '{print $NF}' | awk -F "." '{print $((NF-2))"-"$((NF-1))"-"$NF}' | sort | uniq | while read LINE
do
delete_index ${LINE}
done