When we set up Elasticsearch for development, the simplest approach is to use docker-compose to deploy an Elasticsearch cluster locally with a single command. Sometimes, especially when preparing a test environment, developers want the database container to contain some test data from the very beginning. Logstash makes it easy to write such data into Elasticsearch.

I described one approach in my earlier article "Elasticsearch: Using Docker-Compose to start a single-node Elastic Stack". In today's article we will implement this in a different way. You can download all of the code from https://github.com/liu-xiao-guo/elasitcPreloadData.
First, create a file named .env in the root directory of the project.
.env
- ELASTIC_PASSWORD=DEFAULT
- STACK_VERSION=7.17.14
- ES_PORT=9203
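docker-compose picks up this file automatically and substitutes the variables into the compose file. If you want to check the values from a shell, you can source the file the same way; this is just a sketch, and the heredoc only recreates the sample .env so it runs on its own:

```shell
# Recreate the sample .env from above so this sketch is self-contained
cat > .env <<'EOF'
ELASTIC_PASSWORD=DEFAULT
STACK_VERSION=7.17.14
ES_PORT=9203
EOF

set -a        # auto-export every variable assigned while this is on
. ./.env      # source the file, much like docker-compose reads it
set +a

echo "Elasticsearch ${STACK_VERSION} will listen on host port ${ES_PORT}"
# → Elasticsearch 7.17.14 will listen on host port 9203
```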
Next, create the docker-compose.yaml configuration file:
docker-compose.yaml
- version: "2.2"
- services:
-   es01:
-     image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
-     ports:
-       - ${ES_PORT}:9200
-     environment:
-       - node.name=es01
-       - cluster.initial_master_nodes=es01
-       - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
-       - bootstrap.memory_lock=true
-       - xpack.security.enabled=true
-     healthcheck:
-       test: [ "CMD-SHELL", "curl -s -k http://localhost:9200" ]
-       interval: 10s
-       timeout: 10s
-       retries: 120
-   logstash:
-     build:
-       context: logstash/
-       dockerfile: Dockerfile
-     depends_on:
-       es01:
-         condition: service_healthy
-     environment:
-       - ELASTICSEARCH_URL=http://es01:9200
-       - ELASTICSEARCH_USERNAME=elastic
-       - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
-       - XPACK_MONITORING_ENABLED=false
A few things to note:

- The healthcheck simply polls port 9200 with curl. Since curl exits with status 0 for any completed HTTP response (even the 401 that an unauthenticated request gets when security is enabled), the check passes as soon as Elasticsearch starts answering.
- Thanks to depends_on with condition: service_healthy, the Logstash container is started only after that healthcheck succeeds, so the import does not run against a cluster that is not ready yet.
- XPACK_MONITORING_ENABLED=false keeps Logstash from trying to ship its own monitoring data to the cluster.
The Dockerfile for Logstash:
logstash/Dockerfile
- FROM docker.elastic.co/logstash/logstash:7.17.14
-
- COPY importData.conf /usr/share/logstash/pipeline
- RUN mkdir /usr/share/logstash/data-test/
- COPY testdata.json /usr/share/logstash/data-test/
- COPY --chmod=0755 progress.sh /tmp
- #Install exec plugin to run shell script in Logstash pipeline
- RUN bin/logstash-plugin install logstash-output-exec
-
- ENTRYPOINT ["/usr/local/bin/docker-entrypoint"]
The JSON data file should contain each document on a single line, like this:
- {"name": "Bobbie", "emailaddress": "Bob@mail2u.org", "address": "1186 Neil Court", "country": "UK", "birthdate": "1995-10-15T01:00:00Z"}
- {"name": "Helen", "emailaddress": "Hele@mail.ru", "address": "839 Federal Ridge", "country": "Hungary", "birthdate": "1985-11-03T01:00:00Z"}
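Since the json codec parses each line on its own, a quick way to catch formatting mistakes (such as a stray trailing comma) before starting the stack is to feed every line through a JSON parser. This is just a hypothetical sanity check, not part of the original setup; it recreates a small two-line sample file so it can run standalone:

```shell
# Recreate a small sample in the same one-document-per-line format
cat > testdata.json <<'EOF'
{"name": "Bobbie", "country": "UK"}
{"name": "Helen", "country": "Hungary"}
EOF

# Each line must be a complete JSON object, or the json codec rejects it
while IFS= read -r line; do
  printf '%s\n' "$line" | python3 -m json.tool > /dev/null \
    || { echo "Invalid JSON line: $line"; exit 1; }
done < testdata.json
echo "All lines are valid JSON"
# → All lines are valid JSON
```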
The pipeline configuration file that Logstash runs should define the input (our JSON test data file) and the outputs (index into Elasticsearch and run a custom script):
- input {
-   file {
-     path => "/usr/share/logstash/data-test/testdata.json"
-     mode => "read"
-     codec => json { }
-     exit_after_read => true
-     type => "sample"
-   }
- }
- filter {
-   mutate {
-     remove_field => [ "log", "@timestamp", "event", "@version" ]
-   }
- }
- output {
-   elasticsearch {
-     hosts => "${ELASTICSEARCH_URL}"
-     index => "test_data"
-     user => "elastic"
-     password => "${ELASTIC_PASSWORD}"
-     ssl_certificate_verification => false
-   }
-   exec {
-     command => "/tmp/progress.sh"
-   }
- }
Logstash is designed to run as a service that listens to a continuous input stream. Normally there is no point in stopping it, because new data should be pushed through the pipeline whenever it arrives. In this case, however, I only want Logstash to import my test data and then stop, so its resources are freed.

Here is a hack I use to terminate the Logstash container once the data has been imported. Note that the exec output runs its command once per event, so several copies of progress.sh may be started in parallel; whichever one first sees the expected document count kills Logstash, and the container then stops:
- #!/bin/bash
- 
- CHECK="$ELASTICSEARCH_URL/test_data/_count"
- # The expected test data size is 10 documents
- CONDITION='"count":10'
- 
- while true
- do
-   if curl -s -u "$ELASTICSEARCH_USERNAME:$ELASTIC_PASSWORD" "$CHECK" | grep -q "$CONDITION"; then
-     # Kill the Logstash service so the container stops.
-     # The [l]ogstash pattern keeps grep from matching its own process.
-     kill $(ps aux | grep '[l]ogstash' | awk '{print $2}')
-     break
-   else
-     echo "Counting documents from Elasticsearch does not return the expected number. Retrying"
-     sleep 2
-   fi
- done
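The core of the script is the grep test on the _count response. The sketch below isolates that check, with the curl call stubbed out by a hypothetical fake_count_response function, so the logic can be exercised without a live cluster:

```shell
CONDITION='"count":10'

# Hypothetical stand-in for: curl -s -u "$ELASTICSEARCH_USERNAME:$ELASTIC_PASSWORD" "$CHECK"
fake_count_response() {
  echo '{"count":10,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}'
}

if fake_count_response | grep -q "$CONDITION"; then
  echo "All 10 test documents are indexed"
else
  echo "Still waiting for documents"
fi
# → All 10 test documents are indexed
```

Matching on a raw substring of the JSON body is crude but dependency-free; with jq available, `jq .count` would be a more robust way to read the number.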
Now just run docker-compose up -d and, after about two minutes, Elasticsearch will be up with the index created and the documents in it. You can check the result with, for example, curl -u elastic:DEFAULT http://localhost:9203/test_data/_count.