• NLP Text Analysis with Elasticsearch and BERT


    Article outline


    What's new in ES 8.0

    https://www.elastic.co/cn/blog/whats-new-elastic-8-0-0

    Machine learning features in recent ES releases (for example, anomaly detection):

    • https://www.elastic.co/guide/en/machine-learning/current/anomaly-examples.html

    Trying out ES with Docker under WSL2

    For how to install Docker under WSL2, see my earlier blog post.

    Pull the Elasticsearch Docker image

    Obtaining Elasticsearch for Docker is as simple as issuing a docker pull command against the Elastic Docker registry.

    docker pull docker.elastic.co/elasticsearch/elasticsearch:8.2.0
    

    Start a single ES node

    Start a single-node cluster with Docker
    If you’re starting a single-node Elasticsearch cluster in a Docker container, security will be automatically enabled and configured for you. When you start Elasticsearch for the first time, the following security configuration occurs automatically:

    • Certificates and keys are generated for the transport and HTTP layers.
    • The Transport Layer Security (TLS) configuration settings are written to elasticsearch.yml.
    • A password is generated for the elastic user.
    • An enrollment token is generated for Kibana.
    You can then start Kibana and enter the enrollment token, which is valid for 30 minutes. This token automatically applies the security settings from your Elasticsearch cluster, authenticates to Elasticsearch with the kibana_system user, and writes the security configuration to kibana.yml.

    The following commands start a single-node Elasticsearch cluster for development or testing.

    Create a new docker network for Elasticsearch and Kibana

    docker network create elastic

    Start Elasticsearch in Docker. A password is generated for the elastic user and output to the terminal, plus an enrollment token for enrolling Kibana.

    docker run --name es01 --net elastic -p 9200:9200 -p 9300:9300 -it docker.elastic.co/elasticsearch/elasticsearch:8.2.0

    You might need to scroll back a bit in the terminal to view the password and enrollment token.

    Copy the generated password and enrollment token and save them in a secure location. These values are shown only when you start Elasticsearch for the first time.

    If you need to reset the password for the elastic user or other built-in users, run the elasticsearch-reset-password tool. This tool is available in the Elasticsearch /bin directory of the Docker container. For example:

    docker exec -it es01 /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic

    Copy the http_ca.crt security certificate from your Docker container to your local machine.

    docker cp es01:/usr/share/elasticsearch/config/certs/http_ca.crt .

    Open a new terminal and verify that you can connect to your Elasticsearch cluster by making an authenticated call, using the http_ca.crt file that you copied from your Docker container. Enter the password for the elastic user when prompted.

    curl --cacert http_ca.crt -u elastic https://localhost:9200
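
The same check can be made from Python, which is handy once the cluster is driven from scripts. This is a minimal sketch using only the standard library; the function names `build_request` and `cluster_info` are made up for illustration, and it assumes the http_ca.crt file copied above sits in the current directory.

```python
import base64
import json
import ssl
import urllib.request


def build_request(url, user, password):
    """Build a GET request that carries HTTP Basic credentials."""
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def cluster_info(url, user, password, ca_path):
    """Call the cluster root endpoint, trusting the CA certificate at ca_path."""
    # Equivalent of curl's --cacert: verify the server against the copied CA file.
    context = ssl.create_default_context(cafile=ca_path)
    request = build_request(url, user, password)
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.load(response)
```

Calling `cluster_info("https://localhost:9200", "elastic", password, "http_ca.crt")` against the running container should return the same JSON document that the curl command prints.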

    Installing ES with Docker

    Main reference:

    • https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html

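    Docker's official registry (Docker Hub) is slow to reach from mainland China, so configure domestic registry mirrors before pulling images. Note that registry-mirrors applies only to images pulled from Docker Hub; pulls from docker.elastic.co, as used above, are not affected.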

    Create the file /etc/docker/daemon.json with the following content:

    {
        "registry-mirrors" : [
            "https://registry.docker-cn.com",
            "https://docker.mirrors.ustc.edu.cn",
            "http://hub-mirror.c.163.com",
            "https://cr.console.aliyun.com/"
        ]
    }
    

    Then restart the Docker service:

    systemctl restart docker
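
If /etc/docker/daemon.json already exists, replacing it wholesale would discard any other daemon settings. The sketch below merges the mirror list into the existing configuration instead. The helper names `merge_mirrors` and `update_daemon_json` are hypothetical, and writing under /etc/docker normally requires root.

```python
import json
from pathlib import Path


def merge_mirrors(config, mirrors):
    """Return a copy of config with mirrors appended to "registry-mirrors", skipping duplicates."""
    merged = dict(config)
    existing = list(merged.get("registry-mirrors", []))
    for mirror in mirrors:
        if mirror not in existing:
            existing.append(mirror)
    merged["registry-mirrors"] = existing
    return merged


def update_daemon_json(path, mirrors):
    """Read the daemon config if it exists, merge in the mirrors, and write it back."""
    file = Path(path)
    config = json.loads(file.read_text()) if file.exists() else {}
    file.write_text(json.dumps(merge_mirrors(config, mirrors), indent=4) + "\n")
```

For example, `update_daemon_json("/etc/docker/daemon.json", ["https://docker.mirrors.ustc.edu.cn"])` keeps existing keys such as a log driver setting intact. The Docker service still has to be restarted afterwards for the change to take effect.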
    
    • 1

    Earlier approach: bert-server

    https://towardsdatascience.com/elasticsearch-meets-bert-building-search-engine-with-elasticsearch-and-bert-9e74bf5b4cf2


    https://github.com/Hironsan/bertsearch
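
As far as I understand the approach in the links above, documents are embedded with BERT outside Elasticsearch, the vectors are stored in a dense_vector field, and a script_score query ranks documents by cosine similarity to the query embedding. The sketch below only builds the request bodies. The field names `text` and `text_vector` and both function names are assumptions for illustration, 768 is the output size of BERT-base, and the script uses the ES 8.x form of cosineSimilarity that takes the field name as a string.

```python
def build_mapping(dims=768):
    """Index mapping with a text field and a dense_vector field for the BERT embedding."""
    return {
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "text_vector": {"type": "dense_vector", "dims": dims},
            }
        }
    }


def build_query(query_vector, size=10):
    """Search body that scores every document by cosine similarity to query_vector."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # Scores must be non-negative, so shift cosine from [-1, 1] to [0, 2].
                    "source": "cosineSimilarity(params.query_vector, 'text_vector') + 1.0",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        },
    }
```

The mapping body would be sent when creating the index and the query body to the index's `_search` endpoint. The query vector has to come from the same BERT model that embedded the documents.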


    Approach for ES 8.0

    To be continued.

    ES and NLP

    https://www.elastic.co/guide/en/machine-learning/master/ml-nlp.html


    References

    Introduction to modern natural language processing with PyTorch in Elasticsearch

    • https://www.elastic.co/cn/blog/introduction-to-nlp-with-pytorch-models
    • https://eland.readthedocs.io/en/v8.1.0/
  • Original article: https://blog.csdn.net/m0_67393593/article/details/126553697