• Elasticsearch: RAG (Retrieval Augmented Generation) with OpenAI and LangChain (Part 3)


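    This article continues the earlier articles in this series. Today I describe in detail how to use ElasticsearchStore, which is also the recommended approach. If you have not set up your environment yet, please read the first article carefully.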

    Create and demonstrate the application

    Install the package

    #!pip3 install langchain

    Import the packages

    from dotenv import load_dotenv
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import ElasticsearchStore
    from langchain.text_splitter import CharacterTextSplitter
    from urllib.request import urlopen
    import os, json

    # Load the settings below from a local .env file
    load_dotenv()

    openai_api_key = os.getenv('OPENAI_API_KEY')
    elastic_user = os.getenv('ES_USER')
    elastic_password = os.getenv('ES_PASSWORD')
    elastic_endpoint = os.getenv("ES_ENDPOINT")
    elastic_index_name = 'elasticsearch-store'
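    Note that os.getenv returns None when a variable is missing, so a typo in the .env file only surfaces later, when the connection fails. The following is a minimal sketch of a fail-fast check. The helper name require_env and the sample values are hypothetical; they are not part of python-dotenv or LangChain.

```python
import os


def require_env(names, environ=None):
    """Return {name: value} for all names, or raise if any is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# An in-memory mapping stands in for the real environment here
# (placeholder values, not real credentials).
sample_env = {"ES_USER": "elastic", "ES_PASSWORD": "changeme", "ES_ENDPOINT": "localhost"}
settings = require_env(["ES_USER", "ES_PASSWORD", "ES_ENDPOINT"], sample_env)
```

    In the notebook, calling require_env with the four variable names right after load_dotenv() would report every missing name at once.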

    Load the documents and split them into passages

    with open('workplace-docs.json') as f:
        workplace_docs = json.load(f)

    print(f"Successfully loaded {len(workplace_docs)} documents")

    metadata = []
    content = []

    # Collect the text and the metadata of every document in parallel lists
    for doc in workplace_docs:
        content.append(doc["content"])
        metadata.append({
            "name": doc["name"],
            "summary": doc["summary"],
            "rolePermissions": doc["rolePermissions"]
        })

    # Split each document into chunks; every chunk keeps the metadata of its source document
    text_splitter = CharacterTextSplitter(chunk_size=50, chunk_overlap=0)
    docs = text_splitter.create_documents(content, metadatas=metadata)
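    To make the splitting step more concrete, here is a simplified sketch of character-based chunking in plain Python. It is not LangChain's implementation: it assumes a blank-line separator, ignores overlap, and keeps a piece that is longer than chunk_size whole. The function names and the sample text are made up for illustration.

```python
def split_text(text, chunk_size=50, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if current and len(candidate) > chunk_size:
            # The current chunk is full: emit it and start a new one
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def create_chunks(contents, metadatas, chunk_size=50):
    """Pair every chunk with a copy of the metadata of the document it came from."""
    result = []
    for text, meta in zip(contents, metadatas):
        for chunk in split_text(text, chunk_size):
            result.append({"page_content": chunk, "metadata": dict(meta)})
    return result


sample_contents = [
    "Work from home policy.\n\nEmployees may work remotely two days a week.\n\nAsk HR."
]
sample_metadata = [{"name": "WFH policy"}]
chunks = create_chunks(sample_contents, sample_metadata)
```

    With chunk_size=50 the sample text yields three chunks, and each of them carries the same "name" metadata as its source document.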

    Write the data into Elasticsearch

    from elasticsearch import Elasticsearch

    embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

    # Connect over HTTPS and verify the server against the cluster's CA certificate
    url = f"https://{elastic_user}:{elastic_password}@{elastic_endpoint}:9200"
    connection = Elasticsearch(url, ca_certs="./http_ca.crt", verify_certs=True)

    # Embed the chunks with OpenAI and index them into Elasticsearch
    es = ElasticsearchStore.from_documents(
        docs,
        embedding=embeddings,
        es_url=url,
        es_connection=connection,
        index_name=elastic_index_name,
        es_user=elastic_user,
        es_password=elastic_password)
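    One caveat with embedding the credentials in the URL: a password containing characters such as "@", ":" or "/" produces an ambiguous URL. A possible workaround, sketched below, is to percent-encode the user name and password first. The helper build_es_url is hypothetical, and the sketch assumes that the client decodes percent-encoded credentials; passing the credentials separately from the URL is another option.

```python
from urllib.parse import quote


def build_es_url(user, password, endpoint, port=9200):
    """Build an https URL whose user name and password are percent-encoded."""
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"https://{safe_user}:{safe_password}@{endpoint}:{port}"


# Placeholder credentials for illustration only
example_url = build_es_url("elastic", "p@ss:word/1", "localhost")
```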

    Show the results

    def showResults(output):
        print("Total results: ", len(output))
        for index in range(len(output)):
            print(output[index])

    Similarity / Vector Search (Approximate KNN Search) - ApproxRetrievalStrategy()

    query = "work from home policy"
    result = es.similarity_search(query=query)
    showResults(result)

    Hybrid Search (Approximate KNN + Keyword Search) - ApproxRetrievalStrategy()

    We enter the following command in Kibana's Dev Tools:

    embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

    # hybrid=True combines approximate kNN search with keyword search
    es = ElasticsearchStore(
        es_url=url,
        es_connection=connection,
        es_user=elastic_user,
        es_password=elastic_password,
        embedding=embeddings,
        index_name=elastic_index_name,
        strategy=ElasticsearchStore.ApproxRetrievalStrategy(
            hybrid=True
        )
    )

    es.similarity_search("work from home policy")

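    Running this code produces an error, because the current license level does not support RRF (Reciprocal Rank Fusion). We go to Kibana and activate the license.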

    We then run the code again.
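    For readers unfamiliar with RRF, the idea can be sketched in a few lines of plain Python: each document receives 1 / (rank_constant + rank) from every ranked list it appears in, and the sums decide the final order. This is an illustration of the formula only, not Elasticsearch's implementation; the document ids are invented and the rank constant of 60 is an assumption.

```python
def reciprocal_rank_fusion(rankings, rank_constant=60):
    """Fuse several ranked lists of document ids into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical result lists from the kNN search and the keyword search
knn_ranking = ["doc_a", "doc_b", "doc_c"]
keyword_ranking = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([knn_ranking, keyword_ranking])
```

    Here doc_b ends up first because it ranks high in both lists, while doc_c and doc_d, which each appear in only one list, fall to the bottom.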

    Exact KNN Search (Brute Force) - ExactRetrievalStrategy()

    embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

    # Same index as before, but searched with the exact (brute-force) strategy
    es = ElasticsearchStore(
        es_url=url,
        es_connection=connection,
        es_user=elastic_user,
        es_password=elastic_password,
        embedding=embeddings,
        index_name=elastic_index_name,
        strategy=ElasticsearchStore.ExactRetrievalStrategy()
    )

    es.similarity_search("work from home policy")
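    "Brute force" here means that the query vector is compared with every stored vector, instead of walking an approximate index. The sketch below shows the idea with cosine similarity in plain Python. It is only an illustration of the technique: the three-dimensional vectors and document names are made up, and the similarity function actually used in Elasticsearch depends on the index configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_knn(query_vector, doc_vectors, k=4):
    """Score every document against the query and return the top k (id, score) pairs."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in doc_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real OpenAI embeddings have far more dimensions
toy_vectors = {
    "wfh_policy": [0.9, 0.1, 0.0],
    "vacation_policy": [0.5, 0.5, 0.0],
    "sales_report": [0.0, 0.1, 0.9],
}
top = exact_knn([1.0, 0.0, 0.0], toy_vectors, k=2)
```

    The cost grows linearly with the number of stored vectors, which is the trade-off against the approximate strategy used earlier.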

    Index / Search Documents using ELSER - SparseVectorRetrievalStrategy()

    In this step, we need to start ELSER. For how to start ELSER, see the article "Elasticsearch: Deploy ELSER - Elastic Learned Sparse EncodeR".

    embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

    # Index the chunks into a separate index; no embedding model is passed here,
    # because the sparse vectors come from ELSER
    es = ElasticsearchStore.from_documents(
        docs,
        es_url=url,
        es_connection=connection,
        es_user=elastic_user,
        es_password=elastic_password,
        index_name=elastic_index_name + "-" + "elser",
        strategy=ElasticsearchStore.SparseVectorRetrievalStrategy()
    )

    es.similarity_search("work from home policy")

    After running the code above, we can inspect the generated fields in Kibana.
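    As background on what a sparse vector is: instead of a dense list of floats, each text is represented as a small set of weighted tokens, and a document scores well when it shares highly weighted tokens with the query. The sketch below illustrates this with a dot product over shared tokens. The tokens and weights are invented for the example and are not real ELSER output; the scoring inside Elasticsearch may differ in detail.

```python
def sparse_score(query_tokens, doc_tokens):
    """Dot product over the tokens that both sparse vectors share."""
    return sum(
        weight * doc_tokens[token]
        for token, weight in query_tokens.items()
        if token in doc_tokens
    )


# Hypothetical token expansions (token -> weight)
query_expansion = {"work": 1.0, "home": 1.5, "remote": 0.5}
doc_expansions = {
    "wfh_policy": {"home": 2.0, "remote": 1.0, "office": 0.5},
    "sales_report": {"revenue": 2.0, "work": 0.5},
}

ranked = sorted(
    doc_expansions,
    key=lambda doc_id: sparse_score(query_expansion, doc_expansions[doc_id]),
    reverse=True,
)
```

    Note that "remote" contributes to the score of wfh_policy even though the word does not appear in the query text "work from home policy"; that kind of expansion is what distinguishes a learned sparse model from plain keyword matching.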

    The complete Jupyter notebook for the code in this article can be downloaded from https://github.com/liu-xiao-guo/semantic_search_es/blob/main/ElasticsearchStore.ipynb.

  • Original article: https://blog.csdn.net/UbuntuTouch/article/details/134030755