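This is a follow-up to the previous article, “Elasticsearch: RAG (Retrieval Augmented Generation) with OpenAI and LangChain (Part 1)”. In this article, I focus on how to use ElasticVectorSearch.
Our setup is the same as in that article, except that here we use ElasticVectorSearch instead of ElasticKnnSearch.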
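The code in this post reads four settings from the environment: OPENAI_API_KEY, ES_USER, ES_PASSWORD and ES_ENDPOINT. As a minimal sketch, a pre-flight check like the one below can report which of them are missing before any request is sent. The helper name `missing_settings` and the constant `REQUIRED_SETTINGS` are made up for this illustration and are not part of the original notebook.

```python
import os

# Environment variable names read by the code in this post
REQUIRED_SETTINGS = ("OPENAI_API_KEY", "ES_USER", "ES_PASSWORD", "ES_ENDPOINT")


def missing_settings(env=None):
    """Return the required setting names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


# In the notebook, run this after load_dotenv() so values from the .env file are visible
missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
```

Passing a plain dictionary instead of `os.environ` makes it easy to try the check without touching the real environment.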
#!pip3 install langchain
from dotenv import load_dotenv
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import ElasticVectorSearch
from langchain.text_splitter import CharacterTextSplitter
import os, json

# Read credentials and connection settings from the .env file
load_dotenv()

openai_api_key = os.getenv('OPENAI_API_KEY')
elastic_user = os.getenv('ES_USER')
elastic_password = os.getenv('ES_PASSWORD')
elastic_endpoint = os.getenv("ES_ENDPOINT")
elastic_index_name = 'elastic-vector-search'
# Load data into a JSON object
with open('workplace-docs.json') as f:
    workplace_docs = json.load(f)

print(f"Successfully loaded {len(workplace_docs)} documents")

# Separate each document's text from its metadata
metadata = []
content = []

for doc in workplace_docs:
    content.append(doc["content"])
    metadata.append({
        "name": doc["name"],
        "summary": doc["summary"],
        "rolePermissions": doc["rolePermissions"]
    })
# Split the content into chunks and attach the metadata to each chunk
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)
docs = text_splitter.create_documents(content, metadatas=metadata)

embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

url = f"https://{elastic_user}:{elastic_password}@{elastic_endpoint}:9200"

# TLS settings passed through to the Elasticsearch client
ssl_verify = {
    "verify_certs": True,
    "basic_auth": (elastic_user, elastic_password),
    "ca_certs": "./http_ca.crt"
}

# Embed the chunks and index them into Elasticsearch
es = ElasticVectorSearch.from_documents(
    docs,
    embedding=embeddings,
    elasticsearch_url=url,
    index_name=elastic_index_name,
    ssl_verify=ssl_verify
)
Running the code above prints a deprecation warning: ElasticVectorSearch will be removed in a future release.
After running the code above, we can open Kibana and inspect the elastic-vector-search index.
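For readers who would rather verify the index from Python than from Kibana, the following is a minimal sketch using the official `elasticsearch` Python client. The helper names `summarize_index` and `connect` are hypothetical, and the connection details simply mirror the settings used earlier in this post. My understanding is that ElasticVectorSearch writes `text`, `vector` and `metadata` fields, but treat that as an assumption and check the output against your own cluster.

```python
def summarize_index(client, index_name):
    """Return (document count, sorted top-level field names) for an index."""
    count = client.count(index=index_name)["count"]
    mapping = client.indices.get_mapping(index=index_name)
    properties = mapping[index_name]["mappings"].get("properties", {})
    return count, sorted(properties)


def connect(endpoint, user, password, ca_certs="./http_ca.crt"):
    """Build a client with the same TLS settings used in this post (hypothetical helper)."""
    # Imported lazily so the module loads even without the `elasticsearch` package
    from elasticsearch import Elasticsearch

    return Elasticsearch(
        f"https://{endpoint}:9200",
        basic_auth=(user, password),
        ca_certs=ca_certs,
        verify_certs=True,
    )


# Example usage against a running cluster (not executed here):
# client = connect(elastic_endpoint, elastic_user, elastic_password)
# print(summarize_index(client, elastic_index_name))
```

Keeping the summary logic separate from the connection makes it possible to exercise `summarize_index` with a stub client when no cluster is available.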
def showResults(output):
    print("Total results: ", len(output))
    for doc in output:
        print(doc)

query = "work from home policy"
result = es.similarity_search(query=query)

showResults(result)
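To give a feel for what `similarity_search` does with the query, here is a small pure-Python sketch of ranking by cosine similarity. As far as I understand the LangChain source, ElasticVectorSearch asks Elasticsearch to score each document with `cosineSimilarity(query_vector, vector) + 1.0` and returns the top 4 hits by default. Both details are assumptions on my part, and the two-dimensional vectors and document names below are toy values rather than real OpenAI embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, doc_vectors, k=4):
    """Return the k best (name, score) pairs, where score = cosine similarity + 1.0."""
    scored = [
        (name, cosine_similarity(query_vector, vector) + 1.0)
        for name, vector in doc_vectors.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy data: hypothetical document names with made-up 2-d "embeddings"
toy_docs = {
    "wfh-policy": [0.9, 0.1],
    "vacation-policy": [0.5, 0.5],
    "sales-playbook": [0.0, 1.0],
}
toy_query = [1.0, 0.0]

for name, score in rank_by_similarity(toy_query, toy_docs):
    print(f"{name}: {score:.3f}")
```

Adding 1.0 shifts the cosine range from [-1, 1] to [0, 2], which keeps scores non-negative. The ordering of the results is unchanged by the shift.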
The notebook containing the ElasticVectorSearch code from this post can be downloaded from https://github.com/liu-xiao-guo/semantic_search_es/blob/main/ElasticVectorSearch.ipynb.