Project used: FastChat (lm-sys/FastChat), an open platform for training, serving, and evaluating large language models, and the release repo for Vicuna and Chatbot Arena.
https://github.com/lm-sys/FastChat
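I have recently been using the OpenAI API for study and research. Using it has three drawbacks: it costs money, it requires working around network access problems, and it is subject to various rate limits.
So I tried using an open-source large language model to set up a local OpenAI-compatible API service.
The open-source tool used for this is FastChat.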
Create a conda environment and install FastChat and its extra dependencies:
conda create -n fastchat python=3.11 -y
conda activate fastchat
git clone https://github.com/lm-sys/FastChat.git
pip install --upgrade pip
cd FastChat
pip install -e ".[model_worker,webui]"
pip install transformers_stream_generator
pip install cpm_kernels
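Before starting any services, it can help to confirm that the installation is visible to the active Python environment. The following is a minimal sketch, not part of FastChat: `missing_modules` is a hypothetical helper, and the import names in `REQUIRED` are assumed to match the packages installed above.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the packages installed above.
REQUIRED = ["fastchat", "transformers_stream_generator", "cpm_kernels"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If anything is reported missing, the most likely cause is that the `fastchat` conda environment is not the active one.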
To try the model in a web UI, start the controller:
python -m fastchat.serve.controller --host 0.0.0.0 --port 21001
Start the model worker(s):
python -m fastchat.serve.model_worker --model-path THUDM/chatglm2-6b
Once the model worker(s) have finished starting, start the Gradio web server:
python -m fastchat.serve.gradio_web_server --host 0.0.0.0 --port 8888
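If the page does not load, a quick first check is whether the services are listening at all. This is a rough sketch under the assumption that the check runs on the same machine as the services; `port_is_open` is a hypothetical helper, and the ports are the ones passed on the command lines above.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the commands above; "localhost" is an assumption.
SERVICES = {"controller": 21001, "gradio web server": 8888}

for name, port in SERVICES.items():
    status = "up" if port_is_open("localhost", port) else "not reachable"
    print(f"{name} (port {port}): {status}")
```

An open port only shows that the process is listening; it does not confirm that the model worker has finished loading the model.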
Ask it a few questions. [Screenshots of the questions and answers]
To serve an OpenAI-compatible API instead, start the controller:
python -m fastchat.serve.controller --host 0.0.0.0 --port 21001
Start the model worker(s), registering the model under OpenAI model names:
python -m fastchat.serve.model_worker --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002" --model-path THUDM/chatglm2-6b
Start the RESTful API server:
python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
Set the OpenAI base URL:
export OPENAI_API_BASE=http://localhost:8000/v1
Set the OpenAI API key:
export OPENAI_API_KEY=EMPTY
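One way to confirm that the API server is reachable, and that the worker registered the expected model names, is to query the model list endpoint. The sketch below uses only the standard library and assumes the server exposes the OpenAI-style `GET /v1/models` route returning a `data` list of objects with an `id` field; `extract_model_ids` and `list_model_ids` are hypothetical helper names.

```python
import json
import os
import urllib.request


def extract_model_ids(payload):
    """Pull the model ids out of an OpenAI-style model list response."""
    return [item["id"] for item in payload.get("data", [])]


def list_model_ids(base_url, timeout=10):
    """GET {base_url}/models and return the advertised model ids."""
    url = base_url.rstrip("/") + "/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_model_ids(json.load(resp))


# Falls back to the base URL used in this post if the variable is unset.
BASE_URL = os.environ.get("OPENAI_API_BASE", "http://localhost:8000/v1")

# Usage, with the API server running:
# print(list_model_ids(BASE_URL))
```

If the list comes back empty, the model worker has probably not finished registering with the controller yet.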
Call the API from Python with the openai package:
import os
import openai  # this code uses the pre-1.0 interface (openai<1.0)

from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())  # read local .env file

# Point the client at the local FastChat server; the key is only a placeholder.
os.environ['OPENAI_API_KEY'] = 'EMPTY'
os.environ['OPENAI_API_BASE'] = 'http://localhost:8000/v1'
openai.api_key = 'EMPTY'
openai.api_base = 'http://localhost:8000/v1'


def get_completion(prompt, model="gpt-3.5-turbo"):
    # "gpt-3.5-turbo" is one of the names the model worker registered for chatglm2-6b
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message["content"]


get_completion("你是谁?")  # "Who are you?"
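The `openai.ChatCompletion` interface used in `get_completion` was removed in version 1.0 of the openai package. For readers on a newer version, the same request might look like the sketch below. The helper names `get_completion_v1` and `make_local_client` are hypothetical, and the base URL and placeholder key are the values used in this post.

```python
def get_completion_v1(client, prompt, model="gpt-3.5-turbo"):
    """Same request as get_completion, written against the openai>=1.0 client interface."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content


def make_local_client(base_url="http://localhost:8000/v1", api_key="EMPTY"):
    """Build a client pointed at the local FastChat server (requires openai>=1.0)."""
    from openai import OpenAI  # imported lazily so the helper above works without it
    return OpenAI(api_key=api_key, base_url=base_url)


# Usage, with the FastChat API server running:
# print(get_completion_v1(make_local_client(), "你是谁?"))
```

Passing the client in as an argument keeps the request logic independent of how the client is configured, which also makes it easy to switch between the local server and the hosted API.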
(Optional) If you hit an OOM error when creating embeddings, set a smaller BATCH_SIZE through an environment variable:
export FASTCHAT_WORKER_API_EMBEDDING_BATCH_SIZE=1
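For context, an embedding request against the local server might look like the following sketch. It assumes the server exposes the OpenAI-style `POST /v1/embeddings` route and that the worker was registered under the name `text-embedding-ada-002` as in the command earlier; `build_embedding_request` and `embed` are hypothetical helpers built on the standard library.

```python
import json
import urllib.request


def build_embedding_request(base_url, texts, model="text-embedding-ada-002", api_key="EMPTY"):
    """Build (but do not send) a POST request for the embeddings endpoint."""
    url = base_url.rstrip("/") + "/embeddings"
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # placeholder key, as in this post
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def embed(base_url, texts, timeout=60):
    """Send the request and return one embedding vector per input text."""
    req = build_embedding_request(base_url, texts)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return [item["embedding"] for item in payload["data"]]


# Usage, with the FastChat API server running:
# vectors = embed("http://localhost:8000/v1", ["hello", "world"])
```

Sending fewer texts per request is a client-side complement to lowering the batch size on the worker.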
(Optional) If you hit timeout errors, increase the worker API timeout:
export FASTCHAT_WORKER_API_TIMEOUT=1200