xinference-local --host 0.0.0.0 --port 9997
Open localhost:9997 and use Launch Model in the left sidebar to configure and download a model; a model that has finished downloading is marked with a "cached" label.
Alternatively, start the server and launch a model from the command line:
xinference-local --host 0.0.0.0 --port 9997
xinference launch --model-engine vLLM --model-name chatglm3 --size-in-billions 6 --model-format pytorch --quantization none
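Once a model is launched, Xinference serves an OpenAI-compatible REST API on the same port. The sketch below (stdlib only; the base URL and route are assumptions based on that compatibility, not copied from this document) builds and sends a chat request to the launched chatglm3 model:

```python
import json
from urllib import request

# Assumption: the server started by `xinference-local` is reachable here
# and exposes an OpenAI-compatible /v1/chat/completions route.
BASE_URL = "http://localhost:9997"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload; `model` is the --model-name
    used with `xinference launch`."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(model: str, prompt: str) -> str:
    """Send the payload and return the assistant's reply text."""
    payload = json.dumps(build_chat_request(model, prompt)).encode()
    req = request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, `chat("chatglm3", "你好")` would return the model's reply, assuming the server and model above are running.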

Symlinks created by Xinference
The model folders under ~/.xinference/cache are symlinks into the actual download location under ~/.cache. Note that ~/.xinference/cache/model_name also contains an extra file named __valid_download.
Its contents look like the following; presumably the backend uses it to check whether the model has been validly downloaded:
{
  "model_type": "LLM",
  "address": null,
  "accelerators": null,
  "model_name": "chatglm3",
  "model_lang": ["en", "zh"],
  "model_ability": ["chat", "tools"],
  "model_description": "ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data.",
  "model_format": "pytorch",
  "model_size_in_billions": 6,
  "model_family": "chatglm3",
  "quantization": "none",
  "model_hub": "huggingface",
  "revision": "103caa40027ebfd8450289ca2f278eac4ff26405",
  "context_length": 8192
}
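Since __valid_download is plain JSON, it is easy to inspect programmatically. A minimal sketch (the helper names are mine; the field names come from the JSON above, and the presence check only mimics what Xinference appears to do):

```python
import json
from pathlib import Path

def is_downloaded(model_dir: str) -> bool:
    """Sketch of the check Xinference appears to perform: a model counts
    as validly downloaded if its folder contains __valid_download."""
    return (Path(model_dir) / "__valid_download").is_file()

def read_valid_download(model_dir: str) -> dict:
    """Return the metadata stored in the __valid_download marker
    (model name, format, revision, context length, ...)."""
    marker = Path(model_dir) / "__valid_download"
    return json.loads(marker.read_text())
```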

Running models can be inspected in the UI at http://localhost:9997/ui/#/running_models/LLM

To use a locally downloaded copy of the model, symlink it into Xinference's cache under the expected name:
ln -s ~/Downloads/chatglm3-6b ~/.xinference/cache/chatglm3-pytorch-6b
If a previous download was renamed to chatglm3-pytorch-6b-raw, copy its __valid_download marker into the linked folder:
cp ~/.xinference/cache/chatglm3-pytorch-6b-raw/__valid_download ~/.xinference/cache/chatglm3-pytorch-6b/__valid_download
Why is the __valid_download file needed after running ln -s ~/Downloads/chatglm3-6b ~/.xinference/cache/chatglm3-pytorch-6b? Without it, Xinference still seems to start its default download when executing:
xinference launch --model-engine vLLM --model-name chatglm3 --size-in-billions 6 --model-format pytorch --quantization none
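The manual ln/cp steps above can be sketched as one small function. This is not official Xinference tooling, just an automation of the same idea: symlink the local copy into the cache and copy over the __valid_download marker so the launch command skips re-downloading (the function and parameter names are mine):

```python
import shutil
from pathlib import Path

def register_local_model(local_copy: str, cache_name: str, old_cache_dir: str,
                         xinference_cache: str = "~/.xinference/cache") -> Path:
    """Symlink a locally downloaded model into Xinference's cache under
    `cache_name`, then copy the __valid_download marker from a previous
    download (`old_cache_dir`) into the linked folder."""
    cache = Path(xinference_cache).expanduser()
    link = cache / cache_name
    if not link.exists():
        # Equivalent of: ln -s <local_copy> ~/.xinference/cache/<cache_name>
        link.symlink_to(Path(local_copy).expanduser())
    # Equivalent of: cp <old_cache_dir>/__valid_download <link>/__valid_download
    marker_src = Path(old_cache_dir).expanduser() / "__valid_download"
    shutil.copy(marker_src, link / "__valid_download")
    return link
```

For the chatglm3 case above this would be called as `register_local_model("~/Downloads/chatglm3-6b", "chatglm3-pytorch-6b", "~/.xinference/cache/chatglm3-pytorch-6b-raw")`.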
pip install transformers==4.21.2  # downgrade from the default 4.42.3