Create a weights folder and download the model into LLaVA/weights/. -> The downloaded directory must be renamed to llava-&lt;version&gt;, e.g. llava-v1.5-7b.
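A minimal sketch for fetching the weights with huggingface_hub (assuming the repo you want is liuhaotian/llava-v1.5-7b; adjust repo_id and local_dir for other versions):

from huggingface_hub import snapshot_download  # pip install huggingface_hub

# Download the full checkpoint into LLaVA/weights/llava-v1.5-7b
# (the directory name must follow the llava-<version> convention).
snapshot_download(
    repo_id="liuhaotian/llava-v1.5-7b",
    local_dir="LLaVA/weights/llava-v1.5-7b",
)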
python -m llava.serve.controller --host 0.0.0.0 --port 4006
python -m llava.serve.gradio_web_server --controller http://localhost:4006 --model-list-mode reload --share
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:4006 --port 2006 --worker http://localhost:2006 --model-path /ark-local-data/licc/code/LLaVA/llava-v1.5-7b
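Once the three processes are up, you can sanity-check that the worker has registered with the controller. This is a sketch that assumes the controller exposes a POST /list_models endpoint on port 4006 (check llava/serve/controller.py in your version):

import requests

# Ask the controller which model workers have registered.
resp = requests.post("http://localhost:4006/list_models")
print(resp.json())  # expect the model name, e.g. llava-v1.5-7b, in the list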
At startup it loads "mm_vision_tower": "openai/clip-vit-large-patch14-336"; the relevant code is in LLaVA/llava/model/multimodal_encoder/builder.py, as shown below:
import os

from .clip_encoder import CLIPVisionTower


def build_vision_tower(vision_tower_cfg, **kwargs):
    # print added by hand for debugging
    print('vision_conf', vision_tower_cfg)
    vision_tower = getattr(vision_tower_cfg, 'mm_vision_tower', getattr(vision_tower_cfg, 'vision_tower', None))
    is_absolute_path_exists = os.path.exists(vision_tower)
    # print('vision_tower', vision_tower, is_absolute_path_exists)
    # /ark-local-data/licc/code/LLaVA/weights/clip-vit-large-patch14-336/ True
    if is_absolute_path_exists or vision_tower.startswith("openai") or vision_tower.startswith("laion"):
        return CLIPVisionTower(vision_tower, args=vision_tower_cfg, **kwargs)

    raise ValueError(f'Unknown vision tower: {vision_tower}')
Because the server has no outbound network access, the ViT model has to be downloaded in advance and the vision path in the model's config (LLaVA/weights/config.json) updated, i.e. set "mm_vision_tower" to {vision_model_path}. The relevant part of the config looks like this (a download/sanity-check sketch follows the snippet):
"intermediate_size": 11008,
"max_length": 4096,
"max_position_embeddings": 4096,
"mm_hidden_size": 1024,
"mm_projector_type": "mlp2x_gelu",
"mm_resampler_type": null,
"mm_use_im_patch_token": false,
"mm_use_im_start_end": false,
"mm_vision_select_feature": "patch",
"mm_vision_select_layer": -2,
"mm_vision_tower": "/ark-local-data/licc/code/LLaVA/clip-vit-large-patch14-336",
"model_type": "llava",
"num_attention_heads": 32,
python -m llava.serve.cli --model-path /code/LLaVA/llava-v1.5-7b --image-file "https://llava-vl.github.io/static/images/view.jpg" --load-4bit
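For reference, --load-4bit roughly corresponds to loading the LLM weights through bitsandbytes 4-bit quantization. The sketch below shows the transformers-level equivalent; the exact arguments LLaVA uses live in llava/model/builder.py, so treat the parameter values here as assumptions:

import torch
from transformers import BitsAndBytesConfig

# Approximation of what --load-4bit enables: NF4 weight quantization with
# fp16 compute, which is what lets a 7B/13B model fit on a smaller GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
# This config is passed as quantization_config to from_pretrained()
# when the model is built.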
Update:
I have recently been working on VQA and tried both llava-13b and llava-7b; the 13B model generates noticeably better text, so it is worth trying if your hardware allows.