---
license: apache-2.0
language:
- en
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
pipeline_tag: image-text-to-text
---

# Llama 3.1 Vision by Capx AI

![image/png](https://cdn-uploads.huggingface.co/production/uploads/644bf6ef778ecbfb977e8e84/3D-oR8GazhHTaA-kVLNDk.png)

Read more at: https://huggingface.co/blog/adarshxs/capx-vision

## Directions to Run Inference

**The minimum requirement to run inference is an A100 40GB GPU.**

- Clone our fork of the Bunny repository by BAAI: https://github.com/adarshxs/Capx-Llama3.1-Vision
- Create a conda virtual environment:

```bash
conda create -n capx python=3.10
conda activate capx
```

- Install the dependencies:

```bash
pip install --upgrade pip  # enable PEP 660 support
pip install transformers
pip install torch torchvision xformers --index-url https://download.pytorch.org/whl/cu118

# Install APEX
pip install ninja
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ..

# Install Flash Attention
pip install packaging
pip install flash-attn --no-build-isolation

# Clone the inference repo
git clone https://github.com/adarshxs/Capx-Llama3.1-Vision
cd Capx-Llama3.1-Vision
pip install -e .
```

- Run the CLI server:

```bash
python -m bunny.serve.cli \
    --model-path Capx/Llama-3.1-Vision \
    --model-type llama3.1-8b \
    --image-file /path/to/image \
    --conv-mode llama
```

We thank the amazing team at BAAI for their Bunny project, upon which this work was built, and Meta AI for their Llama 3.1 model!
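
Before launching inference, you can sanity-check that your GPU meets the stated minimum. This is an optional sketch using PyTorch (already installed by the steps above); the 40 GB threshold simply mirrors the A100 40GB requirement noted earlier:

```python
import torch

# Optional pre-flight check against the documented minimum (A100 40GB).
assert torch.cuda.is_available(), "A CUDA-capable GPU is required for inference."

props = torch.cuda.get_device_properties(0)
total_gb = props.total_memory / 1024**3
print(f"GPU: {props.name}, {total_gb:.1f} GB VRAM")

if total_gb < 40:
    print("Warning: less than the recommended 40 GB of VRAM; inference may fail.")
```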
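
If you prefer to drive inference from a script rather than typing the command by hand, here is a minimal sketch that wraps the documented CLI invocation. It assumes the `capx` environment above is active, and the image path is a placeholder to replace with your own:

```python
import subprocess

def run_vision_cli(image_path: str) -> None:
    """Invoke the documented bunny.serve.cli command for a single image."""
    cmd = [
        "python", "-m", "bunny.serve.cli",
        "--model-path", "Capx/Llama-3.1-Vision",
        "--model-type", "llama3.1-8b",
        "--image-file", image_path,  # placeholder: point this at your image
        "--conv-mode", "llama",
    ]
    # check=True raises CalledProcessError if the CLI exits non-zero.
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_vision_cli("/path/to/image")
```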