Trelis
/

Llama-2-7b-chat-hf-function-calling

Text Generation

function calling

text-generation-inference


RonanMcGovern committed on Aug 11, 2023

Commit

9f9b784

•

1 Parent(s): 55cded1

add improved inference options

Files changed (1)

README.md +16 -2


@@ -26,16 +26,30 @@ Available models:
 ## Inference with Google Colab and HuggingFace 🤗
-Get started by saving your own copy of this [fLlama_Inference notebook](https://colab.research.google.com/drive/1Ow5cQ0JNv-vXsT-apCceH6Na3b4L7JyW?usp=sharing).
 You will be able to run inference using a free Colab notebook if you select a GPU runtime. See the notebook for more details.
 ## Licensing and Usage
 fLlama-7B:
 - Llama 2 license
 fLlama-13B:
 - Purchase access here: [fLlama-13b: €19.99 per user/seat.](https://buy.stripe.com/9AQ7te3lHdmbdZ68wz)
 - Licenses are not transferable to other users/entities.

 ## Inference with Google Colab and HuggingFace 🤗
+**GPTQ (fastest + good accuracy)**
+Get started by saving your own copy of this [function calling chatbot](https://colab.research.google.com/drive/1u8x41Jx8WWtI-nzHOgqTxkS3Q_lcjaSX?usp=sharing).
 You will be able to run inference using a free Colab notebook if you select a GPU runtime. See the notebook for more details.
+**Bits and Bytes NF4 (slowest inference)**
+Try out this [fLlama_Inference notebook](https://colab.research.google.com/drive/1Ow5cQ0JNv-vXsT-apCceH6Na3b4L7JyW?usp=sharing).
+**GGML (best for running on a laptop, great for Mac)**
+To run this you'll need to install llama.cpp from ggerganov on GitHub.
+- Download the ggml file from the ggml link above, under available models
+- I recommend running a command like:
+  ```
+  ./server -m fLlama-2-7b-chat.ggmlv3.q3_K_M.bin -ngl 32 -c 2048
+  ```
+  which will allow you to run a chatbot in your browser. The `-ngl` flag offloads layers to the Mac's GPU, which gives very good token generation speed.
 ## Licensing and Usage
 fLlama-7B:
 - Llama 2 license
 fLlama-13B:
+- For higher precision on function calling.
 - Purchase access here: [fLlama-13b: €19.99 per user/seat.](https://buy.stripe.com/9AQ7te3lHdmbdZ68wz)
 - Licenses are not transferable to other users/entities.
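
The GGML instructions in this diff start a llama.cpp `./server` process for use from a browser. The same server can also be queried from a script. The following is a minimal sketch, not part of the commit. It assumes the server listens on its default address `http://127.0.0.1:8080` and exposes the `/completion` endpoint described in the llama.cpp server README; the helper names `build_request` and `complete` are hypothetical, and the prompt is sent as plain text without any function-calling prompt format.

```python
import json
import urllib.request

# Assumption: the llama.cpp server started with ./server listens here by default.
SERVER_URL = "http://127.0.0.1:8080/completion"


def build_request(prompt, n_predict=128, url=SERVER_URL):
    """Build (but do not send) a POST request for the /completion endpoint."""
    payload = {"prompt": prompt, "n_predict": n_predict}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, n_predict=128, url=SERVER_URL, timeout=120):
    """Send the prompt to a running server and return the generated text."""
    request = build_request(prompt, n_predict, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Assumption: the response JSON carries the generated text under "content".
    return body["content"]
```

With the server running, `print(complete("Hello"))` should print the model's continuation; if the server is not running, the call raises a connection error.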