RonanMcGovern committed on
Commit
9f9b784
1 Parent(s): 55cded1

add improved inference options

Files changed (1)
  1. README.md +16 -2
README.md CHANGED
@@ -26,16 +26,30 @@ Available models:
 
  ## Inference with Google Colab and HuggingFace 🤗
 
- Get started by saving your own copy of this [fLlama_Inference notebook](https://colab.research.google.com/drive/1Ow5cQ0JNv-vXsT-apCceH6Na3b4L7JyW?usp=sharing).
-
+ **GPTQ (fastest + good accuracy)**
+ Get started by saving your own copy of this [function calling chatbot](https://colab.research.google.com/drive/1u8x41Jx8WWtI-nzHOgqTxkS3Q_lcjaSX?usp=sharing).
  You will be able to run inference using a free Colab notebook if you select a GPU runtime. See the notebook for more details.
 
+ **Bits and Bytes NF4 (slowest inference)**
+ Try out the [fLlama_Inference notebook](https://colab.research.google.com/drive/1Ow5cQ0JNv-vXsT-apCceH6Na3b4L7JyW?usp=sharing).
+
+ **GGML (best for running on a laptop, great for Mac)**
+ To run this you'll need to install llama.cpp from ggerganov on GitHub.
+ - Download the GGML file from the GGML link above, under Available models.
+ - I recommend running a command like:
+
+ ```
+ ./server -m fLlama-2-7b-chat.ggmlv3.q3_K_M.bin -ngl 32 -c 2048
+ ```
+ which will allow you to run a chatbot in your browser. The `-ngl` flag offloads layers to the Mac's GPU and gives very good token generation speed.
+
  ## Licensing and Usage
 
  fLlama-7B:
  - Llama 2 license
 
  fLlama-13B:
+ - For higher precision on function calling.
  - Purchase access here: [fLlama-13b: €19.99 per user/seat.](https://buy.stripe.com/9AQ7te3lHdmbdZ68wz)
 
  - Licenses are not transferable to other users/entities.
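
The `./server` command added in the GGML section starts llama.cpp's built-in HTTP server, so besides the browser chat it can also be queried from a script. The following is a minimal sketch, not part of this commit. It assumes the server is listening on its default address `127.0.0.1:8080` and exposes the `/completion` endpoint (accepting `prompt`, `n_predict` and `temperature`, and returning the generated text under `content`) as described in llama.cpp's server example. The helper names `build_payload` and `complete` are hypothetical.

```python
import json
import urllib.request

# Assumption: default host/port of the llama.cpp example server.
SERVER_URL = "http://127.0.0.1:8080/completion"


def build_payload(prompt, n_predict=128, temperature=0.7):
    """Return the JSON request body for /completion, encoded as bytes."""
    body = {
        "prompt": prompt,
        "n_predict": n_predict,      # maximum number of tokens to generate
        "temperature": temperature,  # sampling temperature
    }
    return json.dumps(body).encode("utf-8")


def complete(prompt, url=SERVER_URL, **kwargs):
    """POST the prompt to a running server and return the generated text."""
    request = urllib.request.Request(
        url,
        data=build_payload(prompt, **kwargs),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        reply = json.loads(response.read().decode("utf-8"))
    # Assumption: the generated text is returned under the "content" key.
    return reply["content"]
```

With the server running, a call such as `complete("What is the capital of Ireland?", n_predict=32)` should return the model's text. The prompt template expected by the function-calling model is not covered here; see the notebooks linked in the README for that.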