RonanMcGovern committed on
Commit 9f9b784
Parent(s): 55cded1
add improved inference options

README.md CHANGED
@@ -26,16 +26,30 @@ Available models:

## Inference with Google Colab and HuggingFace 🤗

**GPTQ (fastest + good accuracy)**
Get started by saving your own copy of this [function calling chatbot](https://colab.research.google.com/drive/1u8x41Jx8WWtI-nzHOgqTxkS3Q_lcjaSX?usp=sharing).
You will be able to run inference using a free Colab notebook if you select a GPU runtime. See the notebook for more details.

**Bits and Bytes NF4 (slowest inference)**
Try out this notebook: [fLlama_Inference notebook](https://colab.research.google.com/drive/1Ow5cQ0JNv-vXsT-apCceH6Na3b4L7JyW?usp=sharing).

**GGML (best for running on a laptop, great for Mac)**
To run this, you'll need to install [llama.cpp](https://github.com/ggerganov/llama.cpp) from ggerganov on GitHub.
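A minimal sketch of that install step, assuming a mid-2023 llama.cpp checkout that still builds with `make` and still reads GGML `.bin` files. Newer llama.cpp releases build with CMake, use the GGUF format and rename the server binary, so check the repository README for the version you clone. `build_llama_cpp` and `DRY_RUN` are hypothetical names used only here; on macOS the sketch sets `LLAMA_METAL=1`, which is how builds of that period enabled the Metal (Mac GPU) backend that `-ngl` relies on.

```shell
# Hypothetical helper: clone and build llama.cpp so that ./server is available.
# Assumes a mid-2023 checkout that builds with `make` (newer versions use CMake).
LLAMA_CPP_REPO="https://github.com/ggerganov/llama.cpp"

build_llama_cpp() {
  target_dir="${1:-llama.cpp}"

  # On macOS, builds of that era enabled the Metal GPU backend with LLAMA_METAL=1;
  # without it, -ngl cannot offload any layers to the GPU.
  metal_prefix=""
  if [ "$(uname -s)" = "Darwin" ]; then
    metal_prefix="LLAMA_METAL=1 "
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the commands instead of running them.
    echo "git clone $LLAMA_CPP_REPO $target_dir"
    echo "${metal_prefix}make -C $target_dir"
    return 0
  fi

  git clone "$LLAMA_CPP_REPO" "$target_dir" || return 1
  if [ -n "$metal_prefix" ]; then
    LLAMA_METAL=1 make -C "$target_dir"
  else
    make -C "$target_dir"
  fi
}
```

`DRY_RUN=1 build_llama_cpp` prints the two commands without running them; on a Mac the second line is `LLAMA_METAL=1 make -C llama.cpp`.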

- Download the GGML file from the GGML link above, under "Available models".
- I recommend running a command like:

```
./server -m fLlama-2-7b-chat.ggmlv3.q3_K_M.bin -ngl 32 -c 2048
```
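
which will allow you to run a chatbot in your browser (the server listens on http://127.0.0.1:8080 by default). The `-ngl 32` flag offloads up to 32 layers (all of the 7B model's layers) to the Mac's GPU, which gives very good token generation speed, and `-c 2048` sets the context size to 2048 tokens.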
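Besides the browser chat page, the llama.cpp server of that period also exposes an HTTP `/completion` endpoint. A minimal sketch of calling it with `curl`, assuming the default address `http://127.0.0.1:8080` and a prompt without double quotes, backslashes or newlines (the prompt is pasted into the JSON body unescaped). `fllama_complete`, `LLAMA_SERVER_URL` and `DRY_RUN` are hypothetical names; the prompt is sent as-is, so any chat or function-calling prompt format the model expects has to be written into the prompt itself.

```shell
# Hypothetical helper: send one prompt to a running llama.cpp server.
# Assumes the default address and the /completion endpoint; override LLAMA_SERVER_URL if needed.
LLAMA_SERVER_URL="${LLAMA_SERVER_URL:-http://127.0.0.1:8080}"

fllama_complete() {
  prompt="$1"
  n_predict="${2:-128}"   # maximum number of tokens to generate

  # The prompt is NOT JSON-escaped: avoid double quotes, backslashes and newlines in it.
  body=$(printf '{"prompt": "%s", "n_predict": %s}' "$prompt" "$n_predict")

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the request body instead of sending it.
    printf '%s\n' "$body"
    return 0
  fi

  # The JSON response carries the generated text in its "content" field.
  curl --silent --request POST \
    --url "$LLAMA_SERVER_URL/completion" \
    --header "Content-Type: application/json" \
    --data "$body"
}
```

With the server running, `fllama_complete "Write a haiku about llamas." 64` prints the raw JSON response; `DRY_RUN=1` shows the request body instead.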

## Licensing and Usage

fLlama-7B:
- Llama 2 license

fLlama-13B:
- For higher precision on function calling.
- Purchase access here: [fLlama-13b: €19.99 per user/seat.](https://buy.stripe.com/9AQ7te3lHdmbdZ68wz)

- Licenses are not transferable to other users/entities.