|
---
inference: false
license: other
datasets:
- bavest/fin-llama-dataset
tags:
- finance
- llm
- llama
- trading
---

<!-- header start -->
* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/fin-llama-33B-GGML)
* [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/bavest/fin-llama-33b-merged)
## Prompt template

Standard Alpaca, meaning:

```
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's question.
### Instruction: prompt

### Response:
```

or

```
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's question.
### Instruction: prompt

### Input:

### Response:
```
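If you are building prompts in code rather than by hand, the two variants above can be rendered with a small helper. This is a minimal sketch; `build_prompt` and its argument names are illustrative, not part of this repository:

```python
# Minimal sketch of the Standard Alpaca template shown above.
# `build_prompt` is an illustrative helper, not part of this repo.

SYSTEM = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's question."
)

def build_prompt(instruction: str, input_text: str = "") -> str:
    """Render the template, with or without the optional ### Input: block."""
    if input_text:
        return (f"{SYSTEM}\n### Instruction: {instruction}\n\n"
                f"### Input: {input_text}\n\n### Response:")
    return f"{SYSTEM}\n### Instruction: {instruction}\n\n### Response:"

print(build_prompt("Summarise the Q3 earnings report."))
```

The model's generation should then be read as everything after the final `### Response:` marker.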
<!-- compatibility_ggml start -->
## Compatibility
| fin-llama-33b.ggmlv3.q6_K.bin | q6_K | 6 | 26.69 GB | 29.19 GB | New k-quant method. Uses GGML_TYPE_Q8_K - 6-bit quantization - for all tensors |
| fin-llama-33b.ggmlv3.q8_0.bin | q8_0 | 8 | 34.56 GB | 37.06 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. |

**Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
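In both rows shown, "Max RAM required" works out to the file size plus a fixed working overhead of about 2.5 GB; that figure is inferred from these two rows only, not a documented llama.cpp constant, so treat it as a rough planning number:

```python
# Sanity-check the table above: max RAM = file size + a constant overhead.
# The ~2.5 GB overhead is inferred from these two rows, not documented.
rows = {
    "q6_K": {"size_gb": 26.69, "max_ram_gb": 29.19},
    "q8_0": {"size_gb": 34.56, "max_ram_gb": 37.06},
}
for name, r in rows.items():
    overhead = r["max_ram_gb"] - r["size_gb"]
    print(f"{name}: overhead {overhead:.2f} GB")
```

With GPU offloading (`-ngl`), part of that footprint moves from RAM to VRAM instead, as the note above says.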
## How to run in `llama.cpp`

I use the following command line; adjust for your tastes and needs:

```
./main -t 10 -ngl 32 -m fin-llama-33b.ggmlv3.q5_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: I want you to act as an accountant and come up with creative ways to manage finances. You'll need to consider budgeting, investment strategies and risk management when creating a financial plan for your client. In some cases, you may also need to provide advice on taxation laws and regulations in order to help them maximize their profits. My first suggestion request is \"Create a financial plan for a small business that focuses on cost savings and long-term investments\".\n### Response:"
```

Change `-t 10` to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use `-t 8`.
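If you want to derive `-t` automatically, the sketch below builds the same command with a thread count guessed from the logical CPU count. Halving assumes 2-way SMT (common on desktop x86, but an assumption; check `lscpu` or your OS tools for the real physical-core count):

```python
import os
import shlex

# os.cpu_count() reports logical CPUs; assume 2-way SMT and halve it
# to approximate physical cores (an assumption, not detected).
logical = os.cpu_count() or 1
threads = max(1, logical // 2)

cmd = ["./main", "-t", str(threads), "-ngl", "32",
       "-m", "fin-llama-33b.ggmlv3.q5_0.bin",
       "--color", "-c", "2048", "--temp", "0.7",
       "--repeat_penalty", "1.1", "-n", "-1",
       "-p", "### Instruction: prompt\n### Response:"]
print(shlex.join(cmd))
```

Remove `-ngl 32` if you have no GPU, or lower it if you run out of VRAM.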