Upload README.md
README.md (CHANGED)
@@ -19,7 +19,7 @@ model_creator: OpenChat
 model_name: Openchat 3.5 1210
 model_type: mistral
 pipeline_tag: text-generation
-prompt_template: 'GPT4 User: {prompt}<|end_of_turn|>GPT4 Assistant:
+prompt_template: 'GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:
 
   '
 quantized_by: TheBloke
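The `prompt_template` field patched above is machine-readable: it lives in the model card's YAML front matter, so client code can pull the template instead of hard-coding it. A minimal sketch, assuming PyYAML is installed and the README begins with a standard `---`-delimited metadata block; `read_prompt_template` is a hypothetical helper, not part of any library:

```python
# Sketch: recover prompt_template from the model card's YAML front matter.
# Assumes the README starts with a "---" ... "---" metadata block, as
# Hugging Face model cards do; read_prompt_template is a hypothetical helper.
import yaml

def read_prompt_template(readme_path: str = "README.md") -> str:
    text = open(readme_path, encoding="utf-8").read()
    # Front matter is everything between the first two "---" markers.
    _, front_matter, _ = text.split("---", 2)
    return yaml.safe_load(front_matter)["prompt_template"]

print(read_prompt_template())  # GPT4 Correct User: {prompt}<|end_of_turn|>...
```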
@@ -88,10 +88,10 @@ Here is an incomplete list of clients and libraries that are known to support GG
 <!-- repositories-available end -->
 
 <!-- prompt-template start -->
-## Prompt template: OpenChat
+## Prompt template: OpenChat-Correct
 
 ```
-GPT4 User: {prompt}<|end_of_turn|>GPT4 Assistant:
+GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:
 
 ```
 
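`{prompt}` is the template's only placeholder, so plain `str.format` fills it for a single turn. The multi-turn layout below is an extrapolation from the single-turn template shown in this hunk, not something the README states:

```python
# Single turn: substitute the user message into the template from the card.
TEMPLATE = "GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:"
print(TEMPLATE.format(prompt="What is the capital of France?"))

# Multi-turn layout (an assumption extrapolated from the single-turn
# template): chain completed turns with <|end_of_turn|> and finish on an
# open assistant turn so the model continues from there.
def build_prompt(history: list[tuple[str, str]], user_msg: str) -> str:
    parts = []
    for user, assistant in history:
        parts.append(f"GPT4 Correct User: {user}<|end_of_turn|>")
        parts.append(f"GPT4 Correct Assistant: {assistant}<|end_of_turn|>")
    parts.append(f"GPT4 Correct User: {user_msg}<|end_of_turn|>GPT4 Correct Assistant:")
    return "".join(parts)
```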
@@ -210,7 +210,7 @@ Windows Command Line users: You can set the environment variable by running `set
 Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
 
 ```shell
-./main -ngl 35 -m openchat-3.5-1210.Q4_K_M.gguf --color -c 8192 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "GPT4 User: {prompt}<|end_of_turn|>GPT4 Assistant:"
+./main -ngl 35 -m openchat-3.5-1210.Q4_K_M.gguf --color -c 8192 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:"
 ```
 
 Change `-ngl 35` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
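The `{prompt}` placeholder in the `-p` argument is meant to be replaced before the command runs. A sketch of that substitution from Python, assuming `./main` was built from the required llama.cpp commit and the GGUF file sits in the working directory:

```python
# Sketch: fill the OpenChat-Correct template, then shell out to llama.cpp's
# ./main with the same flags as the README command. Paths are assumptions.
import subprocess

TEMPLATE = "GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:"

subprocess.run([
    "./main",
    "-ngl", "35",                           # GPU-offloaded layers; drop without GPU
    "-m", "openchat-3.5-1210.Q4_K_M.gguf",  # quantized model file
    "--color",
    "-c", "8192",                           # context length
    "--temp", "0.7",
    "--repeat_penalty", "1.1",
    "-n", "-1",                             # generate until the model stops
    "-p", TEMPLATE.format(prompt="Write a haiku about llamas."),
])
```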
@@ -271,7 +271,7 @@ llm = Llama(
 
 # Simple inference example
 output = llm(
-  "GPT4 User: {prompt}<|end_of_turn|>GPT4 Assistant:", # Prompt
+  "GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:", # Prompt
   max_tokens=512,  # Generate up to 512 tokens
   stop=["</s>"],   # Example stop token - not necessarily correct for this specific model! Please check before using.
   echo=True        # Whether to echo the prompt
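Since this hunk only patches the call site, here is a self-contained version of the llama-cpp-python snippet. Everything outside the prompt string (the constructor arguments, the `<|end_of_turn|>` stop token) is an illustrative assumption, not taken from the diff:

```python
# Self-contained sketch around the patched call site; constructor values and
# the stop token are assumptions chosen to mirror the llama.cpp command above.
from llama_cpp import Llama

llm = Llama(
  model_path="./openchat-3.5-1210.Q4_K_M.gguf",  # downloaded GGUF file
  n_ctx=8192,       # context window, mirroring -c 8192
  n_gpu_layers=35,  # set to 0 if you have no GPU acceleration
)

output = llm(
  "GPT4 Correct User: Name the planets in the solar system.<|end_of_turn|>GPT4 Correct Assistant:",
  max_tokens=512,
  stop=["<|end_of_turn|>"],  # assumption: OpenChat's turn delimiter, not </s>
  echo=True,
)
print(output["choices"][0]["text"])
```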