---
license: llama2
datasets:
- ehartford/wizard_vicuna_70k_unfiltered
tags:
- uncensored
- wizard
- vicuna
- llama
---

This is an fp16 copy of [jarradh/llama2_70b_chat_uncensored](https://huggingface.co/jarradh/llama2_70b_chat_uncensored), which downloads faster and uses less disk space than the fp32 original. I simply loaded the model on CPU with `torch_dtype=torch.float16` and then saved it again. I also added a `chat_template` entry, derived from the model card, to the `tokenizer_config.json` file, which previously didn't have one. All credit for the model goes to [jarradh](https://huggingface.co/jarradh).
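
As a rough illustration of the conversion described above, the sketch below loads a checkpoint in half precision on CPU and writes it back out. The function name `convert_to_fp16`, the output directory, and the `checkpoint_size_gb` helper are hypothetical, and the nominal 70 billion parameter count is an approximation. This is not necessarily the exact script used for this repository.

```python
def checkpoint_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Rough weight size in GB (1 GB = 1e9 bytes), ignoring file overhead."""
    return num_params * bytes_per_param / 1e9


def convert_to_fp16(source_repo: str, output_dir: str) -> None:
    """Load a causal LM on CPU in fp16 and save it, with its tokenizer."""
    # Heavy dependencies are imported here so that importing this module
    # stays cheap and does not require torch/transformers to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # torch_dtype=torch.float16 casts the weights while they are loaded.
    model = AutoModelForCausalLM.from_pretrained(
        source_repo, torch_dtype=torch.float16
    )
    model.save_pretrained(output_dir)

    # Copy the tokenizer files so the output directory is self-contained.
    tokenizer = AutoTokenizer.from_pretrained(source_repo)
    tokenizer.save_pretrained(output_dir)


if __name__ == "__main__":
    # Nominal parameter count (assumption); fp32 uses 4 bytes, fp16 uses 2.
    print("fp32:", checkpoint_size_gb(70e9, 4), "GB")
    print("fp16:", checkpoint_size_gb(70e9, 2), "GB")
    # Uncomment to run the conversion; it needs enough RAM for the weights.
    # convert_to_fp16("jarradh/llama2_70b_chat_uncensored", "./fp16-copy")
```

Halving the bytes per parameter halves the weight size, which is where the download and disk savings come from.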

Arguably a better name for this model would be something like Llama-2-70B_Wizard-Vicuna-Uncensored-fp16, but to avoid confusion I'm sticking with jarradh's naming scheme.

<!-- repositories-available start -->
## Repositories available

* [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/llama2_70b_chat_uncensored-GPTQ)
* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/llama2_70b_chat_uncensored-GGML)
* [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference, plus fp16 GGUF for requantizing](https://huggingface.co/TheBloke/llama2_70b_chat_uncensored-GGUF)
* [Jarrad Hope's unquantised model in fp16 pytorch format, for GPU inference and further conversions](https://huggingface.co/YokaiKoibito/llama2_70b_chat_uncensored-fp16)
* [Jarrad Hope's original unquantised fp32 model in pytorch format, for further conversions](https://huggingface.co/jarradh/llama2_70b_chat_uncensored)

<!-- repositories-available end -->

## Prompt template: Human-Response

```
### HUMAN:
{prompt}

### RESPONSE:
```
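
A minimal sketch of filling in the Human-Response template by hand, for cases where the tokenizer's `chat_template` is not used. The names `TEMPLATE` and `format_prompt` are hypothetical, and the single newline after `### RESPONSE:` is an assumption about where generation should start.

```python
# Single-turn Human-Response template, as shown in the prompt template section.
# The trailing newline after "### RESPONSE:" is an assumption.
TEMPLATE = "### HUMAN:\n{prompt}\n\n### RESPONSE:\n"


def format_prompt(prompt: str) -> str:
    """Wrap a user message in the Human-Response template."""
    # Strip surrounding whitespace so the blank line before
    # "### RESPONSE:" is always exactly one line.
    return TEMPLATE.format(prompt=prompt.strip())


if __name__ == "__main__":
    print(format_prompt("What is the difference between fp16 and fp32?"))
```

The resulting string can be passed to the tokenizer as ordinary text; the model's reply is whatever it generates after the `### RESPONSE:` line.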