---
license: other
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- 7B
- Saily
- DEEPNIGHT
- Llama
- Llama2
---

# SaiLy 7B (deepnight-research/saily-7b-v0)

<img src="https://i.ibb.co/TvZQjZM/Leonardo-Diffusion-XL-Furious-and-strong-Elephant-and-anchor-l-1.jpg" alt="Saily: Experimental AI Models by DEEPNIGHT">

---

### SaiLy is a series of highly experimental, uncensored AI models by DEEPNIGHT-RESEARCH. Please use them responsibly.

---

<br>

Prompt Template: Alpaca

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:

```
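
As a minimal sketch of how the template above might be filled in from Python: the helper names `build_prompt` and `extract_response` are hypothetical (they are not part of the model or of `transformers`), and the second helper assumes the generated text echoes the prompt and places the answer after the `### Response:` marker.

```python
# Alpaca-style template, copied from the prompt format shown above.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "{prompt}\n\n"
    "### Response:\n"
)

RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str) -> str:
    """Insert the user's instruction into the Alpaca template (hypothetical helper)."""
    return ALPACA_TEMPLATE.format(prompt=instruction.strip())


def extract_response(generated_text: str) -> str:
    """Return the text after the first response marker, or the whole text if absent."""
    _, found, tail = generated_text.partition(RESPONSE_MARKER)
    return tail.strip() if found else generated_text.strip()
```

The string returned by `build_prompt` is what would be passed to the tokenizer. `extract_response` is only needed when the decoded output still contains the prompt.
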

### Description

This is the first model of the series. The model is based on Llama2-chat.

---

### Did someone say CODE?

Here you go!

```python
import transformers

# Load the model with default settings
model = transformers.AutoModelForCausalLM.from_pretrained(
    'deepnight-research/saily-7b-v0'
)
```

To use the optimized Triton implementation of FlashAttention, load the model on GPU (`cuda:0`) with `attn_impl='triton'` and in `bfloat16` precision:

```python
import torch
import transformers

name = 'deepnight-research/saily-7b-v0'

config = transformers.AutoConfig.from_pretrained(name)
# Assumption: the config exposes an MPT-style `attn_config` dict.
# A stock Llama config has no such field, so this line may need adjusting.
config.attn_config['attn_impl'] = 'triton'
config.init_device = 'cuda:0'  # For fast initialization directly on GPU!

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # Load model weights in bfloat16
    trust_remote_code=True
)
```

---

If you would like to support us, please [consider donating](https://donate.deepnight.tech) for [#aiforcause](https://github.com/deepnight-ai/aiforcause).

Cheers✌️

- Team [DEEPNIGHT](https://deepnight.tech)