|
--- |
|
inference: false |
|
license: mit |
|
base_model: microsoft/phi-2 |
|
tags: |
|
- axolotl |
|
- generated_from_trainer |
|
model-index: |
|
- name: Phasmid-2_v2 |
|
results: [] |
|
datasets: |
|
- PygmalionAI/PIPPA |
|
- HuggingFaceH4/no_robots |
|
--- |
|
|
|
|
|
``` |
|
_ (`-. ('-. .-. ('-. .-') _ .-') _ .-') _ |
|
( (OO )( OO ) / ( OO ).-. ( OO ).( '.( OO )_ ( ( OO) ) |
|
_.` \,--. ,--. / . --. /(_)---\_),--. ,--.) ,-.-') \ .'_ |
|
(__...--''| | | | | \-. \ / _ | | `.' | | |OO),`'--..._) |
|
| / | || .| |.-'-' | |\ :` `. | | | | \| | \ ' |
|
| |_.' || | \| |_.' | '..`''.)| |'.'| | | |(_/| | ' | |
|
| .___.'| .-. | | .-. |.-._) \| | | | ,| |_.'| | / : |
|
| | | | | | | | | |\ /| | | |(_| | | '--' / |
|
`--' `--' `--' `--' `--' `-----' `--' `--' `--' `-------' |
|
``` |
|
|
|
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl) |
|
<details><summary>See axolotl config</summary> |
|
|
|
axolotl version: `0.3.0` |
|
```yaml |
|
base_model: microsoft/phi-2 |
|
model_type: PhiForCausalLM |
|
tokenizer_type: AutoTokenizer |
|
is_llama_derived_model: false |
|
trust_remote_code: true |
|
|
|
load_in_8bit: false |
|
load_in_4bit: false |
|
strict: false |
|
|
|
datasets: |
|
- path: SE6446/SE6446_phasmid_ds |
|
type: completion |
|
|
|
hub_model_id: SE6446/Phasmid-2_v2 |
|
hub_strategy: every_save |
|
use_auth_token: true |
|
dataset_prepared_path: /phasmid-2-ds-path |
|
val_set_size: 0.05 |
|
output_dir: ./phasmid-sft-out |
|
|
|
sequence_len: 2048 |
|
sample_packing: true |
|
pad_to_sequence_len: |
|
|
|
adapter: |
|
lora_model_dir: |
|
lora_r: |
|
lora_alpha: |
|
lora_dropout: |
|
lora_target_linear: |
|
lora_fan_in_fan_out: |
|
|
|
wandb_project: |
|
wandb_entity: |
|
wandb_watch: |
|
wandb_name: |
|
wandb_log_model: |
|
|
|
gradient_accumulation_steps: 1 |
|
micro_batch_size: 1 |
|
num_epochs: 4 |
|
optimizer: adamw_torch |
|
adam_beta2: 0.95 |
|
adam_epsilon: 0.00001 |
|
max_grad_norm: 1.0 |
|
lr_scheduler: cosine |
|
learning_rate: 0.0003 |
|
|
|
train_on_inputs: false |
|
group_by_length: true |
|
bf16: true |
|
fp16: false |
|
tf32: true |
|
|
|
gradient_checkpointing: |
|
early_stopping_patience: |
|
resume_from_checkpoint: |
|
local_rank: |
|
logging_steps: 1 |
|
xformers_attention: |
|
flash_attention: |
|
|
|
warmup_steps: 100 |
|
evals_per_epoch: 4 |
|
saves_per_epoch: 1 |
|
debug: |
|
deepspeed: |
|
weight_decay: 0.1 |
|
fsdp: |
|
fsdp_config: |
|
resize_token_embeddings_to_32x: true |
|
special_tokens: |
|
bos_token: "<|endoftext|>" |
|
eos_token: "<|endoftext|>" |
|
unk_token: "<|endoftext|>" |
|
pad_token: "<|endoftext|>" |
|
|
|
``` |
|
|
|
</details><br> |
|
|
|
|
|
# Phasmid-2_v2 |
|
|
|
This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on a mix of the HuggingFaceH4/no_robots and PygmalionAI/PIPPA datasets.
|
It achieves the following results on the evaluation set: |
|
- Loss: 2.2924 |
|
|
|
## Model description |
|
Phasmid-2 has been trained on instructional data and should therefore follow instructions considerably better than base phi-2. However, I have not extensively tested the model.
|
## Intended uses & limitations |
|
This model is little more than a side project and I shall treat it as such. |
|
Due to its size, Phasmid-2 can still suffer from problematic hallucinations and poor factual accuracy. No effort was made to reduce potentially toxic responses; if you need that behaviour, you should fine-tune the model further yourself.
|
## Inference |
|
Ensure that einops is installed:
|
``` |
|
pip install einops |
|
``` |
|
|
|
Phi doesn't play nicely with `device_map="auto"`, so you should specify the device explicitly, as in the following examples.
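All of the snippets below assume these imports:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
```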
|
|
|
1. FP16 / Flash-Attention / CUDA: |
|
```python |
|
model = AutoModelForCausalLM.from_pretrained("SE6446/Phasmid-2_v2", torch_dtype="auto", flash_attn=True, flash_rotary=True, fused_dense=True, device_map="cuda", trust_remote_code=True) |
|
``` |
|
2. FP16 / CUDA: |
|
```python |
|
model = AutoModelForCausalLM.from_pretrained("SE6446/Phasmid-2_v2", torch_dtype="auto", device_map="cuda", trust_remote_code=True) |
|
``` |
|
3. FP32 / CUDA: |
|
```python |
|
model = AutoModelForCausalLM.from_pretrained("SE6446/Phasmid-2_v2", torch_dtype=torch.float32, device_map="cuda", trust_remote_code=True) |
|
``` |
|
4. FP32 / CPU: |
|
```python |
|
model = AutoModelForCausalLM.from_pretrained("SE6446/Phasmid-2_v2", torch_dtype=torch.float32, device_map="cpu", trust_remote_code=True) |
|
``` |
|
|
|
Then run inference with the following snippet:
|
```python |
|
tokenizer = AutoTokenizer.from_pretrained("SE6446/Phasmid-2_v2", trust_remote_code=True)
|
inputs = tokenizer('''SYSTEM: You are a helpful assistant. Please answer truthfully and politely. {custom_prompt}\n |
|
USER: {{userinput}}\n |
|
ASSISTANT: {{character name if applicable}}:''', return_tensors="pt", return_attention_mask=False).to(model.device)  # move inputs to the model's device
|
outputs = model.generate(**inputs, max_length=200) |
|
text = tokenizer.batch_decode(outputs)[0] |
|
print(text) |
|
``` |
|
The model should generate its reply after "ASSISTANT:".
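Below is a concrete usage sketch building on the snippets above. It assumes the `model` and `tokenizer` objects already loaded; the system and user text are illustrative placeholders, and splitting on the final "ASSISTANT:" marker is just a convenience of this sketch, not a feature of the model.

```python
# Concrete example, reusing `model` and `tokenizer` from the snippets above.
# The system/user text is illustrative only.
prompt = (
    "SYSTEM: You are a helpful assistant. Please answer truthfully and politely.\n"
    "USER: What is the capital of France?\n"
    "ASSISTANT:"
)
inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False).to(model.device)
outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]

# generate() echoes the prompt, so keep only what follows the final "ASSISTANT:".
reply = text.split("ASSISTANT:")[-1].strip()
print(reply)
```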
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 0.0003 |
|
- train_batch_size: 1 |
|
- eval_batch_size: 1 |
|
- seed: 42 |
|
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05 |
|
- lr_scheduler_type: cosine |
|
- lr_scheduler_warmup_steps: 100 |
|
- num_epochs: 4 |
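For readers who want to reproduce a similar run with the Transformers `Trainer` rather than axolotl, the hyperparameters above map roughly onto the `TrainingArguments` sketch below. This is only an approximation, not the exact invocation: the actual training was driven by the axolotl config shown at the top of this card, which additionally handles sample packing, prompt formatting and dataset preparation.

```python
from transformers import TrainingArguments

# Approximate equivalent of the hyperparameters listed above (sketch only).
training_args = TrainingArguments(
    output_dir="./phasmid-sft-out",
    num_train_epochs=4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=1,
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    optim="adamw_torch",
    adam_beta2=0.95,
    adam_epsilon=1e-5,
    max_grad_norm=1.0,
    weight_decay=0.1,
    bf16=True,
    tf32=True,
    seed=42,
    logging_steps=1,
)
```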
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 2.3313        | 0.0   | 1     | 2.1374          |
| 2.5755        | 0.25  | 1319  | 2.5281          |
| 2.4864        | 0.5   | 2638  | 2.5314          |
| 2.0961        | 0.75  | 3957  | 2.4697          |
| 2.6547        | 1.0   | 5276  | 2.4213          |
| 2.1235        | 1.24  | 6595  | 2.3926          |
| 1.8875        | 1.49  | 7914  | 2.3233          |
| 0.9059        | 1.74  | 9233  | 2.2590          |
| 2.2046        | 1.99  | 10552 | 2.1985          |
| 1.1938        | 2.23  | 11871 | 2.2555          |
| 1.1425        | 2.48  | 13190 | 2.2393          |
| 0.6688        | 2.73  | 14509 | 2.2237          |
| 1.1111        | 2.98  | 15828 | 2.2126          |
| 0.651         | 3.21  | 17147 | 2.2859          |
| 0.8669        | 3.46  | 18466 | 2.2914          |
| 0.4149        | 3.71  | 19785 | 2.2924          |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.37.0.dev0 |
|
- Pytorch 2.0.1+cu118 |
|
- Datasets 2.16.1 |
|
- Tokenizers 0.15.0 |