8bit-coder committed on
Commit 5a22206 • 1 Parent(s): 43b7475

First upload

Files changed (5)
  1. .gitattributes +1 -0
  2. README.md +74 -3
  3. alpaca-megaset-fixed.json +3 -0
  4. modelFace.jpg +0 -0
  5. settings.png +0 -0
.gitattributes CHANGED
@@ -32,3 +32,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
+ alpaca-megaset-fixed.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,74 @@
- ---
- license: wtfpl
- ---
+ <p align="center"><img src="https://media.discordapp.net/attachments/1089718753186549960/1090451672121233518/00278-279547758.png" height=256></p>
+ <h1 align="center">
+ Alpaca 7B Native Enhanced
+ </h1>
+ <p align="center">The Most Advanced Alpaca 7B Model (Model Files Pending)</p>
+
+ ## 📃 Model Facts
+ - Trained natively on 8x Nvidia A100 40GB GPUs; no LoRA used
+ - Trained on the largest & most accurate dataset yet
+ - Enhanced programming capabilities
+ - First Alpaca model to have conversational awareness
+
+ ## 🚀 Quick Start Guide
+ Step 1. Make sure git-lfs is installed and ready to use ([Guide](https://git-lfs.com/))
+
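+ If git-lfs still needs to be set up, a minimal sketch of the usual steps is below; the `apt` package name is an assumption for Debian/Ubuntu, so substitute your platform's package manager:
+
+     # install the git-lfs package (Debian/Ubuntu shown; e.g. brew install git-lfs on macOS)
+     sudo apt install git-lfs
+     # register the LFS filters for your user account (one-time step)
+     git lfs install
+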
+ Step 2. Download and install [text-generation-webui](https://github.com/oobabooga/text-generation-webui) according to the repository's instructions
+
+ Step 3. Navigate over to one of its model folders and clone this repository:
+
+     git clone https://huggingface.co/8bit-coder/alpaca-7b-nativeEnhanced
+
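+ Because the large files in this repository are stored with git-lfs, it can be worth confirming that the clone pulled real files rather than LFS pointer stubs; a quick check, assuming you are inside the cloned folder:
+
+     # list the files tracked by LFS and confirm their sizes look plausible
+     git lfs ls-files
+     du -sh alpaca-megaset-fixed.json
+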
+ Step 4. Launch the webui and replace the default instruction prompt with:
+
+ > You are an AI language model designed to assist the User by answering their questions, offering advice, and engaging in casual conversation in a friendly, helpful, and informative manner. You respond clearly, coherently, and with consideration of the conversation history.
+ User: Hey, how's it going?
+ Assistant: Hey there! I'm doing great, thank you. What can I help you with today? Let's have a fun chat!
+
+ Step 5. Change the settings to match this screenshot:
+ ![Recommended settings](https://media.discordapp.net/attachments/1089718753186549960/1090428983595782194/image.png)
+
+ ## 📚 Training
+ We used 8x Nvidia A100 40GB GPUs to train this model. Training took ~3 hours and the final loss was 0.4761 after 3 epochs. The command used for training is as follows:
+
+     torchrun --nproc_per_node=8 --master_port=3045 ./stanford_alpaca/train.py --model_name_or_path ./llama-7b-hf --data_path ./alpaca-7b-nativeEnhanced/training_files/alpaca-megaset-fixed.json --fp16 True --output_dir ./output_7b --num_train_epochs 3 --per_device_train_batch_size 2 --per_device_eval_batch_size 2 --gradient_accumulation_steps 16 --evaluation_strategy "no" --save_strategy "steps" --save_steps 200 --learning_rate 2e-5 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type "cosine" --logging_steps 1 --fsdp "full_shard auto_wrap" --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' --tf32 True
+
+ There's a folder in this repository called training_files. **full-training-instructions.txt** contains the full list of commands, from the start of training through converting the model all the way to 4-bit quantized ggml. **It is not recommended to quantize this model down to 4 bits. The instructions are included purely for informational purposes.**
+
+ In addition, the training instructions file is written specifically for rented cloud computing. This means that by following the commands in the file, anyone should be able to train a similar model; a rough sketch of the prerequisite steps is shown below.
+
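+ As an illustration only (not a substitute for full-training-instructions.txt), the prerequisites implied by the training command above look roughly like this; it assumes `./llama-7b-hf` already holds the base LLaMA 7B weights converted to Hugging Face format:
+
+     # fetch the training code and this repository (which carries the dataset)
+     git clone https://github.com/tatsu-lab/stanford_alpaca
+     git clone https://huggingface.co/8bit-coder/alpaca-7b-nativeEnhanced
+     # install the training dependencies from the Stanford Alpaca repository
+     pip install -r ./stanford_alpaca/requirements.txt
+     # ./llama-7b-hf must already contain the HF-format LLaMA 7B base weights (assumption),
+     # then run the torchrun command shown above
+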
+ ### Common errors while training:
+ - CUDA Out of Memory error
+   - This happens when your GPUs do not have a minimum of 40GB of VRAM. The weakest GPU we've been able to successfully train on is the Nvidia A100 40GB, and even with 8 of them, VRAM usage was almost always right at the limit. If you have 40GB GPUs and are still running into this error, try halving **per_device_train_batch_size** and **per_device_eval_batch_size** while doubling **gradient_accumulation_steps** (see the sketch after this list). If you have more than 40GB of VRAM per GPU and wish to train faster, the opposite applies.
+
+ - LLaMATokenizer error
+   - This happens because you forgot to fix tokenizer_config.json in the llama-7b-hf directory. The fix is to rename **LLaMATokenizer** to **LlamaTokenizer** in that file (a one-line fix is sketched after this list).
+
+ - RuntimeError: CUDA error: invalid device ordinal
+   - This error occurs when **nproc_per_node** is set to a number greater than the number of GPUs installed in your system. You can check how many GPUs you have installed by running **nvidia-smi**.
+
+ - torchrun is not recognized
+   - This error occurs when your Python version is older than 3.10. Follow the instructions in the training instructions file to install miniconda and set up Python 3.10. Circumventing this error by running python -m torch.distributed.run will **not work**: many of the dependencies require Python 3.10 and will fatally error out at the start of training.
+
+ - KeyError
+   - This happens when your JSON training data is broken in some way. Try running dataset_validator.py in the training_files folder to find the broken key.
+
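+ For the CUDA Out of Memory item above, the adjustment keeps the effective batch size unchanged (2 per device × 8 GPUs × 16 accumulation steps = 256, versus 1 × 8 × 32 = 256). A sketch of the adjusted command, with every other flag taken unchanged from the Training section:
+
+     # halve the per-device batch sizes and double gradient accumulation to cut peak VRAM
+     torchrun --nproc_per_node=8 --master_port=3045 ./stanford_alpaca/train.py \
+       --model_name_or_path ./llama-7b-hf \
+       --data_path ./alpaca-7b-nativeEnhanced/training_files/alpaca-megaset-fixed.json \
+       --fp16 True --output_dir ./output_7b --num_train_epochs 3 \
+       --per_device_train_batch_size 1 --per_device_eval_batch_size 1 \
+       --gradient_accumulation_steps 32 \
+       --evaluation_strategy "no" --save_strategy "steps" --save_steps 200 \
+       --learning_rate 2e-5 --weight_decay 0. --warmup_ratio 0.03 \
+       --lr_scheduler_type "cosine" --logging_steps 1 \
+       --fsdp "full_shard auto_wrap" --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
+       --tf32 True
+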
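+ For the LLaMATokenizer item, the rename can be applied in place; a minimal sketch, assuming GNU sed and the ./llama-7b-hf path used in the training command:
+
+     # rename the tokenizer class to the spelling expected by the transformers library
+     sed -i 's/LLaMATokenizer/LlamaTokenizer/' ./llama-7b-hf/tokenizer_config.json
+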
+ ## 📝 Notes
+ - The main version of this model is in the Hugging Face Transformers format. The other (.pth) format is provided **purely for experimental use with llama.cpp** and is not guaranteed to have conversational awareness.
+ - This model exhibits weird behavior when quantized to 4 bits. This might be due to the complexity of the model. We recommend 8 bits as the smallest quantization, but this is untested.
+ - This model is slightly **underfitted**. We observed that training the model with a smaller gradient accumulation size benefitted the response quality.
+
+ - This model appears to have full conversational awareness. This means that, provided you're running the model in the same configuration we detailed in the Quick Start Guide, you should be able to hold a very detailed conversation with the AI without issues. There is a limit to its memory of 2048 tokens; beyond that, it will forget details and will need to be reminded.
+
+ ## 🔧 Dataset
+ The dataset used for training this model is made from [AlpacaDataCleaned](https://github.com/gururise/AlpacaDataCleaned) and [codealpaca](https://github.com/sahil280114/codealpaca). We combined these datasets for the following reasons:
+
+ 1. Increased accuracy, since the original stanford_alpaca dataset had many errors
+ 2. Better knowledge in programming
+ 3. More training data
+
+ We had an issue with the latest AlpacaDataCleaned dataset where, around 90k lines in, one of the keys has a typo: the key is "instruction:" instead of "instruction". We have fixed this error in the provided megaset, but if you plan on grabbing the data directly from AlpacaDataCleaned, make sure to fix it yourself; otherwise, the training script will fail with a KeyError.
+
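+ If you do pull AlpacaDataCleaned directly, a minimal sketch of the fix; the filename is an assumption (use whichever JSON file you downloaded), and GNU sed is assumed:
+
+     # rewrite the typoed key "instruction:" back to "instruction" so train.py can find it
+     sed -i 's/"instruction:":/"instruction":/g' alpaca_data_cleaned.json
+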
+ ## 👨‍💻 Credits
+
+ Credits go to [Meta](https://github.com/facebookresearch/llama) for creating the foundational LLaMA models and [Stanford](https://github.com/tatsu-lab/stanford_alpaca) for the instructions on how to train. For the dataset, credits go to [AlpacaDataCleaned](https://github.com/gururise/AlpacaDataCleaned) and [codealpaca](https://github.com/sahil280114/codealpaca). Credits also go to [chavinlo](https://huggingface.co/chavinlo/alpaca-native) for creating the original Alpaca 7B Native model, the inspiration behind this model.
alpaca-megaset-fixed.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd16fa0cb1e2402ab5839ec2231ceacf8062070cd750b50b879e74cb16603d3e
+ size 30418704
modelFace.jpg ADDED
settings.png ADDED