Commit cdec054 (parent d5558de) committed by TheBloke: Update README.md

Files changed (1): README.md (+57, -0)

---
license: other
language:
- en
pipeline_tag: text2text-generation
tags:
- alpaca
- llama
- chat
- gpt4
inference: false
---
# GPT4 Alpaca LoRA 30B - 4bit GGML

This is a 4-bit GGML version of the [Chansung GPT4 Alpaca 30B LoRA model](https://huggingface.co/chansung/gpt4-alpaca-lora-30b).

It was created by merging the LoRA provided in the above repo with the original Llama 30B model, producing the unquantised model [GPT4-Alpaca-LoRA-30B-HF](https://huggingface.co/TheBloke/gpt4-alpaca-lora-30b-HF).

The files in this repo were then quantised to 4-bit for use with [llama.cpp](https://github.com/ggerganov/llama.cpp), using the new 4-bit quantisation methods being worked on in [PR #896](https://github.com/ggerganov/llama.cpp/pull/896).
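
For reference, this is roughly how such a conversion is done with llama.cpp's own tooling. This is a minimal sketch only: the conversion script name and arguments vary between llama.cpp versions, and the paths shown are illustrative, not the exact ones used here.

```
# 1. Convert the merged model to an f16 GGML file (trailing 1 = f16)
python convert-pth-to-ggml.py models/gpt4-alpaca-lora-30b-HF/ 1

# 2. Quantise the f16 file once per method (at the time of PR #896: 2 = q4_0, 3 = q4_1)
./quantize models/gpt4-alpaca-lora-30b-HF/ggml-model-f16.bin gpt4-alpaca-lora-30B.GGML.q4_0.bin 2
./quantize models/gpt4-alpaca-lora-30b-HF/ggml-model-f16.bin gpt4-alpaca-lora-30B.GGML.q4_1.bin 3
```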

## Provided files

Two files are provided: one quantised with method Q4_0, the other with Q4_1.

The Q4_1 file requires more RAM and may run a little slower. It may give slightly better results, but this is not proven.
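
As the files are large, you may want to fetch only one of them. One possible way with git, assuming this repo's URL (adjust the filename to take the Q4_1 file instead):

```
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/TheBloke/gpt4-alpaca-lora-30B-GGML
cd gpt4-alpaca-lora-30B-GGML
git lfs pull --include "gpt4-alpaca-lora-30B.GGML.q4_0.bin"
```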

## How to run in `llama.cpp`

I use the following command line; adjust for your tastes and needs:

```
./main -t 18 -m gpt4-alpaca-lora-30B.GGML.q4_1.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Write a story about llamas
### Response:"
```

Change `-t 18` to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use `-t 8`.
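
If you are not sure how many physical cores you have, on Linux you can check with `lscpu` (one possible approach; the output format varies by system):

```
# Physical cores = "Core(s) per socket" x "Socket(s)"
lscpu | grep -E '^(Socket|Core)'
```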

If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.
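
For example, the command above then becomes:

```
./main -t 18 -m gpt4-alpaca-lora-30B.GGML.q4_1.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -i -ins
```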

# Original GPT4 Alpaca LoRA model card

This repository comes with a LoRA checkpoint to turn LLaMA into a chatbot-like language model. The checkpoint is the output of an instruction-following fine-tuning process with the following settings on an 8xA100 (40G) DGX system:

- Training script: borrowed from the official [Alpaca-LoRA](https://github.com/tloen/alpaca-lora) implementation
- Training command:
```shell
python finetune.py \
    --base_model='decapoda-research/llama-30b-hf' \
    --data_path='alpaca_data_gpt4.json' \
    --num_epochs=10 \
    --cutoff_len=512 \
    --group_by_length \
    --output_dir='./gpt4-alpaca-lora-30b' \
    --lora_target_modules='[q_proj,k_proj,v_proj,o_proj]' \
    --lora_r=16 \
    --batch_size=... \
    --micro_batch_size=...
```

You can see how the training went in the W&B report [here](https://wandb.ai/chansung18/gpt4_alpaca_lora/runs/w3syd157?workspace=user-chansung18).
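
If you want to try the LoRA checkpoint directly, rather than the merged GGML files above, the same Alpaca-LoRA repo includes a generate script. A minimal sketch, assuming that script's `--base_model`/`--lora_weights` arguments and using this card's source repo as the adapter ID:

```shell
python generate.py \
    --base_model='decapoda-research/llama-30b-hf' \
    --lora_weights='chansung/gpt4-alpaca-lora-30b'
```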