Jungwonchang committed
Commit 680febb
Parent(s): c67c06c

Update README.md
README.md CHANGED
@@ -10,7 +10,7 @@ language:
 ---
 
 # Model Card for Model ID
-Korean Chatbot based on Alibaba's QWEN
+Korean Chatbot based on Alibaba's [QWEN](https://github.com/QwenLM/Qwen)
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6232fdee38869c4ca8fd49e2/CBQ0cdD54Sd7-rbNt-Mkb.png)
 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1fmcq1YZaIYg-cuCS4aadomutLmzSyEYI#scrollTo=6c1edcdc-158d-4043-a7c7-1d145ebf2cd1)
 (keep in mind that basic colab runtime with T4 GPU will lead to OOM error. Fine-tuned version of Qwen-14b-Chat-Int4 will not have this issue)
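The Colab note in the hunk above points to the 4-bit (Int4) checkpoint as the way to avoid out-of-memory errors on a T4. As an illustration only, loading such a checkpoint typically looks like the sketch below; the repo id is the upstream Qwen checkpoint, not the fine-tuned model this card describes, and the prompt is arbitrary.

```python
# Sketch only: loading a 4-bit Qwen chat checkpoint, as the Colab note suggests.
# The repo id below is a placeholder, not the model released by this commit.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-14B-Chat-Int4"  # GPTQ 4-bit checkpoint; should fit on a 16 GB T4
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,  # Qwen ships custom modeling/chat code
).eval()

# Qwen's remote code exposes a chat() helper for single-turn queries.
response, history = model.chat(tokenizer, "안녕하세요, 자기소개 해주세요.", history=None)
print(response)
```

The `model.chat(...)` helper comes from Qwen's remote code; the README's own `qwen_chat_single_turn` wrapper, referenced in the next hunk, presumably builds on it.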
@@ -190,21 +190,10 @@ response = qwen_chat_single_turn(model, tokenizer, device, query=query,
 ### Training Procedure
 
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+The model was fine-tuned using LoRA (Low-Rank Adaptation), which allows for efficient training of large language models by updating only a small set of parameters.
+The fine-tuning process was conducted on a single node with 2 GPUs, utilizing distributed training to enhance the training efficiency and speed.
+The lora rank was set to 32, for I only had limited time to access the GPUs.
 
-#### Preprocessing [optional]
-
-[More Information Needed]
-
-
-#### Training Hyperparameters
-
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
-#### Speeds, Sizes, Times [optional]
-
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
-[More Information Needed]
 
 ## Evaluation
 
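The lines added in this hunk describe LoRA fine-tuning at rank 32 on a single node with 2 GPUs. The commit itself contains no training script, so the following is only a minimal sketch of such a setup using Hugging Face PEFT; the base checkpoint, target modules, and every hyperparameter other than the rank are assumptions.

```python
# Hypothetical LoRA setup matching the README note (rank 32);
# this is NOT the author's actual training script.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-14B-Chat-Int4",   # assumed base checkpoint
    device_map="auto",
    trust_remote_code=True,      # Qwen uses custom modeling code
)

lora_cfg = LoraConfig(
    r=32,                        # LoRA rank stated in the README
    lora_alpha=64,               # assumed, not given in the commit
    lora_dropout=0.05,           # assumed
    target_modules=["c_attn", "c_proj", "w1", "w2"],  # Qwen projection layers (assumed)
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the low-rank adapter weights are trainable
```

On 2 GPUs such a run would typically be launched with torchrun, for example `torchrun --nproc_per_node 2 train_lora.py` (script name hypothetical).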
@@ -296,8 +285,8 @@ Jungwon Chang
 
 ## Model Card Contact
 
-
-
+
+
 
 ## Training procedure
 