Saxo committed
Commit 3fbabe1
1 Parent(s): 9be977d

Update README.md

Files changed (1)
  1. README.md +64 -47
README.md CHANGED
@@ -1,47 +1,64 @@
- ---
- base_model:
- - meta-llama/Meta-Llama-3-8B-Instruct
- - MLP-KTLim/llama-3-Korean-Bllossom-8B
- library_name: transformers
- tags:
- - mergekit
- - merge
-
- ---
- # Linkbricks-Horizon-AI-Ko-Instruct-8B-base
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-
- ## Merge Details
- ### Merge Method
-
- This model was merged using the SLERP merge method.
-
- ### Models Merged
-
- The following models were included in the merge:
- * [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
- * [MLP-KTLim/llama-3-Korean-Bllossom-8B](https://huggingface.co/MLP-KTLim/llama-3-Korean-Bllossom-8B)
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
- ```yaml
- base_model: MLP-KTLim/llama-3-Korean-Bllossom-8B
- dtype: bfloat16
- merge_method: slerp
- parameters:
-   t:
-   - filter: self_attn
-     value: [0.0, 0.5, 0.3, 0.7, 1.0]
-   - filter: mlp
-     value: [1.0, 0.5, 0.7, 0.3, 0.0]
-   - value: 0.7
- slices:
- - sources:
-   - layer_range: [0, 32]
-     model: MLP-KTLim/llama-3-Korean-Bllossom-8B
-   - layer_range: [0, 32]
-     model: meta-llama/Meta-Llama-3-8B-Instruct
- ```
+ ---
+ library_name: transformers
+ license: apache-2.0
+ base_model: meta-llama/Meta-Llama-3-8B-Instruct
+ datasets:
+ - Saxo/total_ko_train_set_1_with_wiki_with_orca
+ language:
+ - ko
+ - en
+ pipeline_tag: text-generation
+ ---
+
+ # Linkbricks-Horizon-AI-Ko-Instruct-8B-base
+
+ <div align="center">
+ <img src="https://www.linkbricks.com/wp-content/uploads/2022/03/%E1%84%85%E1%85%B5%E1%86%BC%E1%84%8F%E1%85%B3%E1%84%87%E1%85%B3%E1%84%85%E1%85%B5%E1%86%A8%E1%84%89%E1%85%B3%E1%84%85%E1%85%A9%E1%84%80%E1%85%A9-2-1024x804.png" />
+ </div>
+
+ A model trained by Dr. Yunsung Ji (Saxo), a data scientist at Linkbricks, a company specializing in AI and big-data analytics. Starting from the meta-llama/Meta-Llama-3-8B base model, it was given SFT-DPO instruction training (8000 tokens) on 8 H100-60G GPUs on GCP, taking about 4 hours.
+ The Accelerate and DeepSpeed ZeRO-3 libraries were used, and Flash Attention was disabled.
+
+ www.linkbricks.com, www.linkbricks.vc
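+
+ A minimal inference sketch (not part of the original card) using the Transformers text-generation pipeline; the repository id `Saxo/Linkbricks-Horizon-AI-Ko-Instruct-8B-base` is assumed from the card title and may differ:
+
+ ```python
+ from transformers import pipeline
+
+ # Hypothetical repo id, inferred from the card title.
+ pipe = pipeline(
+     "text-generation",
+     model="Saxo/Linkbricks-Horizon-AI-Ko-Instruct-8B-base",
+     torch_dtype="auto",
+     device_map="auto",
+ )
+
+ # Recent transformers versions accept chat-style messages directly in this pipeline.
+ messages = [{"role": "user", "content": "한국의 수도는 어디인가요?"}]
+ print(pipe(messages, max_new_tokens=128)[0]["generated_text"])
+ ```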
+
+ ## Configuration including BitsAndBytes
+
+ ```python
+ import torch
+ from transformers import BitsAndBytesConfig, TrainingArguments
+
+ # Assumed compute dtype, chosen to match the bf16/fp16 flags below.
+ torch_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
+
+ # Placeholder run identifiers (defined elsewhere in the actual training script).
+ project_name = "linkbricks-horizon-ai-ko-instruct-8b"
+ run_name_str = "sft-dpo-run"
+
+ # 4-bit NF4 quantization used during training.
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_use_double_quant=False,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch_dtype
+ )
+
+ args = TrainingArguments(
+     output_dir=project_name,
+     run_name=run_name_str,
+     overwrite_output_dir=True,
+     num_train_epochs=20,
+     per_device_train_batch_size=1,
+     gradient_accumulation_steps=4, #1
+     gradient_checkpointing=True,
+     optim="paged_adamw_32bit",
+     #optim="adamw_8bit",
+     logging_steps=10,
+     save_steps=100,
+     save_strategy="epoch",
+     learning_rate=2e-4, #2e-4
+     weight_decay=0.01,
+     max_grad_norm=1, #0.3
+     max_steps=-1,
+     warmup_ratio=0.1,
+     group_by_length=False,
+     fp16=not torch.cuda.is_bf16_supported(),
+     bf16=torch.cuda.is_bf16_supported(),
+     #fp16 = True,
+     lr_scheduler_type="cosine", #"constant",
+     disable_tqdm=False,
+     report_to='wandb',
+     push_to_hub=False
+ )
+ ```
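+
+ As a sketch (not the original training script) of how the objects above are typically wired up: the quantization config is passed when loading the base model, and `args` is then handed to an SFT/DPO trainer. Dataset and trainer wiring are omitted here:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ base_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # base model from the metadata above
+ tokenizer = AutoTokenizer.from_pretrained(base_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     base_id,
+     quantization_config=bnb_config,  # 4-bit NF4 config defined above
+     device_map="auto",
+ )
+ model.gradient_checkpointing_enable()  # matches gradient_checkpointing=True in `args`
+
+ # `args` (the TrainingArguments above), the tokenizer, and the
+ # Saxo/total_ko_train_set_1_with_wiki_with_orca dataset would then be passed
+ # to a trainer such as trl's SFTTrainer / DPOTrainer.
+ ```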