jujbob commited on
Commit
17ecd35
โ€ข
1 Parent(s): 4aff21d
Files changed (1) hide show
  1. README.md +20 -21
README.md CHANGED
@@ -14,8 +14,22 @@ base_model:
14
 
15
  # Bllossom | [Demo]() | [Homepage](https://www.bllossom.ai/) | [Github](https://github.com/MLP-Lab/Bllossom) | [Colab-tutorial](https://colab.research.google.com/drive/1fBOzUVZ6NRKk_ugeoTbAOokWKqSN47IG?usp=sharing) |
16
 
17
- [Latest Update]
18
- - 2024.05.08 Vocab Expansion model update
 
 
 
 
 
 
 
 
 
 
 
 
 
 
19
 
20
  The Bllossom language model is a Korean-English bilingual language model based on the open-source LLama3. It enhances the connection of knowledge between Korean and English. It has the following features:
21
 
@@ -50,28 +64,13 @@ The Bllossom language model is a Korean-English bilingual language model based o
50
 
51
 
52
  ## NEWS
53
- * [2024/04] We released Bllossom v2.0, based on llama-3
 
54
  * [2023/12] We released Bllossom-Vision v1.0, based on Bllossom
55
  * [2023/08] We released Bllossom v1.0, based on llama-2.
56
  * [2023/07] We released Bllossom v0.7, based on polyglot-ko.
57
 
58
 
59
- ```bash
60
- ์ €ํฌ ์„œ์šธ๊ณผ๊ธฐ๋Œ€ MLP์—ฐ๊ตฌ์‹ค์—์„œ ํ•œ๊ตญ์–ด-์˜์–ด ์ด์ค‘ ์–ธ์–ด๋ชจ๋ธ์ธ Bllossom์„ ๊ณต๊ฐœํ–ˆ์Šต๋‹ˆ๋‹ค!
61
- - LLama3-8B ๊ธฐ๋ฐ˜์˜ ๊ฒฝ๋Ÿ‰ํ™”๋œ ์‚ฌ์ด์ฆˆ
62
- - ํ•œ๊ตญ์–ด-์˜์–ด ์ง€์‹์—ฐ๊ฒฐ์„ ํ†ตํ•œ ํ•œ๊ตญ์–ด ์ง€์‹ ๊ฐ•ํ™”
63
- - ํ•œ๊ตญ์–ด ์–ดํœ˜์ถ”๊ฐ€
64
- - ํ•œ๊ตญ์–ด ๋ฌธํ™”, ์–ธ์–ด๋ฅผ ๊ณ ๋ คํ•œ ์ž์ฒด์ œ์ž‘ ๋ฐ์ดํ„ฐ ๊ธฐ๋ฐ˜ ๋ฏธ์„ธ์กฐ์ •
65
- - ๊ฐ•ํ™”ํ•™์Šต (DPO)
66
- - ์‹œ๊ฐ-์–ธ์–ด ๋ชจ๋ธํ™•์žฅ
67
-
68
- 1. Bllossom์€ ์„œ์šธ๊ณผ๊ธฐ๋Œ€, ํ…Œ๋””์ธ, ์—ฐ์„ธ๋Œ€ ์–ธ์–ด์ž์› ์—ฐ๊ตฌ์‹ค์˜ ์–ธ์–ดํ•™์ž์™€ ํ˜‘์—…ํ•ด ๋งŒ๋“  ์‹ค์šฉ์ฃผ์˜๊ธฐ๋ฐ˜ ์–ธ์–ด๋ชจ๋ธ์ž…๋‹ˆ๋‹ค! ์•ž์œผ๋กœ ์ง€์†์ ์ธ ์—…๋ฐ์ดํŠธ๋ฅผ ํ†ตํ•ด ๊ด€๋ฆฌํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค ๋งŽ์ด ํ™œ์šฉํ•ด์ฃผ์„ธ์š” ๐Ÿ™‚
69
- 2. Bllossom70B๋ชจ๋ธ, ์–ดํœ˜ํ™•์žฅ๋ชจ๋ธ, ์‹œ๊ฐ-์–ธ์–ด๋ชจ๋ธ์€ ์ถ”ํ›„ ๊ณต๊ฐœํ•  ์˜ˆ์ •์ž…๋‹ˆ๋‹ค. (๊ถ๊ธˆํ•˜์‹ ๋ถ„์€ ๊ฐœ๋ณ„ ์—ฐ๋ฝ์ฃผ์„ธ์š”, GPU๋งŒ ์ง€์›ํ•ด์ฃผ์‹œ๋ฉด ๋ฌด๋ฃŒ๋กœ ๋“œ๋ฆฝ๋‹ˆ๋‹ค!)
70
- 3. Bllossom์€ NAACL2024, LREC-COLING2024 (๊ตฌ๋‘) ๋ฐœํ‘œ๋กœ ์ฑ„ํƒ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
71
- 4. ์ข‹์€ ์–ธ์–ด๋ชจ๋ธ ๊ณ„์† ์—…๋ฐ์ดํŠธ ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค!! ํ•œ๊ตญ์–ด ๊ฐ•ํ™”๋ฅผ์œ„ํ•ด ๊ณต๋™ ์—ฐ๊ตฌํ•˜์‹ค๋ถ„ ์–ธ์ œ๋“  ํ™˜์˜ํ•ฉ๋‹ˆ๋‹ค!!
72
- ```
73
-
74
-
75
  ## Example code
76
 
77
  ### Colab Tutorial
@@ -87,7 +86,7 @@ pip install torch transformers==4.40.0 accelerate
87
  import transformers
88
  import torch
89
 
90
- model_id = "MLP-KTLim/llama3-Bllossom"
91
 
92
  pipeline = transformers.pipeline(
93
  "text-generation",
@@ -140,7 +139,7 @@ import os
140
  import torch
141
  from transformers import AutoTokenizer, AutoModelForCausalLM
142
 
143
- model_id = 'MLP-KTLim/llama3-Bllossom'
144
 
145
  tokenizer = AutoTokenizer.from_pretrained(model_id)
146
  model = AutoModelForCausalLM.from_pretrained(
 
14
 
15
  # Bllossom | [Demo]() | [Homepage](https://www.bllossom.ai/) | [Github](https://github.com/MLP-Lab/Bllossom) | [Colab-tutorial](https://colab.research.google.com/drive/1fBOzUVZ6NRKk_ugeoTbAOokWKqSN47IG?usp=sharing) |
16
 
17
+
18
+ ```bash
19
+ ์ €ํฌ ์„œ์šธ๊ณผ๊ธฐ๋Œ€ MLP์—ฐ๊ตฌ์‹ค์—์„œ ํ•œ๊ตญ์–ด-์˜์–ด ์ด์ค‘ ์–ธ์–ด๋ชจ๋ธ์ธ Bllossom์„ ๊ณต๊ฐœํ–ˆ์Šต๋‹ˆ๋‹ค! ์„œ์šธ๊ณผ๊ธฐ๋Œ€ ์Šˆํผ์ปดํ“จํŒ… ์„ผํ„ฐ์˜ ์ง€์›์œผ๋กœ 100GB๊ฐ€๋„˜๋Š” ํ•œ๊ตญ์–ด ์ถ”๊ฐ€ํ•™์Šต์„ ์ง„ํ–‰ํ•œ ํ•œ๊ตญ์–ด ๊ฐ•ํ™” ์ด์ค‘์–ธ์–ด ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค!
20
+ ํ•œ๊ตญ์–ด ์ž˜ํ•˜๋Š” ๋ชจ๋ธ ์ฐพ๊ณ  ์žˆ์ง€ ์•Š์œผ์…จ๋‚˜์š”?
21
+ - ๋ฌด๋ ค 3๋งŒ๊ฐœ๊ฐ€ ๋„˜๋Š” ํ•œ๊ตญ์–ด ์–ดํœ˜ํ™•์žฅ
22
+ - Llama3๋Œ€๋น„ ๋Œ€๋žต 25% ๋” ๊ธด ๊ธธ์ด์˜ ํ•œ๊ตญ์–ด Context ์ฒ˜๋ฆฌ๊ฐ€๋Šฅ
23
+ - ํ•œ๊ตญ์–ด-์˜์–ด Pararell Corpus๋ฅผ ํ™œ์šฉํ•œ ํ•œ๊ตญ์–ด-์˜์–ด ์ง€์‹์—ฐ๊ฒฐ (์‚ฌ์ „ํ•™์Šต)
24
+ - ํ•œ๊ตญ์–ด ๋ฌธํ™”, ์–ธ์–ด๋ฅผ ๊ณ ๋ คํ•ด ์–ธ์–ดํ•™์ž๊ฐ€ ์ œ์ž‘ํ•œ ๋ฐ์ดํ„ฐ๋ฅผ ํ™œ์šฉํ•œ ๋ฏธ์„ธ์กฐ์ •
25
+ - ๊ฐ•ํ™”ํ•™์Šต
26
+ ์ด ๋ชจ๋“ ๊ฒŒ ํ•œ๊บผ๋ฒˆ์— ์ ์šฉ๋˜๊ณ  ์ƒ์—…์  ์ด์šฉ์ด ๊ฐ€๋Šฅํ•œ Bllossom์„ ์ด์šฉํ•ด ์—ฌ๋Ÿฌ๋ถ„ ๋งŒ์˜ ๋ชจ๋ธ์„ ๋งŒ๋“ค์–ด๋ณด์„ธ์šฅ! ๋ฌด๋ ค Colab ๋ฌด๋ฃŒ GPU๋กœ ํ•™์Šต์ด ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.
27
+
28
+ 1. Bllossom-8B๋Š” ์„œ์šธ๊ณผ๊ธฐ๋Œ€, ํ…Œ๋””์ธ, ์—ฐ์„ธ๋Œ€ ์–ธ์–ด์ž์› ์—ฐ๊ตฌ์‹ค์˜ ์–ธ์–ดํ•™์ž์™€ ํ˜‘์—…ํ•ด ๋งŒ๋“  ์‹ค์šฉ์ฃผ์˜๊ธฐ๋ฐ˜ ์–ธ์–ด๋ชจ๋ธ์ž…๋‹ˆ๋‹ค! ์•ž์œผ๋กœ ์ง€์†์ ์ธ ์—…๋ฐ์ดํŠธ๋ฅผ ํ†ตํ•ด ๊ด€๋ฆฌํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค ๋งŽ์ด ํ™œ์šฉํ•ด์ฃผ์„ธ์š” ๐Ÿ™‚
29
+ 2. ์ดˆ ๊ฐ•๋ ฅํ•œ Advanced-Bllossom 8B, 70B๋ชจ๋ธ, ์‹œ๊ฐ-์–ธ์–ด๋ชจ๋ธ์„ ๋ณด์œ ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค! (๊ถ๊ธˆํ•˜์‹ ๋ถ„์€ ๊ฐœ๋ณ„ ์—ฐ๋ฝ์ฃผ์„ธ์š”!!)
30
+ 3. Bllossom์€ NAACL2024, LREC-COLING2024 (๊ตฌ๋‘) ๋ฐœํ‘œ๋กœ ์ฑ„ํƒ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
31
+ 4. ์ข‹์€ ์–ธ์–ด๋ชจ๋ธ ๊ณ„์† ์—…๋ฐ์ดํŠธ ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค!! ํ•œ๊ตญ์–ด ๊ฐ•ํ™”๋ฅผ์œ„ํ•ด ๊ณต๋™ ์—ฐ๊ตฌํ•˜์‹ค๋ถ„(ํŠนํžˆ๋…ผ๋ฌธ) ์–ธ์ œ๋“  ํ™˜์˜ํ•ฉ๋‹ˆ๋‹ค!! ํŠนํžˆ ์†Œ๋Ÿ‰์˜ GPU๋ผ๋„ ๋Œ€์—ฌ ๊ฐ€๋Šฅํ•œํŒ€์€ ์–ธ์ œ๋“  ์—ฐ๋ฝ์ฃผ์„ธ์š”! ๋งŒ๋“ค๊ณ  ์‹ถ์€๊ฑฐ ๋„์™€๋“œ๋ ค์š”.
32
+ ```
33
 
34
  The Bllossom language model is a Korean-English bilingual language model based on the open-source LLama3. It enhances the connection of knowledge between Korean and English. It has the following features:
35
 
 
64
 
65
 
66
  ## NEWS
67
+ * [2024.05.08] Vocab Expansion Model Update
68
+ * [2024.04.25] We released Bllossom v2.0, based on llama-3
69
  * [2023/12] We released Bllossom-Vision v1.0, based on Bllossom
70
  * [2023/08] We released Bllossom v1.0, based on llama-2.
71
  * [2023/07] We released Bllossom v0.7, based on polyglot-ko.
72
 
73
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
74
  ## Example code
75
 
76
  ### Colab Tutorial
 
86
  import transformers
87
  import torch
88
 
89
+ model_id = "MLP-KTLim/llama-3-Korean-Bllossom-8B"
90
 
91
  pipeline = transformers.pipeline(
92
  "text-generation",
 
139
  import torch
140
  from transformers import AutoTokenizer, AutoModelForCausalLM
141
 
142
+ model_id = 'MLP-KTLim/llama-3-Korean-Bllossom-8B'
143
 
144
  tokenizer = AutoTokenizer.from_pretrained(model_id)
145
  model = AutoModelForCausalLM.from_pretrained(