Update README.md
# ❶ Model Description
- Model name and key features:
KTDSbaseLM v0.11 is a Mistral 7B / OpenChat 3.5-based model, fine-tuned from the OpenChat 3.5 model using SFT.
It is designed to understand the Korean language and Korea's diverse cultural contexts ✨✨, with self-produced Korean data spanning 135 domains
Mistral 7B's lightweight architecture ensures fast inference speed and memory efficiency, and it is optimized for a wide range of natural-language-processing tasks.
This architecture delivers the performance required for tasks such as text generation, question answering, document summarization, and sentiment analysis.
# ❷ Training Data
- ktdsbaseLM v0.11 was trained on a total of 3.6 GB of data, comprising 2.33 million Q&A records.
Of these, 1.33 million are multiple-choice questions spanning 135 domains, including Korean history, society, finance, law, tax, mathematics, biology, physics, and chemistry,
trained with a Chain of Thought approach. In addition, 1.3 million open-ended questions were trained across 100 domains such as Korean history, finance, law, tax, and mathematics.
31 |
- νμ΅ Instruction Datasets Format:
|
32 |
<pre><code>{"prompt": "prompt text", "completion": "ideal generated text"}</code></pre>
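Each training record is one JSON object per line (JSONL). As a quick illustration — the filename and helper function below are hypothetical, not part of this card — a minimal sketch for reading and validating such a file:

```python
import json

def load_instruction_records(path):
    """Read prompt/completion records from a JSONL file, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            # Every record must carry both keys used by the SFT format above.
            assert {"prompt", "completion"} <= rec.keys()
            records.append(rec)
    return records
```

Each line of the file looks exactly like the format shown above, e.g. `{"prompt": "prompt text", "completion": "ideal generated text"}`.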
# ❸ Use Cases
ktdsbaseLM v0.11 can be used in a variety of application areas. For example:
- Education: question answering and explanation generation for study materials in history, mathematics, science, and more.
- Business: answering legal, financial, and tax questions and summarizing documents.
- Customer service: generating conversations with users and providing personalized responses.
- The model is highly versatile across a wide range of natural-language-processing tasks.
# ❹ Limitations ⚠️
- ktdsBaseLM v0.11 is specialized for the Korean language and Korean culture.
However, due to insufficient data in certain areas (e.g., the latest international material, highly specialized fields), the accuracy of its responses about other languages or cultures may be lower.
It may also show limited reasoning ability on problems that demand complex logical thinking, and if biased data was included in training, biased responses may be generated.
# ❺ Usage Instructions
<pre><code>
from transformers import AutoModel, AutoTokenizer
</code></pre>
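The snippet above is cut off in the diff. Purely as an illustrative sketch — the repository ID below is a placeholder, not taken from this card, and `AutoModelForCausalLM` is used instead of plain `AutoModel` because only the former exposes `generate`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: substitute the model's actual Hugging Face repository ID.
MODEL_ID = "your-org/ktdsbaseLM-v0.11"

def generate_reply(prompt: str, model_id: str = MODEL_ID, max_new_tokens: int = 128) -> str:
    """Load tokenizer and model, then return only the newly generated text."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Slice off the prompt tokens so the prompt is not echoed back.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("...")` downloads the weights on first use; the slicing step is a common pattern for causal LMs, whose outputs include the input tokens.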
Here's the English version of the provided text:

---

# ❶ Model Description

**Model Name and Key Features**:
KTDSbaseLM v0.11 is based on the OpenChat 3.5 model, fine-tuned using the SFT method on the Mistral 7B model.

---

# ❷ Training Data

KTDSbaseLM v0.11 was trained on 3.6GB of data, comprising 2.33 million Q&A instances.
This includes 1.33 million multiple-choice questions across 135 domains such as history,
finance, law, tax, and science, trained with the Chain of Thought method. Additionally,
1.3 million open-ended questions were trained across 100 domains such as Korean history, finance, law, tax, and math.

---

# ❸ Use Cases

KTDSbaseLM v0.11 can be used across multiple fields, such as:
- Education: question answering and explanation generation for study materials in history, math, science, and more.
- Business: answering legal, financial, and tax questions and summarizing documents.
- Customer service: generating conversations with users and providing personalized responses.

This model is highly versatile in various NLP tasks.

---

# ❹ Limitations

KTDSBaseLM v0.11 is specialized in Korean language and culture.
However, it may lack accuracy in responding to topics outside its scope,
and it may show limited reasoning ability on problems that demand complex logical thinking. It
may produce biased responses if trained on biased data.

---

# ❺ Usage Instructions

<pre><code>
from transformers import AutoModel, AutoTokenizer
</code></pre>
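This snippet is truncated at the same point as the Korean section's. As an alternative quick-start sketch — again with a placeholder repository ID, not taken from this card — the `pipeline` helper bundles tokenizer and model loading into one object:

```python
from transformers import pipeline

# Placeholder: substitute the model's actual Hugging Face repository ID.
MODEL_ID = "your-org/ktdsbaseLM-v0.11"

def build_generator(model_id: str = MODEL_ID):
    """Return a text-generation pipeline (downloads weights on first use)."""
    return pipeline("text-generation", model=model_id)

# Example use:
#   gen = build_generator()
#   print(gen("What is the capital of Korea?", max_new_tokens=64)[0]["generated_text"])
```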