Update README.md
README.md
CHANGED
@@ -25,9 +25,9 @@ pipeline_tag: text-generation
 This architecture delivers excellent performance on a variety of tasks such as text generation, question answering, document summarization, and sentiment analysis.

 # ❷ Training Data
-- ktdsbaseLM v0.11 was trained on a total of 3.6GB of data.
-Of these, 1.33 million
-were trained with the Chain of Thought method. In addition, 1.3 million short-answer questions in Korean history, finance, law, tax, mathematics, etc.
 The training data also includes data for understanding Korean social values and human emotions and for producing output that follows the given instructions.
 - Training Instruction Datasets Format:
 <pre><code>{"prompt": "prompt text", "completion": "ideal generated text"}</code></pre>
@@ -83,9 +83,9 @@ optimized for various NLP tasks like text generation, question answering, docume
 # ❷ Training Data

 KTDSbaseLM v0.11 was trained on 3.6GB of data, comprising 2.33 million Q&A instances.
-This includes 1.33 million multiple-choice questions across
 finance, law, tax, and science, trained with the Chain of Thought method. Additionally,
-1.3 million short-answer questions cover

 **Training Instruction Dataset Format**:
 `{"prompt": "prompt text", "completion": "ideal generated text"}`
 This architecture delivers excellent performance on a variety of tasks such as text generation, question answering, document summarization, and sentiment analysis.

 # ❷ Training Data
+- ktdsbaseLM v0.11 was trained on a total of 3.6GB of data developed in-house. It contains 2.33 million records in all, including QnA, summarization, and classification data,
+of which 1.33 million are multiple-choice questions across 53 domains. These domains include Korean history, social studies, finance, law, tax, mathematics, biology, physics, and chemistry,
+and were trained with the Chain of Thought method. In addition, 1.3 million short-answer questions were trained across 38 domains, including Korean history, finance, law, tax, and mathematics.
 The training data also includes data for understanding Korean social values and human emotions and for producing output that follows the given instructions.
 - Training Instruction Datasets Format:
 <pre><code>{"prompt": "prompt text", "completion": "ideal generated text"}</code></pre>
 # ❷ Training Data

 KTDSbaseLM v0.11 was trained on 3.6GB of data, comprising 2.33 million Q&A instances.
+This includes 1.33 million multiple-choice questions across 53 domains such as history,
 finance, law, tax, and science, trained with the Chain of Thought method. Additionally,
+1.3 million short-answer questions cover 38 domains including history, finance, and law.

 **Training Instruction Dataset Format**:
 `{"prompt": "prompt text", "completion": "ideal generated text"}`
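For readers preparing data in the instruction format shown in the README, the following is a minimal sketch of how such records might be parsed and checked. It assumes one JSON object per line (JSON Lines), which the README does not state; `parse_record` and `load_records` are hypothetical helper names, not part of any KTDSbaseLM tooling.

```python
import json

# Field names taken from the README's format line.
REQUIRED_KEYS = ("prompt", "completion")


def parse_record(line: str) -> dict:
    """Parse one JSON line and check it has string prompt/completion fields."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("each record must be a JSON object")
    for key in REQUIRED_KEYS:
        if not isinstance(record.get(key), str):
            raise ValueError(f"missing or non-string field: {key!r}")
    return record


def load_records(lines):
    """Hypothetical helper: parse an iterable of JSON lines, skipping blanks."""
    return [parse_record(line) for line in lines if line.strip()]


# Assumed layout: one record per line, as in a .jsonl file.
sample = '{"prompt": "prompt text", "completion": "ideal generated text"}'
records = load_records([sample, ""])
```

Validating both fields up front is one possible design choice; it surfaces malformed records before they reach a training pipeline.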