Update README.md
Browse files
README.md
CHANGED
@@ -3,197 +3,47 @@ library_name: transformers
|
|
3 |
license: apache-2.0
|
4 |
---
|
5 |
|
6 |
-
#
|
7 |
-
|
8 |
-
|
9 |
-
|
10 |
-
|
11 |
-
|
12 |
-
|
13 |
-
|
14 |
-
|
15 |
-
|
16 |
-
|
17 |
-
|
18 |
-
|
19 |
-
|
20 |
-
-
|
21 |
-
|
22 |
-
|
23 |
-
|
24 |
-
-
|
25 |
-
|
26 |
-
|
27 |
-
|
28 |
-
|
29 |
-
|
30 |
-
|
31 |
-
|
32 |
-
|
33 |
-
|
34 |
-
|
35 |
-
|
36 |
-
|
37 |
-
|
38 |
-
|
39 |
-
|
40 |
-
|
41 |
-
|
42 |
-
|
43 |
-
|
44 |
-
|
45 |
-
|
46 |
-
|
47 |
-
|
48 |
-
|
49 |
-
|
50 |
-
[More Information Needed]
|
51 |
-
|
52 |
-
### Out-of-Scope Use
|
53 |
-
|
54 |
-
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
|
55 |
-
|
56 |
-
[More Information Needed]
|
57 |
-
|
58 |
-
## Bias, Risks, and Limitations
|
59 |
-
|
60 |
-
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
61 |
-
|
62 |
-
[More Information Needed]
|
63 |
-
|
64 |
-
### Recommendations
|
65 |
-
|
66 |
-
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
|
67 |
-
|
68 |
-
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
|
69 |
-
|
70 |
-
## How to Get Started with the Model
|
71 |
-
|
72 |
-
Use the code below to get started with the model.
|
73 |
-
|
74 |
-
[More Information Needed]
|
75 |
-
|
76 |
-
## Training Details
|
77 |
-
|
78 |
-
### Training Data
|
79 |
-
|
80 |
-
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
|
81 |
-
|
82 |
-
[More Information Needed]
|
83 |
-
|
84 |
-
### Training Procedure
|
85 |
-
|
86 |
-
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
87 |
-
|
88 |
-
#### Preprocessing [optional]
|
89 |
-
|
90 |
-
[More Information Needed]
|
91 |
-
|
92 |
-
|
93 |
-
#### Training Hyperparameters
|
94 |
-
|
95 |
-
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
|
96 |
-
|
97 |
-
#### Speeds, Sizes, Times [optional]
|
98 |
-
|
99 |
-
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
|
100 |
-
|
101 |
-
[More Information Needed]
|
102 |
-
|
103 |
-
## Evaluation
|
104 |
-
|
105 |
-
<!-- This section describes the evaluation protocols and provides the results. -->
|
106 |
-
|
107 |
-
### Testing Data, Factors & Metrics
|
108 |
-
|
109 |
-
#### Testing Data
|
110 |
-
|
111 |
-
<!-- This should link to a Dataset Card if possible. -->
|
112 |
-
|
113 |
-
[More Information Needed]
|
114 |
-
|
115 |
-
#### Factors
|
116 |
-
|
117 |
-
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
|
118 |
-
|
119 |
-
[More Information Needed]
|
120 |
-
|
121 |
-
#### Metrics
|
122 |
-
|
123 |
-
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
|
124 |
-
|
125 |
-
[More Information Needed]
|
126 |
-
|
127 |
-
### Results
|
128 |
-
|
129 |
-
[More Information Needed]
|
130 |
-
|
131 |
-
#### Summary
|
132 |
-
|
133 |
-
|
134 |
-
|
135 |
-
## Model Examination [optional]
|
136 |
-
|
137 |
-
<!-- Relevant interpretability work for the model goes here -->
|
138 |
-
|
139 |
-
[More Information Needed]
|
140 |
-
|
141 |
-
## Environmental Impact
|
142 |
-
|
143 |
-
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
|
144 |
-
|
145 |
-
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
|
146 |
-
|
147 |
-
- **Hardware Type:** [More Information Needed]
|
148 |
-
- **Hours used:** [More Information Needed]
|
149 |
-
- **Cloud Provider:** [More Information Needed]
|
150 |
-
- **Compute Region:** [More Information Needed]
|
151 |
-
- **Carbon Emitted:** [More Information Needed]
|
152 |
-
|
153 |
-
## Technical Specifications [optional]
|
154 |
-
|
155 |
-
### Model Architecture and Objective
|
156 |
-
|
157 |
-
[More Information Needed]
|
158 |
-
|
159 |
-
### Compute Infrastructure
|
160 |
-
|
161 |
-
[More Information Needed]
|
162 |
-
|
163 |
-
#### Hardware
|
164 |
-
|
165 |
-
[More Information Needed]
|
166 |
-
|
167 |
-
#### Software
|
168 |
-
|
169 |
-
[More Information Needed]
|
170 |
-
|
171 |
-
## Citation [optional]
|
172 |
-
|
173 |
-
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
|
174 |
-
|
175 |
-
**BibTeX:**
|
176 |
-
|
177 |
-
[More Information Needed]
|
178 |
-
|
179 |
-
**APA:**
|
180 |
-
|
181 |
-
[More Information Needed]
|
182 |
-
|
183 |
-
## Glossary [optional]
|
184 |
-
|
185 |
-
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
|
186 |
-
|
187 |
-
[More Information Needed]
|
188 |
-
|
189 |
-
## More Information [optional]
|
190 |
-
|
191 |
-
[More Information Needed]
|
192 |
-
|
193 |
-
## Model Card Authors [optional]
|
194 |
-
|
195 |
-
[More Information Needed]
|
196 |
-
|
197 |
-
## Model Card Contact
|
198 |
-
|
199 |
-
[More Information Needed]
|
|
|
3 |
license: apache-2.0
|
4 |
---
|
5 |
|
6 |
+
# Tora-12B
|
7 |
+
|
8 |
+
- Model was implemented via [ChatVector](https://arxiv.org/abs/2310.04799)
|
9 |
+
|
10 |
+
```
|
11 |
+
Tora-12B = cyberagent/Mistral-Nemo-Japanese-Instruct-2408 + (cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b - mistralai/Mistral-Nemo-Base-2407)
|
12 |
+
```
|
13 |
+
|
14 |
+
## Model license
|
15 |
+
|
16 |
+
|model|license|
|
17 |
+
|:---|:---|
|
18 |
+
|cyberagent/Mistral-Nemo-Japanese-Instruct-2408|apache-2.0|
|
19 |
+
|cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b|apache-2.0|
|
20 |
+
|mistralai/Mistral-Nemo-Base-2407|apache-2.0|
|
21 |
+
|
22 |
+
# Benchmark
|
23 |
+
|
24 |
+
- Benchmark was conducted via [Open Japanese LLM Leaderboard](https://huggingface.co/spaces/llm-jp/open-japanese-llm-leaderboard)
|
25 |
+
|
26 |
+
|AVG|AVG (NLI)|AVG (QA)|AVG (RC)|AVG (MC)|AVG (EL)|AVG (FA)|AVG (MR)|AVG (MT)|AVG (HE)|AVG (CG)|AVG (SUM)|
|
27 |
+
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
|
28 |
+
|0.5036|0.6189|0.4522|0.8868|0.7748|0.5251|0.1908|0.684|0.8132|0.4971|0.0|0.0972|
|
29 |
+
|
30 |
+
# Reference
|
31 |
+
|
32 |
+
```
|
33 |
+
@misc{OJLL,
|
34 |
+
author = {Miyao, Yusuke and Ishida, Shigeki and Okamoto, Takumi and Han, Namgi and Mousterou, Akim and Fourrier, Clémentine and Hayashi, Toshihiro and Tachibana, Yuichiro},
|
35 |
+
title = {Open Japanese LLM Leaderboard},
|
36 |
+
year = {2024},
|
37 |
+
publisher = {OJLL},
|
38 |
+
howpublished = "\url{https://huggingface.co/spaces/llm-jp/open-japanese-llm-leaderboard}"
|
39 |
+
}
|
40 |
+
@misc{llmjp2024llmjpcrossorganizationalprojectresearch,
|
41 |
+
title={LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs},
|
42 |
+
author={LLM-jp and : and Akiko Aizawa and Eiji Aramaki and Bowen Chen and Fei Cheng and Hiroyuki Deguchi and Rintaro Enomoto and Kazuki Fujii and Kensuke Fukumoto and Takuya Fukushima and Namgi Han and Yuto Harada and Chikara Hashimoto and Tatsuya Hiraoka and Shohei Hisada and Sosuke Hosokawa and Lu Jie and Keisuke Kamata and Teruhito Kanazawa and Hiroki Kanezashi and Hiroshi Kataoka and Satoru Katsumata and Daisuke Kawahara and Seiya Kawano and Atsushi Keyaki and Keisuke Kiryu and Hirokazu Kiyomaru and Takashi Kodama and Takahiro Kubo and Yohei Kuga and Ryoma Kumon and Shuhei Kurita and Sadao Kurohashi and Conglong Li and Taiki Maekawa and Hiroshi Matsuda and Yusuke Miyao and Kentaro Mizuki and Sakae Mizuki and Yugo Murawaki and Ryo Nakamura and Taishi Nakamura and Kouta Nakayama and Tomoka Nakazato and Takuro Niitsuma and Jiro Nishitoba and Yusuke Oda and Hayato Ogawa and Takumi Okamoto and Naoaki Okazaki and Yohei Oseki and Shintaro Ozaki and Koki Ryu and Rafal Rzepka and Keisuke Sakaguchi and Shota Sasaki and Satoshi Sekine and Kohei Suda and Saku Sugawara and Issa Sugiura and Hiroaki Sugiyama and Hisami Suzuki and Jun Suzuki and Toyotaro Suzumura and Kensuke Tachibana and Yu Takagi and Kyosuke Takami and Koichi Takeda and Masashi Takeshita and Masahiro Tanaka and Kenjiro Taura and Arseny Tolmachev and Nobuhiro Ueda and Zhen Wan and Shuntaro Yada and Sakiko Yahata and Yuya Yamamoto and Yusuke Yamauchi and Hitomi Yanaka and Rio Yokota and Koichiro Yoshino},
|
43 |
+
year={2024},
|
44 |
+
eprint={2407.03963},
|
45 |
+
archivePrefix={arXiv},
|
46 |
+
primaryClass={cs.CL},
|
47 |
+
url={https://arxiv.org/abs/2407.03963},
|
48 |
+
}
|
49 |
+
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|