danielpark committed on
Commit 8ff94ab
1 Parent(s): 5610aae

doc: update model cards

Files changed (1)
  1. README.md +7 -4
README.md CHANGED
```diff
@@ -7,11 +7,14 @@ tags:
 - moe
 ---
 
-# Jamba-v0.1-9B
 
-A dense version of [Jamba-v0.1](https://huggingface.co/ai21labs/Jamba-v0.1), which extracts the weights of the first expert.
-It no longer uses MoE. Please refer to [this script](https://github.com/TechxGenus/Jamba-utils/blob/main/dense_downcycling.py) for details.
-It can use single 3090/4090 for inference, and the usage method is exactly the same as Jamba-v0.1.
+
+### Required Weights for Follow-up Research
+
+The original model is **AI21lab's Jamba-v0.1**, which requires an **A100 80GB GPU**. Unfortunately, this was not available via Google Colab or cloud computing services. Attempts were made to perform **MoE (Mixture of Experts) splitting**, using the following resources as a basis:
+
+- **Base creation**: Referenced for subsequent tasks.
+- **MoE Layer Separation**: Consult [this script](https://github.com/TechxGenus/Jamba-utils/blob/main/dense_downcycling.py) from [TechxGenus/Jamba-v0.1-9B](https://huggingface.co/TechxGenus/Jamba-v0.1-9B).
 
 ---
 
```
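
The removed card text states that the dense 9B checkpoint is used exactly like Jamba-v0.1 and fits on a single 3090/4090. Below is a minimal inference sketch under that assumption: the model id `TechxGenus/Jamba-v0.1-9B` is taken from the diff, while the prompt, generation settings, and the `trust_remote_code`/`device_map` options are illustrative and may need adjusting to your `transformers` version.

```python
# Sketch: load the dense downcycled checkpoint the same way as Jamba-v0.1
# (per the card, usage is identical; bfloat16 weights of a ~9B model fit a single 24 GB 3090/4090).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TechxGenus/Jamba-v0.1-9B"  # dense checkpoint referenced in the diff

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # keeps the weights within a single 24 GB GPU
    device_map="auto",           # requires `accelerate`; places the model on the available GPU
    trust_remote_code=True,      # may be unnecessary on transformers versions with native Jamba support
)

inputs = tokenizer("In the recent Super Bowl LVIII,", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```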