BerenMillidge committed
Commit 19d07cb • 1 Parent(s): 5095564
Update README.md
README.md CHANGED
@@ -3,7 +3,7 @@ license: apache-2.0
 ---
 # Model Card for Zamba
 
-Zamba-7B-v1 is a hybrid between state-space
+Zamba-7B-v1-phase1 is a hybrid of Mamba, a state-space model, and transformers. It uses a Mamba backbone with a shared transformer layer every 6 blocks, was trained with next-token prediction, and uses the Mistral v0.1 tokenizer. We arrived at this architecture after a series of ablations at small scales. Zamba-7B-v1-phase1 was pre-trained on 1T tokens of text and code sourced from open web datasets. Unlike Zamba-7B-v1, this model is the checkpoint after pure pretraining on web datasets only; we envision it primarily as a comparison point for exploring the effects of our annealing process.
 
 ## Quick start
 
@@ -28,8 +28,8 @@ You can run the model not using the optimized Mamba kernels, but it is **not** recommended
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
 
-tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba-7B-v1")
-model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba-7B-v1", device_map="auto", torch_dtype=torch.bfloat16)
+tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba-7B-v1-phase1")
+model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba-7B-v1-phase1", device_map="auto", torch_dtype=torch.bfloat16)
 
 input_text = "A funny prompt would be "
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
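The quick-start snippet in the updated README stops after tokenizing the prompt. Below is a minimal sketch of the remaining generation step; it assumes a transformers install that supports Zamba and uses only the standard `generate`/`decode` API, and `max_new_tokens=100` is an illustrative choice rather than a value taken from the model card.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the phase-1 checkpoint referenced in this commit.
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba-7B-v1-phase1")
model = AutoModelForCausalLM.from_pretrained(
    "Zyphra/Zamba-7B-v1-phase1",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Tokenize a prompt and move the tensors to the GPU, as in the README snippet.
input_text = "A funny prompt would be "
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# Generate a continuation; max_new_tokens=100 is an illustrative value.
with torch.no_grad():
    outputs = model.generate(**input_ids, max_new_tokens=100)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

As the context of the second hunk notes, the model can be run without the optimized Mamba kernels, but this is not recommended; the exact installation steps for those kernels should be taken from the model card itself.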