Commit 19d07cb by BerenMillidge
Parent(s): 5095564

Update README.md

Files changed (1):
  1. README.md +3 -3
README.md CHANGED
@@ -3,7 +3,7 @@ license: apache-2.0
 ---
 # Model Card for Zamba
 
-Zamba-7B-v1 is a hybrid between state-space models (Specifically Mamba) and transformer, and was trained using next-token prediction. Zamba uses a shared transformer layer after every 6 mamba blocks. It uses the Mistral v0.1 tokenizer. We came to this architecture after a series of ablations at small scales. Zamba-7B-v1 was pre-trained on 1T tokens of text and code data.
+Zamba-7B-v1-phase1 is a hybrid between Mamba, a state-space model, and transformers. It uses a Mamba backbone with a shared transformer layer applied after every 6 Mamba blocks, was trained using next-token prediction, and uses the Mistral v0.1 tokenizer. We arrived at this architecture after a series of ablations at small scales. Zamba-7B-v1-phase1 was pre-trained on 1T tokens of text and code data sourced from open web datasets. Unlike Zamba-7B-v1, this model is the checkpoint after pure pretraining on web datasets only. We envision it primarily as a comparison point for exploring the effects of our annealing process.
 
 ## Quick start
 
@@ -28,8 +28,8 @@ You can run the model not using the optimized Mamba kernels, but it is **not** r
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
 
-tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba-7B-v1")
-model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba-7B-v1", device_map="auto", torch_dtype=torch.bfloat16)
+tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba-7B-v1-phase1")
+model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba-7B-v1-phase1", device_map="auto", torch_dtype=torch.bfloat16)
 
 input_text = "A funny prompt would be "
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
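For readers skimming the diff, here is a minimal sketch of the layer pattern described in the updated model card: a Mamba backbone with one weight-shared transformer layer applied after every 6 Mamba blocks. This is an illustration only; `MambaBlock`, `TransformerBlock`, and `HybridBackbone` are hypothetical stand-ins, not Zyphra's actual classes, and the real block internals differ.

```python
import torch
import torch.nn as nn

class MambaBlock(nn.Module):
    """Stand-in for a real Mamba (state-space) block; the mixer is a placeholder."""
    def __init__(self, d_model):
        super().__init__()
        self.mixer = nn.Linear(d_model, d_model)  # the real model uses an SSM mixer here
    def forward(self, x):
        return x + self.mixer(x)  # residual connection

class TransformerBlock(nn.Module):
    """Stand-in self-attention block."""
    def __init__(self, d_model, n_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h)
        return x + out  # residual connection

class HybridBackbone(nn.Module):
    """Mamba backbone with ONE transformer layer whose weights are reused
    (shared) after every `share_every` Mamba blocks."""
    def __init__(self, d_model=512, n_mamba=24, share_every=6):
        super().__init__()
        self.blocks = nn.ModuleList([MambaBlock(d_model) for _ in range(n_mamba)])
        self.shared_attn = TransformerBlock(d_model)  # single, weight-tied instance
        self.share_every = share_every
    def forward(self, x):
        for i, block in enumerate(self.blocks, start=1):
            x = block(x)
            if i % self.share_every == 0:
                x = self.shared_attn(x)  # same weights at every insertion point
        return x

# Example: (batch, seq_len, d_model) input
x = torch.randn(2, 16, 512)
y = HybridBackbone()(x)
print(y.shape)  # torch.Size([2, 16, 512])
```

The point of sharing one transformer layer rather than interleaving independent ones is parameter efficiency: attention is applied repeatedly through the depth of the network at the cost of a single layer's weights.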
 
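The second hunk's context ends at tokenization. For completeness, a typical continuation of the quick-start snippet is sketched below; the `generate` call is not shown in the diff context above, so the `max_new_tokens` value is an assumption rather than a quote from the model card.

```python
# Hedged continuation: these lines are not part of the diff context above.
# max_new_tokens=100 is an assumed value, not taken from the model card.
outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```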