Update README.md

Zamba-7B-v1-phase1 is a hybrid model between Mamba, a state-space model, and transformers.

### Prerequisites

To download Zamba, clone Zyphra's fork of transformers:
1. `git clone https://github.com/Zyphra/transformers_zamba`
2. `cd transformers_zamba`
3. Install the repository: `pip install -e .`.

In order to run optimized Mamba implementations on a CUDA device, you need to install `mamba-ssm` and `causal-conv1d`:
```bash
pip install mamba-ssm "causal-conv1d>=1.2.0"
```
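
For convenience, the full setup can be run as a single shell session. This only strings together the commands above; nothing new is assumed beyond a CUDA-capable machine for the kernel packages:

```bash
# Clone and install Zyphra's fork of transformers (steps 1-3 above).
git clone https://github.com/Zyphra/transformers_zamba
cd transformers_zamba
pip install -e .

# Optimized Mamba kernels for CUDA devices. The version constraint is quoted
# so the shell does not parse ">=" as an output redirection.
pip install mamba-ssm "causal-conv1d>=1.2.0"
```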

You can run the model without using the optimized Mamba kernels, but it is **not** recommended as it will result in significantly higher latency.

To run on CPU, please specify `use_mamba_kernels=False` when loading the model using `AutoModelForCausalLM.from_pretrained`.
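
As an illustration, a CPU load might look like the following (a minimal sketch; the model id `Zyphra/Zamba-7B-v1-phase1` is assumed from this card's title, not stated in the text above):

```python
from transformers import AutoModelForCausalLM

# Assumed model id, inferred from the card title.
model = AutoModelForCausalLM.from_pretrained(
    "Zyphra/Zamba-7B-v1-phase1",
    use_mamba_kernels=False,  # run the pure-PyTorch Mamba path instead of the CUDA kernels
)
```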

### Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
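import torch

# The snippet is truncated at this point in the diff view. What follows is a
# minimal sketch of typical generation usage, not the original continuation;
# the model id is assumed from this card's title.
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba-7B-v1-phase1")
model = AutoModelForCausalLM.from_pretrained(
    "Zyphra/Zamba-7B-v1-phase1",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Tokenize a prompt and generate a short continuation.
inputs = tokenizer("A hybrid of Mamba and transformer blocks", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))
```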
|