---
license: apache-2.0
datasets:
- allenai/c4
---

# Hydra

![Hydra](./matrix_mixer.png "Matrix Mixer")

> **Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers**\
> Sukjun Hwang*, Aakash Lahoti*, Tri Dao, Albert Gu\
> Paper: https://arxiv.org/abs/2407.09941 \
> Blogpost: https://goombalab.github.io/blog/2024/hydra-part1-matrix-mixer/

## About

## Installation

Follow the installation section of [Mamba](https://github.com/state-spaces/mamba); in short,

```bash
pip install mamba-ssm
```

[Optional] For training BERT (`./hydra/bert`), install the additional required packages via

```bash
pip install -r requirements.txt
```

## Usage

### Hydra Block

The quasiseparable matrix mixer, ***Hydra***, is our best model for bidirectional sequence processing (details in Section 3 of the paper).\
The implementation is at [./hydra/modules/hydra.py](https://github.com/goombalab/hydra/blob/main/hydra/modules/hydra.py).

```python
import torch
from hydra.modules.hydra import Hydra

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")
model = Hydra(
    d_model=dim,  # Model dimension d_model
    d_state=64,   # SSM state expansion factor
    d_conv=7,     # Local non-causal convolution width
    expand=2,     # Block expansion factor
).to("cuda")
y = model(x)
assert y.shape == x.shape
```
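
For intuition, the quasiseparable mixer can be read as three pieces: a causal (lower-triangular) pass over the sequence, the same pass run over the reversed sequence (upper-triangular), and a free diagonal (see the paper and blogpost). The sketch below paraphrases that decomposition; `shift_right`, `causal_ssm`, and `hydra_like_mix` are illustrative stand-ins, not the repo's API.

```python
import torch
import torch.nn.functional as F

def shift_right(x):
    # Prepend one zero step along the length axis so that position i only
    # receives contributions from strictly earlier positions.
    return F.pad(x, (0, 0, 1, 0))[:, :-1]

def hydra_like_mix(x, causal_ssm, D):
    # x: (batch, length, dim); causal_ssm: any causal sequence mixer (stand-in);
    # D: per-channel diagonal scaling of shape (dim,).
    fwd = shift_right(causal_ssm(x))                  # strictly lower-triangular part
    bwd = shift_right(causal_ssm(x.flip(1))).flip(1)  # strictly upper-triangular part
    return fwd + bwd + D * x                          # free diagonal

# Quick shape check with a trivial causal mixer (cumulative sum) as the stand-in:
x = torch.randn(2, 64, 16)
y = hydra_like_mix(x, lambda t: t.cumsum(dim=1), D=torch.ones(16))
assert y.shape == x.shape
```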

### Matrix Mixer Block

The matrix mixer framework is implemented at [./hydra/modules/matrix_mixer.py](https://github.com/goombalab/hydra/blob/main/hydra/modules/matrix_mixer.py).\
You can easily integrate your own mixer matrix by following our implementations of various sequence mixers located at [./hydra/modules/matrix_mixers/](./hydra/modules/matrix_mixers/)!

```python
import torch
from hydra.modules.matrix_mixer import MatrixMixer

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")

model = MatrixMixer(
    # matrix_mixer_type: one of
    # {'dense', 'toeplitz', 'vandermonde', 'cauchy', 'low_rank', 'attention', 'quasiseparable'}
    matrix_mixer_type='quasiseparable',  # example choice
    # is_data_dependent: boolean flag to parameterize the mixer matrix as SAM
    is_data_dependent=True,              # example choice
    d_model=dim,  # Model dimension d_model
    qk_dim=16,    # Dimension for QK (example value)
).to("cuda")
y = model(x)
assert y.shape == x.shape
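
As background when plugging in your own mixer: the framework treats sequence mixing as multiplication along the length dimension by a mixer matrix M, and each `matrix_mixer_type` is a different structured parameterization of M. Below is a minimal, framework-agnostic sketch of this view (names illustrative, not the repo's interface):

```python
import torch

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim)

# A data-independent "dense" mixer: one unconstrained (length x length) matrix.
# The structured types (toeplitz, vandermonde, cauchy, low_rank, quasiseparable)
# restrict how such a matrix is built.
M = torch.randn(length, length) / length ** 0.5

# y[b, i, d] = sum_j M[i, j] * x[b, j, d]: every output position mixes
# information from every input position through M.
y = torch.einsum("ij,bjd->bid", M, x)
assert y.shape == x.shape
```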

### BERT

Our code for training BERT ([./hydra/bert/](https://github.com/goombalab/hydra/blob/main/hydra/bert/)) is based on [MosaicBERT](https://github.com/mosaicml/examples/tree/main/examples/benchmarks/bert) and [M2](https://github.com/HazyResearch/m2).

Follow the instructions of MosaicBERT ([./hydra/bert/README.md](https://github.com/goombalab/hydra/blob/main/hydra/bert/README.md)) for details (*e.g.*, setting up the dataset and running the code).\
The default configurations for Hydra and MatrixMixer are located at:
- Pretrain: [./hydra/bert/yamls/pretrain](https://github.com/goombalab/hydra/blob/main/hydra/bert/yamls/pretrain)
- Finetune: [./hydra/bert/yamls/finetune](https://github.com/goombalab/hydra/blob/main/hydra/bert/yamls/finetune)

#### Example commands

Pretrain Hydra on C4 using a single GPU:
```bash
python main.py yamls/pretrain/hydra.yaml
```
Pretrain Hydra on C4 using 8 GPUs:
```bash
composer -n 8 main.py yamls/pretrain/hydra.yaml
```
Finetune Hydra on GLUE:
```bash
python glue.py yamls/finetune/hydra.yaml
```

## Acknowledgement

We thank the authors of [Mamba](https://github.com/state-spaces/mamba), [MosaicBERT](https://github.com/mosaicml/examples/tree/main/examples/benchmarks/bert), and [M2](https://github.com/HazyResearch/m2) for their wonderful codebases.

## Citation

If you use this codebase, or otherwise find our work valuable, please cite Hydra:

```bibtex
@article{hydra,
  title={Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers},
  author={Hwang, Sukjun and Lahoti, Aakash and Dao, Tri and Gu, Albert},
  journal={arXiv preprint arXiv:2407.09941},
  year={2024}
}
```