---
tags:
- safe
- mamba
- attention
- hybrid
- molecular-generation
- smiles
- generated_from_trainer
datasets:
- katielink/moses
model-index:
- name: HYBRID_20M
results: []
---
# HYBRID_20M
HYBRID_20M is a model developed for molecular generation tasks, combining **Mamba** and **Attention** layers to exploit the strengths of each architecture. **The training code is available at [https://github.com/Anri-Lombard/Mamba-SAFE](https://github.com/Anri-Lombard/Mamba-SAFE).** The model was trained from scratch on the [MOSES](https://huggingface.co/datasets/katielink/moses) dataset, converted from SMILES to the SAFE (Sequential Attachment-based Fragment Embedding) format to improve molecular representation for machine learning applications. HYBRID_20M achieves performance comparable to both transformer-based models such as [SAFE_20M](https://huggingface.co/anrilombard/safe-20m) and Mamba-based models like [SSM_20M](https://huggingface.co/anrilombard/ssm-20m).
## Evaluation Results
HYBRID_20M performs on par with both transformer-based and Mamba-based models on molecular generation tasks, producing molecular structures with high validity and diversity and indicating the effectiveness of combining Mamba's sequence modeling with Attention mechanisms.
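As an illustration, validity, uniqueness, and diversity of a set of generated molecules can be scored with the MOSES benchmarking package. The snippet below is a sketch only: the SMILES strings are placeholders, and this is not the model's actual evaluation pipeline.

```python
# Illustrative only: scoring generated SMILES with the MOSES benchmark suite.
# Requires the `molsets` package (imported as `moses`); the SMILES below are
# placeholders, not actual HYBRID_20M outputs.
import moses

generated_smiles = [
    "CC(=O)Oc1ccccc1C(=O)O",  # aspirin, used here only as a placeholder
    "c1ccccc1",               # benzene
]

# get_all_metrics reports validity, uniqueness, novelty, internal diversity, etc.
metrics = moses.get_all_metrics(generated_smiles)
print(metrics)
```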
## Model Description
HYBRID_20M employs a hybrid architecture that integrates the **Mamba** framework with **Attention** layers. This integration allows the model to benefit from Mamba's efficient sequence modeling capabilities and the contextual understanding provided by Attention mechanisms.
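The exact layer layout is defined in the training repository linked above. Purely as an illustration of the idea, a hybrid stack can interleave Mamba (state-space) blocks with standard multi-head self-attention blocks; the dimensions, layer structure, and use of the `mamba-ssm` package below are illustrative assumptions, not HYBRID_20M's actual configuration.

```python
# Illustrative sketch of a Mamba/Attention hybrid block (not the actual
# HYBRID_20M architecture). Assumes the `mamba-ssm` package is installed.
import torch
import torch.nn as nn
from mamba_ssm import Mamba


class HybridBlock(nn.Module):
    """One Mamba block followed by one self-attention block, each pre-normed."""

    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.mamba = Mamba(d_model=d_model)          # selective state-space layer
        self.norm2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Efficient sequence mixing via the state-space layer.
        x = x + self.mamba(self.norm1(x))
        # Contextual mixing via self-attention (causal masking omitted for brevity).
        h = self.norm2(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + attn_out
```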
### Mamba Framework
The Mamba framework, utilized in HYBRID_20M, was introduced in the following publication:
```bibtex
@article{gu2023mamba,
title={Mamba: Linear-time sequence modeling with selective state spaces},
author={Gu, Albert and Dao, Tri},
journal={arXiv preprint arXiv:2312.00752},
year={2023}
}
```
We acknowledge the authors for their contributions to sequence modeling.
### Attention Mechanisms
Attention layers enhance the model's ability to focus on relevant parts of the input sequence, facilitating the capture of long-range dependencies and contextual information. This capability is essential for accurately generating complex molecular structures.
### SAFE Framework
The SAFE framework, also employed in HYBRID_20M, was introduced in the following publication:
```bibtex
@article{noutahi2024gotta,
title={Gotta be SAFE: a new framework for molecular design},
author={Noutahi, Emmanuel and Gabellini, Cristian and Craig, Michael and Lim, Jonathan SC and Tossou, Prudencio},
journal={Digital Discovery},
volume={3},
number={4},
pages={796--804},
year={2024},
publisher={Royal Society of Chemistry}
}
```
We acknowledge the authors for their contributions to molecular design.
## Intended Uses & Limitations
### Intended Uses
HYBRID_20M is intended for:
- **Generating Molecular Structures:** Creating novel molecules with desired properties.
- **Exploring Chemical Space:** Investigating the vast array of possible chemical compounds for research and development.
- **Assisting in Material Design:** Facilitating the creation of new materials with specific functionalities.
### Limitations
- **Validation Required:** Outputs should be validated by domain experts before practical application; a basic automated parse check is sketched after this list.
- **Synthetic Feasibility:** Generated molecules may not always be synthetically feasible.
- **Dataset Scope:** The model's knowledge is limited to the chemical space represented in the MOSES dataset.
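For example, a first-pass validity filter (well short of expert review or synthesizability assessment) can be run with RDKit; the SMILES strings below are placeholders.

```python
# Minimal first-pass filter for generated SMILES using RDKit.
# This only checks that RDKit can parse/sanitize the molecule; it says nothing
# about synthetic feasibility or real-world suitability.
from rdkit import Chem

candidates = ["CCO", "c1ccccc1O", "not_a_molecule"]  # placeholder outputs

valid = [smi for smi in candidates if Chem.MolFromSmiles(smi) is not None]
print(f"{len(valid)}/{len(candidates)} parse as valid molecules:", valid)
```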
## Training and Evaluation Data
The model was trained on the [MOSES (MOlecular SEtS)](https://huggingface.co/datasets/katielink/moses) dataset, a benchmark dataset for molecular generation. The dataset was converted from SMILES to the SAFE format to enhance molecular representation for machine learning tasks.
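As an illustration of the conversion step, the `safe-mol` Python package released alongside the SAFE paper provides encode/decode helpers. The molecule below is a placeholder, and this is not necessarily the exact preprocessing used for training (that lives in the linked repository).

```python
# Illustrative SMILES -> SAFE round trip using the `safe-mol` package
# (https://github.com/datamol-io/safe). Placeholder molecule only.
import safe

smiles = "CC(=O)Oc1ccccc1C(=O)O"   # aspirin, as a placeholder
safe_string = safe.encode(smiles)   # fragment the molecule and encode as SAFE
print(safe_string)

recovered = safe.decode(safe_string)  # convert back to SMILES
print(recovered)
```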
## Training Procedure
### Training Hyperparameters
The following hyperparameters were used during training (an illustrative mapping to `TrainingArguments` follows the list):
- **Learning Rate:** 0.0005
- **Training Batch Size:** 32
- **Evaluation Batch Size:** 32
- **Seed:** 42
- **Gradient Accumulation Steps:** 2
- **Total Training Batch Size:** 64
- **Optimizer:** Adam (betas=(0.9, 0.999), epsilon=1e-08)
- **Learning Rate Scheduler:** Linear with 20,000 warmup steps
- **Number of Epochs:** 10
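For reference, the values above map onto Hugging Face `TrainingArguments` roughly as sketched below. This assumes a Trainer-style setup; the authoritative training script is in the linked repository.

```python
# Rough mapping of the reported hyperparameters onto transformers.TrainingArguments.
# Assumes a Hugging Face Trainer-style setup; the actual training code is at
# https://github.com/Anri-Lombard/Mamba-SAFE.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="hybrid-20m",
    learning_rate=5e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=2,   # effective batch size of 64
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=20_000,
    num_train_epochs=10,
)
```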
### Framework Versions
- **Mamba:** [Specify version]
- **PyTorch:** [Specify version]
- **Datasets:** 2.20.0
- **Tokenizers:** 0.19.1
## Acknowledgements
We acknowledge the authors of the [Mamba](https://github.com/state-spaces/mamba) and [SAFE](https://github.com/datamol-io/safe) frameworks for their contributions to sequence modeling and molecular design.
## References
```bibtex
@article{gu2023mamba,
title={Mamba: Linear-time sequence modeling with selective state spaces},
author={Gu, Albert and Dao, Tri},
journal={arXiv preprint arXiv:2312.00752},
year={2023}
}
@article{noutahi2024gotta,
title={Gotta be SAFE: a new framework for molecular design},
author={Noutahi, Emmanuel and Gabellini, Cristian and Craig, Michael and Lim, Jonathan SC and Tossou, Prudencio},
journal={Digital Discovery},
volume={3},
number={4},
pages={796--804},
year={2024},
publisher={Royal Society of Chemistry}
}
```