|
--- |
|
tags: |
|
- safe |
|
- mamba |
|
- attention |
|
- hybrid |
|
- molecular-generation |
|
- smiles |
|
- generated_from_trainer |
|
datasets: |
|
- katielink/moses |
|
model-index: |
|
- name: HYBRID_20M |
|
results: [] |
|
--- |
|
|
|
# HYBRID_20M |
|
|
|
HYBRID_20M is a molecular generation model that combines **Mamba** and **Attention** layers to draw on the strengths of each architecture. **The training code is available at [https://github.com/Anri-Lombard/Mamba-SAFE](https://github.com/Anri-Lombard/Mamba-SAFE).** The model was trained from scratch on the [MOSES](https://huggingface.co/datasets/katielink/moses) dataset, converted from SMILES to the SAFE (Sequential Attachment-based Fragment Embedding) format to improve molecular representation for machine learning. HYBRID_20M performs comparably to both transformer-based models such as [SAFE_20M](https://huggingface.co/anrilombard/safe-20m) and Mamba-based models such as [SSM_20M](https://huggingface.co/anrilombard/ssm-20m).
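
The snippet below is a minimal generation sketch. It assumes the checkpoint loads through the standard `transformers` causal-LM interface with `trust_remote_code=True` and that the tokenizer defines a BOS token; the repository id `anrilombard/hybrid-20m` is a placeholder, so check the model page for the exact id and any custom loading code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "anrilombard/hybrid-20m"  # hypothetical Hub id; check the model page
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

# Start from the BOS token and sample SAFE strings unconditionally.
input_ids = torch.tensor([[tokenizer.bos_token_id]])
outputs = model.generate(
    input_ids,
    max_new_tokens=128,
    do_sample=True,
    top_k=50,
    num_return_sequences=4,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```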
|
|
|
## Evaluation Results |
|
|
|
HYBRID_20M performs on par with both transformer-based and Mamba-based baselines on molecular generation tasks, achieving high validity and diversity in the generated molecular structures. This indicates that combining Mamba's sequence modeling with Attention mechanisms is effective for this domain.
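
As a rough illustration of the validity and uniqueness checks mentioned above, the sketch below decodes generated SAFE strings back to SMILES and validates them with RDKit. The `safe.decode` call is assumed from the `safe-mol` package (datamol-io/safe) and is not part of this repository.

```python
import safe  # the `safe-mol` package; decoding API assumed
from rdkit import Chem

def evaluate(generated_safe):
    """Compute simple validity and uniqueness ratios for SAFE samples."""
    smiles = []
    for s in generated_safe:
        try:
            smiles.append(safe.decode(s, canonical=True))  # SAFE -> SMILES
        except Exception:
            continue  # skip strings that fail to decode
    valid = [s for s in smiles if s and Chem.MolFromSmiles(s) is not None]
    validity = len(valid) / max(len(generated_safe), 1)
    uniqueness = len(set(valid)) / max(len(valid), 1)
    return {"validity": validity, "uniqueness": uniqueness}
```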
|
|
|
## Model Description |
|
|
|
HYBRID_20M employs a hybrid architecture that integrates the **Mamba** framework with **Attention** layers. This integration allows the model to benefit from Mamba's efficient sequence modeling capabilities and the contextual understanding provided by Attention mechanisms. |
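
The sketch below is an illustrative (not the released) implementation of such a hybrid stack: most layers are Mamba blocks, with a causal self-attention layer inserted periodically for global context. The layer sizes and the interleaving ratio are assumptions for illustration only.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumes the `mamba-ssm` package is installed

class HybridBlockStack(nn.Module):
    """Interleaves Mamba blocks with periodic causal self-attention layers."""

    def __init__(self, d_model=512, n_layers=8, n_heads=8, attn_every=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            if (i + 1) % attn_every == 0:
                # Periodic attention layer adds global, content-based mixing.
                self.layers.append(
                    nn.MultiheadAttention(d_model, n_heads, batch_first=True)
                )
            else:
                # Mamba (selective state-space) layer provides linear-time
                # sequence mixing for the remaining positions.
                self.layers.append(Mamba(d_model=d_model))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        for layer in self.layers:
            if isinstance(layer, nn.MultiheadAttention):
                attn_out, _ = layer(x, x, x, attn_mask=causal_mask, need_weights=False)
                x = x + attn_out
            else:
                x = x + layer(x)
        return self.norm(x)
```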
|
|
|
### Mamba Framework |
|
|
|
The Mamba framework, utilized in HYBRID_20M, was introduced in the following publication: |
|
|
|
```bibtex |
|
@article{gu2023mamba, |
|
title={Mamba: Linear-time sequence modeling with selective state spaces}, |
|
author={Gu, Albert and Dao, Tri}, |
|
journal={arXiv preprint arXiv:2312.00752}, |
|
year={2023} |
|
} |
|
``` |
|
|
|
We acknowledge the authors for their contributions to sequence modeling. |
|
|
|
### Attention Mechanisms |
|
|
|
Attention layers enhance the model's ability to focus on relevant parts of the input sequence, facilitating the capture of long-range dependencies and contextual information. This capability is essential for accurately generating complex molecular structures. |
|
|
|
### SAFE Framework |
|
|
|
The SAFE framework, also employed in HYBRID_20M, was introduced in the following publication: |
|
|
|
```bibtex |
|
@article{noutahi2024gotta, |
|
title={Gotta be SAFE: a new framework for molecular design}, |
|
author={Noutahi, Emmanuel and Gabellini, Cristian and Craig, Michael and Lim, Jonathan SC and Tossou, Prudencio}, |
|
journal={Digital Discovery}, |
|
volume={3}, |
|
number={4}, |
|
pages={796--804}, |
|
year={2024}, |
|
publisher={Royal Society of Chemistry} |
|
} |
|
``` |
|
|
|
We acknowledge the authors for their contributions to molecular design. |
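
For orientation, the round-trip below shows how a SMILES string is converted to SAFE notation and back. The `safe.encode` / `safe.decode` names are assumed from the `safe-mol` package (datamol-io/safe) and are not defined in this repository.

```python
import safe  # the `safe-mol` package

smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin, as an example input
safe_string = safe.encode(smiles)                      # SMILES -> SAFE
recovered = safe.decode(safe_string, canonical=True)   # SAFE -> SMILES

print(safe_string)
print(recovered)
```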
|
|
|
## Intended Uses & Limitations |
|
|
|
### Intended Uses |
|
|
|
HYBRID_20M is intended for: |
|
|
|
- **Generating Molecular Structures:** Creating novel molecules with desired properties. |
|
- **Exploring Chemical Space:** Investigating the vast array of possible chemical compounds for research and development. |
|
- **Assisting in Material Design:** Facilitating the creation of new materials with specific functionalities. |
|
|
|
### Limitations |
|
|
|
- **Validation Required:** Outputs should be validated by domain experts before practical application. |
|
- **Synthetic Feasibility:** Generated molecules may not always be synthetically feasible. |
|
- **Dataset Scope:** The model's knowledge is limited to the chemical space represented in the MOSES dataset. |
|
|
|
## Training and Evaluation Data |
|
|
|
The model was trained on the [MOSES (MOlecular SEtS)](https://huggingface.co/datasets/katielink/moses) dataset, a benchmark dataset for molecular generation. The dataset was converted from SMILES to the SAFE format to enhance molecular representation for machine learning tasks. |
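
A hedged sketch of this conversion step is shown below, using `datasets` and the `safe-mol` package; the split and column names of `katielink/moses` are assumptions and may differ from the actual dataset schema.

```python
import safe
from datasets import load_dataset

# Column/split names assumed; inspect the dataset card before running.
moses = load_dataset("katielink/moses", split="train")

def to_safe(example):
    try:
        example["safe"] = safe.encode(example["SMILES"])
    except Exception:
        example["safe"] = None  # mark molecules that fail to encode
    return example

moses_safe = moses.map(to_safe)
moses_safe = moses_safe.filter(lambda ex: ex["safe"] is not None)
print(moses_safe[0])
```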
|
|
|
## Training Procedure |
|
|
|
### Training Hyperparameters |
|
|
|
The following hyperparameters were used during training; a sketch of how they map to Hugging Face `TrainingArguments` follows the list:
|
|
|
- **Learning Rate:** 0.0005 |
|
- **Training Batch Size:** 32 |
|
- **Evaluation Batch Size:** 32 |
|
- **Seed:** 42 |
|
- **Gradient Accumulation Steps:** 2 |
|
- **Total Training Batch Size:** 64 |
|
- **Optimizer:** Adam (betas=(0.9, 0.999), epsilon=1e-08) |
|
- **Learning Rate Scheduler:** Linear with 20,000 warmup steps |
|
- **Number of Epochs:** 10 |
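
The same settings can be expressed as `TrainingArguments`; this is a sketch of the mapping only, not the exact configuration used for this run, and the output path is hypothetical.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="hybrid-20m",             # hypothetical output path
    learning_rate=5e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,       # gives a total training batch size of 64
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=20_000,
    num_train_epochs=10,
)
```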
|
|
|
### Framework Versions |
|
|
|
- **Mamba:** [Specify version] |
|
- **PyTorch:** [Specify version] |
|
- **Datasets:** 2.20.0 |
|
- **Tokenizers:** 0.19.1 |
|
|
|
## Acknowledgements |
|
|
|
We acknowledge the authors of the [Mamba](https://github.com/state-spaces/mamba) and [SAFE](https://github.com/datamol-io/safe) frameworks for their contributions to sequence modeling and molecular design.
|
|
|
## References |
|
|
|
```bibtex |
|
@article{gu2023mamba, |
|
title={Mamba: Linear-time sequence modeling with selective state spaces}, |
|
author={Gu, Albert and Dao, Tri}, |
|
journal={arXiv preprint arXiv:2312.00752}, |
|
year={2023} |
|
} |
|
|
|
@article{noutahi2024gotta, |
|
title={Gotta be SAFE: a new framework for molecular design}, |
|
author={Noutahi, Emmanuel and Gabellini, Cristian and Craig, Michael and Lim, Jonathan SC and Tossou, Prudencio}, |
|
journal={Digital Discovery}, |
|
volume={3}, |
|
number={4}, |
|
pages={796--804}, |
|
year={2024}, |
|
publisher={Royal Society of Chemistry} |
|
} |
|
``` |
|
|