---
tags:
  - safe
  - mamba
  - attention
  - hybrid
  - molecular-generation
  - smiles
  - generated_from_trainer
datasets:
  - katielink/moses
model-index:
  - name: HYBRID_20M
    results: []
---

# HYBRID_20M

HYBRID_20M is a model developed for molecular generation tasks, combining **Mamba** and **Attention** layers to draw on the strengths of each architecture. **The training code is available at [https://github.com/Anri-Lombard/Mamba-SAFE](https://github.com/Anri-Lombard/Mamba-SAFE).** The model was trained from scratch on the [MOSES](https://huggingface.co/datasets/katielink/moses) dataset, converted from SMILES to the SAFE (Sequential Attachment-based Fragment Embedding) format to improve molecular representation for machine learning applications. HYBRID_20M performs comparably to transformer-based models such as [SAFE_20M](https://huggingface.co/anrilombard/safe-20m) and Mamba-based models such as [SSM_20M](https://huggingface.co/anrilombard/ssm-20m).

## Evaluation Results

HYBRID_20M performs on par with both transformer-based and Mamba-based baselines on molecular generation tasks, producing molecules with high validity and diversity. This indicates that combining Mamba's sequence modeling with Attention mechanisms is effective for this domain.

## Model Description

HYBRID_20M employs a hybrid architecture that integrates the **Mamba** framework with **Attention** layers. This integration allows the model to benefit from Mamba's efficient sequence modeling capabilities and the contextual understanding provided by Attention mechanisms.
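The exact interleaving of Mamba and Attention blocks is not specified in this card. As a purely illustrative sketch, a hybrid stack is often laid out by placing an Attention block every N-th layer with Mamba blocks elsewhere; the layer count and `attn_every` ratio below are assumptions, not HYBRID_20M's actual configuration:

```python
# Illustrative sketch only: the real HYBRID_20M layer layout is not
# documented here. "attn_every" (an Attention block every N-th layer,
# Mamba blocks elsewhere) is an assumed interleaving pattern.
def hybrid_layer_schedule(n_layers: int, attn_every: int) -> list[str]:
    """Return a layer-type list interleaving Mamba and Attention blocks."""
    return [
        "attention" if (i + 1) % attn_every == 0 else "mamba"
        for i in range(n_layers)
    ]

schedule = hybrid_layer_schedule(n_layers=12, attn_every=4)
print(schedule)
```

With these hypothetical settings, layers 4, 8, and 12 are Attention blocks and the rest are Mamba blocks, so the cheap state-space layers do most of the sequence mixing while periodic Attention layers restore global token-to-token interaction.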

### Mamba Framework

The Mamba framework, utilized in HYBRID_20M, was introduced in the following publication:

```bibtex
@article{gu2023mamba,
  title={Mamba: Linear-time sequence modeling with selective state spaces},
  author={Gu, Albert and Dao, Tri},
  journal={arXiv preprint arXiv:2312.00752},
  year={2023}
}
```

We acknowledge the authors for their contributions to sequence modeling.

### Attention Mechanisms

Attention layers enhance the model's ability to focus on relevant parts of the input sequence, facilitating the capture of long-range dependencies and contextual information. This capability is essential for accurately generating complex molecular structures.

### SAFE Framework

The SAFE framework, also employed in HYBRID_20M, was introduced in the following publication:

```bibtex
@article{noutahi2024gotta,
  title={Gotta be SAFE: a new framework for molecular design},
  author={Noutahi, Emmanuel and Gabellini, Cristian and Craig, Michael and Lim, Jonathan SC and Tossou, Prudencio},
  journal={Digital Discovery},
  volume={3},
  number={4},
  pages={796--804},
  year={2024},
  publisher={Royal Society of Chemistry}
}
```

We acknowledge the authors for their contributions to molecular design.

## Intended Uses & Limitations

### Intended Uses

HYBRID_20M is intended for:

- **Generating Molecular Structures:** Creating novel molecules with desired properties.
- **Exploring Chemical Space:** Investigating the vast array of possible chemical compounds for research and development.
- **Assisting in Material Design:** Facilitating the creation of new materials with specific functionalities.

### Limitations

- **Validation Required:** Outputs should be validated by domain experts before practical application.
- **Synthetic Feasibility:** Generated molecules may not always be synthetically feasible.
- **Dataset Scope:** The model's knowledge is limited to the chemical space represented in the MOSES dataset.

## Training and Evaluation Data

The model was trained on the [MOSES (MOlecular SEtS)](https://huggingface.co/datasets/katielink/moses) dataset, a benchmark dataset for molecular generation. The dataset was converted from SMILES to the SAFE format to enhance molecular representation for machine learning tasks.

## Training Procedure

### Training Hyperparameters

The following hyperparameters were used during training:

- **Learning Rate:** 0.0005
- **Training Batch Size:** 32
- **Evaluation Batch Size:** 32
- **Seed:** 42
- **Gradient Accumulation Steps:** 2
- **Total Training Batch Size:** 64
- **Optimizer:** Adam (betas=(0.9, 0.999), epsilon=1e-08)
- **Learning Rate Scheduler:** Linear with 20,000 warmup steps
- **Number of Epochs:** 10
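The effective batch size (32 per step × 2 gradient-accumulation steps = 64) and the linear-warmup schedule above can be sketched as follows. The total step count and the post-warmup linear decay to zero are assumptions about the trainer's defaults, not values stated in this card:

```python
# Sketch of the schedule implied above. total_steps is hypothetical;
# the card only specifies the peak LR (5e-4) and warmup length (20k).
def linear_warmup_lr(step: int, peak_lr: float = 5e-4,
                     warmup_steps: int = 20_000,
                     total_steps: int = 100_000) -> float:
    """Linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / max(total_steps - warmup_steps, 1)

# Effective batch size: per-device batch x gradient accumulation steps.
effective_batch = 32 * 2
print(effective_batch)              # 64
print(linear_warmup_lr(10_000))     # halfway through warmup: 2.5e-4
```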

### Framework Versions

- **Mamba:** [Specify version]
- **PyTorch:** [Specify version]
- **Datasets:** 2.20.0
- **Tokenizers:** 0.19.1

## Acknowledgements

We acknowledge the authors of the [Mamba](https://github.com/state-spaces/mamba) and [SAFE](https://github.com/datamol-io/safe) frameworks for their contributions to sequence modeling and molecular design.

## References

```bibtex
@article{gu2023mamba,
  title={Mamba: Linear-time sequence modeling with selective state spaces},
  author={Gu, Albert and Dao, Tri},
  journal={arXiv preprint arXiv:2312.00752},
  year={2023}
}

@article{noutahi2024gotta,
  title={Gotta be SAFE: a new framework for molecular design},
  author={Noutahi, Emmanuel and Gabellini, Cristian and Craig, Michael and Lim, Jonathan SC and Tossou, Prudencio},
  journal={Digital Discovery},
  volume={3},
  number={4},
  pages={796--804},
  year={2024},
  publisher={Royal Society of Chemistry}
}
```