---
language:
- en
tags:
- biology
- esm
- protein
extra_gated_heading: Agree to license and to share information
extra_gated_description: "The information you provide will be collected, stored, processed and shared in accordance with the [EvolutionaryScale Privacy Policy](https://www.evolutionaryscale.ai/legal/privacy)"
extra_gated_prompt: >-
  [EvolutionaryScale Community License Agreement](https://www.evolutionaryscale.ai/legal/community-license-agreement)

  **The Big Picture:**

  1. The EvolutionaryScale AI Model is **only** available under this Community License Agreement for **non-commercial use** by **individuals** or **non-commercial organizations**.

  2. You **may not** use the EvolutionaryScale AI Model or any derivative works of the EvolutionaryScale AI Model or its outputs:

  a. in connection with **any commercial activities**, for example

  b. to develop **any product or service** such as hosting the AI Model behind an API; or

  c. in connection to **drug development**; or

  d. without attribution to EvolutionaryScale and this Community License Agreement; or

  e. to **train** any other **large language model**, any technology for protein representation learning or protein generation or any other AI-powered third party model **similar to EvolutionaryScale’s AI Model**, even for non-commercial usage.

  3. You **can publish, share and adapt** the EvolutionaryScale AI Model and its outputs for **non-commercial purposes** in accordance with the Community License Agreement
extra_gated_fields:
  Name: text
  Country: country
  Role: text
  Affiliation (Lab/group/division and Institution): text
  Please describe your intended use of ESM3: text
  Is your background primarily computational or experimental:
    type: select
    options:
    - Computational
    - Experimental
    - Mixed
  How did you learn about ESM3?:
    type: select
    options:
    - Social media post
    - News article
    - Paper pre-print
    - Word of mouth
    - Other
  I accept the CLA: checkbox
  I agree to use this dataset for non-commercial use ONLY: checkbox
---

# Model Card for esm3-sm-open-v1

`esm3-sm-open-v1` is trained on 2.78 billion natural proteins. Synthetic data augmentation expands this to 3.15 billion protein sequences, 236 million protein structures, and 539 million proteins with function annotations, totaling 771 billion tokens. `esm3-sm-open-v1` is a generative model capable of designing proteins conditioned on partial prompts of sequence, structure, and function.

Safety is an important part of our model: data related to viruses has been removed from the training dataset, as have some proteins belonging to organisms on the [USDA Select Agents and Toxins](https://www.selectagents.gov/sat/list.htm) list. The function decoder has also been filtered for potentially harmful keywords.

## Usage

Using `ESM3` requires the [esm](https://github.com/evolutionaryscale/esm) package:

```
pip install esm
```

Please refer to the README and notebooks in the [esm repository](https://github.com/evolutionaryscale/esm?tab=readme-ov-file#quickstart) for details on how to use the model.

## License

This repository is under a custom non-commercial [license](https://github.com/evolutionaryscale/esm/blob/main/LICENSE.md).
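`esm3-sm-open-v1` can be conditioned on a partial sequence prompt, where some residues are fixed and the rest are left for the model to generate. The sketch below shows one way to build such a prompt string. It assumes that an underscore (`_`) marks a masked position, following the example prompts in the esm repository quickstart; verify this convention against the current README. `make_partial_prompt` is a hypothetical helper, not part of the `esm` package, and it only builds the string. See the esm repository for how a prompt is actually passed to the model.

```python
from typing import Iterable, Tuple

# Assumption: "_" marks a masked (to-be-generated) position, following the
# example prompts in the esm repository quickstart. Verify against the README.
MASK = "_"


def make_partial_prompt(sequence: str, keep: Iterable[Tuple[int, int]]) -> str:
    """Hypothetical helper: mask every residue of `sequence` except those in
    the half-open [start, end) spans listed in `keep`."""
    chars = [MASK] * len(sequence)
    for start, end in keep:
        if not 0 <= start <= end <= len(sequence):
            raise ValueError(f"span ({start}, {end}) is outside the sequence")
        # Copy the residues to keep fixed; every other position stays masked.
        chars[start:end] = sequence[start:end]
    return "".join(chars)


if __name__ == "__main__":
    # Toy sequence: keep a short central motif and leave the flanks masked.
    print(make_partial_prompt("MKTAYIAKQR", [(3, 7)]))  # ___AYIA___
```

Passing several spans, for example `[(0, 2), (8, 10)]`, fixes both termini and masks the middle, which is the shape of prompt used when asking a model to fill in a region between known segments.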