Update README.md
README.md
CHANGED
@@ -11,7 +11,7 @@ tags:
 - plants
 ---
 ## Model Overview
-
+AgroNT is a DNA language model trained on primarily edible plant genomes. More specifically, AgroNT uses the transformer architecture with self-attention and a masked language modeling
 objective to leverage highly available genotype data from 48 different plant species to learn general representations of nucleotide sequences. AgroNT contains 1 billion parameters and has a context window of 1024 tokens.
 AgroNT uses a non-overlapping 6-mer tokenizer to convert genomic nucleotide sequences to tokens. As a result, the 1024 tokens correspond to approximately 6144 base pairs.
 
@@ -22,7 +22,7 @@ from transformers import AutoModelForMaskedLM, AutoTokenizer
 import torch
 
 
-model_name = 'agro-
+model_name = 'agro-nucleotide-transformer-1b'
 
 # fetch model and tokenizer from InstaDeep's hf repo
 agro_nt_model = AutoModelForMaskedLM.from_pretrained(f'InstaDeepAI/{model_name}')
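The README text in this diff says that a non-overlapping 6-mer tokenizer maps the 1024-token context window to roughly 6144 base pairs. The sketch below illustrates that arithmetic in plain Python. It is not the tokenizer shipped with the model. The function names `kmer_tokenize` and `max_base_pairs` are hypothetical, and emitting leftover bases as single-base tokens is an assumption made only for this illustration.

```python
def kmer_tokenize(sequence: str, k: int = 6) -> list[str]:
    """Split a nucleotide sequence into non-overlapping k-mers.

    Assumption for this sketch: bases left over after the last full
    k-mer are emitted as one token per base.
    """
    sequence = sequence.upper()
    # Index where the last complete k-mer ends.
    cut = len(sequence) - len(sequence) % k
    tokens = [sequence[i:i + k] for i in range(0, cut, k)]
    # Leftover bases (fewer than k) become single-base tokens.
    tokens.extend(sequence[cut:])
    return tokens


def max_base_pairs(context_tokens: int = 1024, k: int = 6) -> int:
    """Upper bound on covered bases when every token is a full k-mer."""
    return context_tokens * k


if __name__ == "__main__":
    print(kmer_tokenize("ATGCGTACGTTAGC"))
    print(max_base_pairs())
```

With every token holding a full 6-mer, 1024 tokens cover 1024 × 6 = 6144 bases, which matches the figure quoted in the README.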