lhallee committed on
Commit
7e3be76
1 Parent(s): 9b38d4a

Update README.md

  1. README.md +12 -4
README.md CHANGED
@@ -18,7 +18,7 @@ widget:
 
  ## Model description
 
- cdsBERT is pLM with a codon vocabulary that was seeded with [ProtBERT](https://huggingface.co/Rostlab/prot_bert_bfd) and trained with a novel vocabulary extension pipeline called MELD. cdsBERT offers a highly biologically relevant latent space with excellent EC number prediction surpassing ProtBERT.
+ [cdsBERT](https://doi.org/10.1101/2023.09.15.558027) is pLM with a codon vocabulary that was seeded with [ProtBERT](https://huggingface.co/Rostlab/prot_bert_bfd) and trained with a novel vocabulary extension pipeline called MELD. cdsBERT offers a highly biologically relevant latent space with excellent EC number prediction surpassing ProtBERT.
 
  ## How to use
 
@@ -46,10 +46,18 @@ vector_embedding = matrix_embedding.mean(dim=0)
  ```
 
  ## Intended use and limitations
- cdsBERT serves as a general purpose
+ cdsBERT serves as a general-purpose protein language model with a codon vocabulary. Fine-tuning with Huggingface transformers models like BertForSequenceClassification enables downstream classification and regression tasks. Currently, the base capability enables feature extraction and mask filling.
 
  ## Our lab
- The [Gleghorn lab](https://www.gleghornlab.com/) is an interdiciplinary research group at the University of Delaware that focuses on solving translational problems with our expertise in engineering, biology, and chemistry. We develop inexpensive and reliable tools to study organ development, maternal-fetal health, and drug delivery. Recently we have begun exploration into protein language models and strive to make protein design and annotation accessible.
+ The [Gleghorn lab](https://www.gleghornlab.com/) is an interdisciplinary research group at the University of Delaware that focuses on solving translational problems with our expertise in engineering, biology, and chemistry. We develop inexpensive and reliable tools to study organ development, maternal-fetal health, and drug delivery. Recently we have begun exploration into protein language models and strive to make protein design and annotation accessible.
 
  ## Please cite
- Coming soon!
+ @article {Hallee_cds_2023,
+ author = {Logan Hallee, Nikolaos Rafailidis, and Jason P. Gleghorn},
+ title = {cdsBERT - Extending Protein Language Models with Codon Awareness},
+ year = {2023},
+ doi = {10.1101/2023.09.15.558027},
+ publisher = {Cold Spring Harbor Laboratory},
+ journal = {bioRxiv}
+ }
+
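
The updated README describes cdsBERT as a protein language model with a codon vocabulary. As a rough illustration of what codon-level input means, the sketch below splits a coding sequence into non-overlapping three-nucleotide codons. The function name `split_into_codons` and the validation rules are assumptions made for this example; the diff does not show how the cdsBERT tokenizer expects its input to be formatted, so treat this as a pre-processing sketch rather than the model's actual tokenization.

```python
from typing import List

# Assumption: the input is a DNA coding sequence using only A, C, G, and T.
VALID_NUCLEOTIDES = set("ACGT")


def split_into_codons(cds: str) -> List[str]:
    """Split a coding sequence into non-overlapping 3-nucleotide codons."""
    seq = cds.strip().upper()
    # A complete coding sequence has a length divisible by three.
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # Reject anything outside the assumed DNA alphabet.
    unexpected = set(seq) - VALID_NUCLEOTIDES
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


if __name__ == "__main__":
    # Hypothetical toy sequence: start codon, one alanine codon, stop codon.
    print(split_into_codons("ATGGCCTAA"))
```

How the resulting codons are mapped to vocabulary tokens depends on the tokenizer shipped with the model, which should be checked against the README's "How to use" section.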