Noelia Ferruz committed on
Commit ebeb946 • 1 Parent(s): 559ef80
Fixed typo endoftag -> endoftext
README.md CHANGED
~~~diff
@@ -35,7 +35,7 @@ Example 1: Generating de novo proteins in a zero-shot fashion. We recommend the
 {'generated_text': 'M\nRRAVGNADLGMEAARYEPSGAYQASEGDGAHGKPHSLPFVALERWQQLGPEERTLAEAVR\nAVLASGQYLLGEAVRRFETAVAAWLGVPFALGVASGTAALTLALRAYGVGPGDEVIVPAI\nTFIATSNAITAAGARPVLVDIDPSTWNMSVASLAARLTPKTKAILAVHLWGQPVDMHPLL\nDIAAQANLAVIEDCAQALGASIAGTKVGTFGDAAAFSFYPTKNMTTGEGGMLVTNARDLA\nQAARMLRSHGQDPPTAYMHSQVGFN'}
 ```
 
-Example 2: Finetuning on a set of user-defined sequences. This example finetunes using a user-defined training and validation files that contain a set of sequences of interest. The create the validation and training file, it is necessary to (1) substitute the FASTA headers for each sequence with the tag "<|
+Example 2: Finetuning on a set of user-defined sequences. This example finetunes using a user-defined training and validation files that contain a set of sequences of interest. The create the validation and training file, it is necessary to (1) substitute the FASTA headers for each sequence with the tag "<|endoftext|>" and (2) split the originating dataset into training and validation files, (this is often done with the ratio 90/10, 80/20 or 95/5). Here we show a learning rate of 1e-06, but ideally should be optimized in separate runs. After training, the finetuned model will be stored in the ./output folder. This model can be used as in the example above to generate tailored sequences.
 
 The HuggingFace script can be found here: https://github.com/huggingface/transformers/blob/master/examples/pytorch/language-modeling/run_clm.py
 
~~~
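The two preprocessing steps described in the corrected Example 2 text (replace every FASTA header with `<|endoftext|>`, then split the records into training and validation files, e.g. 90/10) can be sketched in Python as follows. This is a minimal sketch, not the ProtGPT2 authors' own script: the function names, the output file names `training.txt` and `validation.txt`, the seeded shuffle, and the choice to split whole records rather than individual lines are all assumptions made for illustration.

```python
import random
from pathlib import Path

END_TAG = "<|endoftext|>"  # separator tag named in the README


def parse_fasta(fasta_text: str) -> list[list[str]]:
    """Return one list of sequence lines per FASTA record; headers are dropped."""
    records: list[list[str]] = []
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records.append([])  # each header starts a new record
        elif records:
            records[-1].append(line)  # keep the original line wrapping
        else:
            raise ValueError("sequence data found before the first FASTA header")
    return records


def to_clm_text(records: list[list[str]]) -> str:
    """Render records with each FASTA header replaced by the <|endoftext|> tag."""
    lines: list[str] = []
    for seq_lines in records:
        lines.append(END_TAG)
        lines.extend(seq_lines)
    return ("\n".join(lines) + "\n") if lines else ""


def train_val_split(records, train_fraction=0.9, seed=0):
    """Shuffle whole records reproducibly, then split (0.9 gives a 90/10 split)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def write_clm_files(fasta_text, out_dir, train_fraction=0.9, seed=0):
    """Write the (hypothetical) training.txt and validation.txt files."""
    train, val = train_val_split(parse_fasta(fasta_text), train_fraction, seed)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    train_path = out / "training.txt"
    val_path = out / "validation.txt"
    train_path.write_text(to_clm_text(train))
    val_path.write_text(to_clm_text(val))
    return train_path, val_path
```

The two resulting files would then be passed to the HuggingFace `run_clm.py` script through its `--train_file` and `--validation_file` arguments, together with `--learning_rate` and `--output_dir` to match the 1e-06 learning rate and `./output` folder mentioned in the README. Splitting at the record level keeps every sequence entirely in one file, so no sequence is cut in half across training and validation.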