nferruz/ProtGPT2

@@ -35,7 +35,7 @@ Example 1: Generating de novo proteins in a zero-shot fashion. We recommend the
 {'generated_text': 'M\nRRAVGNADLGMEAARYEPSGAYQASEGDGAHGKPHSLPFVALERWQQLGPEERTLAEAVR\nAVLASGQYLLGEAVRRFETAVAAWLGVPFALGVASGTAALTLALRAYGVGPGDEVIVPAI\nTFIATSNAITAAGARPVLVDIDPSTWNMSVASLAARLTPKTKAILAVHLWGQPVDMHPLL\nDIAAQANLAVIEDCAQALGASIAGTKVGTFGDAAAFSFYPTKNMTTGEGGMLVTNARDLA\nQAARMLRSHGQDPPTAYMHSQVGFN'}
 ```
-Example 2: Finetuning on a set of user-defined sequences. This example finetunes using a user-defined training and validation files that contain a set of sequences of interest. The create the validation and training file, it is necessary to (1) substitute the FASTA headers for each sequence with the tag "<|endoftag|>" and (2) split the originating dataset into training and validation files, (this is often done with the ratio 90/10, 80/20 or 95/5). Here we show a learning rate of 1e-06, but ideally should be optimized in separate runs. After training, the finetuned model will be stored in the ./output folder. This model can be used as in the example above to generate tailored sequences.
+Example 2: Finetuning on a set of user-defined sequences. This example finetunes the model using user-defined training and validation files that contain a set of sequences of interest. To create the training and validation files, it is necessary to (1) substitute the FASTA header of each sequence with the tag "<|endoftext|>" and (2) split the originating dataset into training and validation files (this is often done with a ratio of 90/10, 80/20 or 95/5). Here we show a learning rate of 1e-06, but ideally it should be optimized in separate runs. After training, the finetuned model will be stored in the ./output folder. This model can be used as in the example above to generate tailored sequences.
 The HuggingFace script can be found here: https://github.com/huggingface/transformers/blob/master/examples/pytorch/language-modeling/run_clm.py
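
The two preparation steps described in Example 2 could be sketched roughly as follows. This is a minimal illustration, not the authors' own preprocessing: the function names, the default 90/10 split, the fixed shuffle seed, and the choice to keep sequence lines exactly as they appear in the input are all assumptions.

```python
import random

END_TAG = "<|endoftext|>"  # tag that replaces each FASTA header


def fasta_to_records(fasta_text):
    """Return one string per sequence, with its FASTA header replaced by END_TAG.

    Sequence lines are kept as they appear in the input (an assumption).
    """
    records = []
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header starts a new record; store the previous one first.
            if current is not None:
                records.append(current)
            current = [END_TAG]
        elif current is not None:
            current.append(line)
    if current is not None:
        records.append(current)
    return ["\n".join(r) for r in records]


def split_records(records, val_fraction=0.1, seed=0):
    """Shuffle and split records into (train, validation) lists."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def write_records(path, records):
    """Write records to a plain-text file, one record after another."""
    with open(path, "w") as handle:
        handle.write("\n".join(records) + "\n")
```

A typical use would be to read a FASTA file into a string, call `fasta_to_records`, pass the result to `split_records`, and write the two lists with `write_records` to hypothetical files such as `training.txt` and `validation.txt`, which can then be given to the HuggingFace script.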