mGPT: fine-tuned on message data (MWE)
This model is a fine-tuned version of sberbank-ai/mGPT (now ai-forever/mGPT) on 80k messages. It was trained for one epoch; an updated version will be published later in a separate model repo.
Model description
- testing whether personality traits learned from the fine-tuning data bleed over into other languages the model was not explicitly fine-tuned on
Usage in Python
Install the transformers library (and PyTorch, which the snippet below imports) if you don't have them:
```bash
pip install -U transformers torch
```
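Load the model into a pipeline object:
```python
from transformers import pipeline
import torch

# use the first GPU if one is available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
my_chatbot = pipeline(
    'text-generation',
    'pszemraj/mGPT-Peter-mwe',
    device=0 if device == 'cuda' else -1,  # pipeline takes a device index; -1 means CPU
)
```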
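Once `my_chatbot` exists, generating a reply is a single call. The helper below is a minimal sketch, not part of the original card: `generate_reply` is a hypothetical name, the card does not document a prompt format, and the sampling values (`top_p=0.95`, `temperature=0.7`, `max_new_tokens=64`) are arbitrary assumptions rather than tuned settings. It accepts any callable with the pipeline's call/return shape, so it can be tried without downloading the model.

```python
# Hypothetical helper (not from the model card). `chatbot` is the pipeline
# created above, or any callable that returns [{"generated_text": ...}].
def generate_reply(chatbot, prompt, max_new_tokens=64):
    outputs = chatbot(
        prompt,
        max_new_tokens=max_new_tokens,  # cap on generated tokens, excluding the prompt
        do_sample=True,                 # sample instead of greedy decoding
        top_p=0.95,                     # assumed nucleus-sampling threshold
        temperature=0.7,                # assumed temperature
        return_full_text=False,         # return only the continuation, not the prompt
    )
    # The text-generation pipeline returns a list with one dict per sequence.
    return outputs[0]["generated_text"].strip()
```

For example, `print(generate_reply(my_chatbot, "How was your weekend?"))`. Since the card's stated goal is to see whether the fine-tuned personality carries over to other languages, a prompt in a language absent from the message data (e.g. `"Wie war dein Wochenende?"`) is a natural second test.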
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
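The values above combine into an effective batch size and a warmup-then-cosine learning-rate curve. The sketch below reproduces that arithmetic in plain Python, mirroring (to the best of our understanding) how the Hugging Face Trainer derives step counts and how `cosine_with_restarts` computes its multiplier. Two inputs are assumptions, not facts from the card: `NUM_EXAMPLES = 80_000` treats each message as one training example, and `NUM_DEVICES = 1` is chosen only so that the product matches the listed `total_train_batch_size` of 32.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 4      # per device
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.05
# Assumptions (not stated in the card):
NUM_DEVICES = 1           # makes 4 * 8 * devices equal the listed total of 32
NUM_EXAMPLES = 80_000     # assumes one training example per message

effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES
batches_per_epoch = math.ceil(NUM_EXAMPLES / (TRAIN_BATCH_SIZE * NUM_DEVICES))
total_steps = batches_per_epoch // GRAD_ACCUM_STEPS  # optimizer steps for num_epochs = 1
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step, num_cycles=1):
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # With num_cycles=1 there is a single cosine decay and no restart.
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))


if __name__ == "__main__":
    print(f"effective batch {effective_batch}, {total_steps} steps, {warmup_steps} warmup")
    for step in (0, warmup_steps, total_steps // 2, total_steps):
        print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the run is 2,500 optimizer steps, with the learning rate rising to 5e-05 over the first 125 and decaying back to zero by the last step.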
Framework versions
- Transformers 4.18.0
- PyTorch 1.11.0+cu113
- Datasets 2.1.0
- Tokenizers 0.12.1
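To compare a local environment against the versions listed above, the standard-library `importlib.metadata` module (Python 3.8+) is enough. This is a minimal sketch: `compare_versions` is a hypothetical helper, the keys are pip distribution names (PyTorch installs as `torch`), and the comparison is an exact string match, so a CPU build reported as `1.11.0` will show as differing from `1.11.0+cu113`. A mismatch is not necessarily a problem; the list only records the training environment.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions"; keys are pip distribution names.
EXPECTED = {
    "transformers": "4.18.0",
    "torch": "1.11.0+cu113",
    "datasets": "2.1.0",
    "tokenizers": "0.12.1",
}


def compare_versions(expected):
    """Return {package: (expected_version, installed_version_or_None)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in compare_versions(EXPECTED).items():
        status = "same" if installed == wanted else "differs"
        print(f"{name}: trained with {wanted}, installed {installed} ({status})")
```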
Base model
- ai-forever/mGPT