---
license: mit
datasets:
- jrahn/yolochess_lichess-elite_2211
library_name: transformers
tags:
- chess
widget:
- text: rnbqkbnr/pppppppp/8/8/8/[MASK]/PPPPPPPP/RNBQKBNR w KQkq - 0 1
example_title: 'MLM: Masked = 8'
- text: 6k1/8/8/1pB3[MASK]P/1P3P2/8/8/8 w - - 1 74
example_title: 'MLM: Masked = K'
---

# Model Card for yolochess_mlm_azure-cloud-35
This 66M-parameter model is pre-trained from scratch with masked language modeling (MLM) on chess positions in FEN format.
It is intended for downstream fine-tuning, e.g. text classification for human moves.
## Model Details

### Model Description
- Developed by: Jonathan Rahn
- Model type: DistilBERT
- Language(s) (NLP): Chess FEN
- License: MIT
## Uses

### Direct Use
The model can be used directly for masked language modeling (fill-mask) on chess positions in FEN format.
### Downstream Use

It is intended for downstream fine-tuning, e.g. text classification for human moves; see the sketch below.
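As a sketch of such a setup (hypothetical: `NUM_MOVE_LABELS` and the label scheme are placeholders, not part of this model), a classification head can be attached with `AutoModelForSequenceClassification`:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical fine-tuning setup: classify FEN positions, e.g. into move labels.
# NUM_MOVE_LABELS is a placeholder for the size of your label set.
NUM_MOVE_LABELS = 10

tokenizer = AutoTokenizer.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")
model = AutoModelForSequenceClassification.from_pretrained(
    "jrahn/yolochess_mlm_azure-cloud-35", num_labels=NUM_MOVE_LABELS
)
```

The classification head is newly initialized and must be trained on labeled FEN data before use.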
### Out-of-Scope Use
Anything other than Chess Positions in standard FEN format.
## Bias, Risks, and Limitations
n/a
### Recommendations
n/a
## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")
model = AutoModelForMaskedLM.from_pretrained("jrahn/yolochess_mlm_azure-cloud-35")
```
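If you prefer to work with the model outputs directly, a minimal sketch along these lines should work (assumes the PyTorch backend; the FEN string and the top-5 cutoff are only for illustration):

```python
import torch

fen = "6k1/8/8/1pB3[MASK]P/1P3P2/8/8/8 w - - 1 74"
inputs = tokenizer(fen, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and read off the most likely replacement tokens
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_index].topk(5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```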
```python
from transformers import pipeline

pipe = pipeline("fill-mask", "jrahn/yolochess_mlm_azure-cloud-35")
pipe("6k1/8/8/1pB3[MASK]P/1P3P2/8/8/8 w - - 1 74")
```
## Training Details

### Training Data

[jrahn/yolochess_lichess-elite_2211](https://huggingface.co/datasets/jrahn/yolochess_lichess-elite_2211)
### Training Procedure

Masked language modeling objective with a 15% masked-token ratio.
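In `transformers`, this corresponds to the standard MLM data collator (a sketch of the objective, not the exact training script):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")

# Randomly mask 15% of the input tokens for the MLM objective
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
```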
#### Preprocessing

Tokenize `data["train"]["fen"]` with max-length padding to 200 tokens, using the default `distilbert-base-cased` tokenizer.
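A sketch of that preprocessing step (the `fen` column name comes from the description above; the `tokenize` helper is illustrative):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

data = load_dataset("jrahn/yolochess_lichess-elite_2211")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")

def tokenize(batch):
    # Pad/truncate every FEN string to 200 tokens, as described above
    return tokenizer(batch["fen"], padding="max_length", max_length=200, truncation=True)

tokenized = data["train"].map(tokenize, batched=True)
```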
This setup is inefficient: most of the tokenizer's vocabulary never occurs in FEN strings, wasting embedding parameters.
The model's position-embedding size and the 200-token padding in preprocessing also lead to a lot of padding and wasted parameters, since FENs should be shorter than 90 characters.
Experiments with a reduced tokenization max-length show performance gains.
#### Speeds, Sizes, Times

Training for 172,500 steps at batch size 128 (22M examples, 1 epoch) took ~10 hours on 1x RTX 4090 using 20GB of VRAM, with a final MLM loss of 0.2567.
## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- Hardware Type: 1x RTX 4090
- Hours used: 10
- Cloud Provider: local
- Compute Region: local
- Carbon Emitted: 1.5kg
## Technical Specifications

### Model Architecture and Objective

DistilBERT architecture, trained with a masked language modeling objective.