---
license: bigscience-openrail-m
datasets:
- apcl/jm52m
---
# Jam
Jam is a GPT-2-like model for research in fine-grained analysis of Java source code at the level of methods, statements, and variables. It is intended as a foundation for downstream tasks such as code completion, comment generation, and automated bug repair.
## Jam Training Details
We trained the Jam model using the training procedures from Daniel Grittner's NanoGPT-LoRA.
The dataset used to train our model is our own jm52m dataset, which consists of the processed source code of 52 million Java methods.
We trained the model on the training set for 1 epoch, roughly 300,000 training iterations.
Our GitHub repository contains the code for re-training the model from the raw data.
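For reference, the snippet below is a minimal, hedged sketch of pulling jm52m from the Hugging Face Hub with the `datasets` library. The split and column names (e.g. a `code` field holding each processed method) are assumptions rather than guarantees of this card, so check the dataset card before relying on them.

```python
# Minimal sketch: stream a few Java methods from the jm52m dataset.
# Assumptions: a "train" split exists and each record exposes a text
# column (called "code" here) with the processed method source.
from datasets import load_dataset

# streaming=True avoids downloading all 52 million methods up front
ds = load_dataset("apcl/jm52m", split="train", streaming=True)

for i, record in enumerate(ds):
    # Fall back to printing the raw record if the column name differs.
    print(record.get("code", record))
    if i == 2:
        break
```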
| Hyperparameter | Description | Value |
|---|---|---|
| e | embedding dimensions | 1024 |
| L | number of layers | 24 |
| h | attention heads | 16 |
| c | block size / context length | 256 |
| b | batch size | 4 |
| a | accumulation steps | 32 |
| d | dropout | 0.20 |
| r | learning rate | 3e-5 |
| y | weight decay | 1e-1 |
We trained our models using a single NVIDIA A5000 GPU.
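Since the model follows nanoGPT's GPT-2-style architecture, the hyperparameters above map naturally onto a nanoGPT-like configuration. The sketch below only illustrates that mapping (the `JamConfig` dataclass and its field names are hypothetical, not the repository's actual code) and works out the effective batch size and tokens per optimizer step implied by the table.

```python
# Hypothetical config dataclass mirroring the hyperparameter table above.
# This is an illustrative sketch, not the actual training code from the repo.
from dataclasses import dataclass

@dataclass
class JamConfig:
    n_embd: int = 1024            # e: embedding dimensions
    n_layer: int = 24             # L: number of layers
    n_head: int = 16              # h: attention heads
    block_size: int = 256         # c: block size / context length
    batch_size: int = 4           # b: micro-batch size
    grad_accum_steps: int = 32    # a: gradient accumulation steps
    dropout: float = 0.20         # d: dropout
    learning_rate: float = 3e-5   # r: learning rate
    weight_decay: float = 1e-1    # y: weight decay

cfg = JamConfig()

# Derived quantities implied by the table (not stated explicitly in the card):
effective_batch = cfg.batch_size * cfg.grad_accum_steps   # 128 sequences per optimizer step
tokens_per_step = effective_batch * cfg.block_size        # 32,768 tokens per optimizer step
# If an "iteration" means one optimizer step, ~300,000 iterations at this rate
# covers on the order of 10^10 tokens.
print(effective_batch, tokens_per_step)
```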
## Jam Projects
Current projects using the Jam pre-trained model can be found in our GitHub repository.