metadata
license: bigscience-openrail-m
datasets:
  - apcl/jm52m

Jam

Jam is a GPT-2-like language model for research on fine-grained analysis of Java source code at the level of methods, statements, and variables. It is intended as a foundation for downstream tasks such as code completion, comment generation, and automated bug repair.


Jam Training Details

  • We trained the Jam model using the training procedures from Daniel Grittner's NanoGPT-LoRA.

  • The model was trained on our own jm52m dataset, which consists of the processed source code of 52 million Java methods.

  • We trained the model on the training set for one epoch, roughly 300,000 training iterations.

  • Our GitHub repo contains the code for re-training from the raw data.

| Hyperparameter | Description | Value |
| --- | --- | --- |
| e | embedding dimensions | 1024 |
| L | number of layers | 24 |
| h | attention heads | 16 |
| c | block size / context length | 256 |
| b | batch size | 4 |
| a | accumulation steps | 32 |
| d | dropout | 0.20 |
| r | learning rate | 3e-5 |
| y | weight decay | 1e-1 |
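
As a rough illustration of how the values above combine, the sketch below collects them in a small config object and derives the per-head dimension and the number of tokens processed per training iteration. The class and field names are hypothetical and are not taken from the Jam repository. The token arithmetic assumes the common NanoGPT convention that one iteration is one optimizer step covering `batch size × accumulation steps` sequences of full block length, which this README does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JamConfig:
    """Hypothetical container for the hyperparameters listed in the table."""
    n_embd: int = 1024            # e: embedding dimensions
    n_layer: int = 24             # L: number of layers
    n_head: int = 16              # h: attention heads
    block_size: int = 256         # c: block size / context length
    batch_size: int = 4           # b: batch size
    grad_accum_steps: int = 32    # a: accumulation steps
    dropout: float = 0.20         # d: dropout
    learning_rate: float = 3e-5   # r: learning rate
    weight_decay: float = 1e-1    # y: weight decay

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the embedding.
        return self.n_embd // self.n_head

    @property
    def sequences_per_iter(self) -> int:
        # Assumption: gradients are accumulated over `a` micro-batches
        # of `b` sequences before each optimizer step.
        return self.batch_size * self.grad_accum_steps

    @property
    def tokens_per_iter(self) -> int:
        # Assumption: every sequence is filled to the full block size.
        return self.sequences_per_iter * self.block_size


cfg = JamConfig()
# Assumption: the "roughly 300,000 training iterations" are optimizer steps.
approx_total_tokens = cfg.tokens_per_iter * 300_000

print(f"head dimension:      {cfg.head_dim}")
print(f"sequences per iter:  {cfg.sequences_per_iter}")
print(f"tokens per iter:     {cfg.tokens_per_iter:,}")
print(f"approx total tokens: {approx_total_tokens:,}")
```

Under these assumptions, the small per-step batch size of 4 is offset by gradient accumulation, so each parameter update still sees 128 sequences.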

We trained our models on a single NVIDIA A5000 GPU.
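
To give a sense of why a model of this shape can be trained on one GPU, the sketch below estimates the number of parameters in the transformer blocks using the standard approximation of 12 × layers × embedding² for GPT-2-style architectures (four square projection matrices in attention plus a feed-forward layer with 4× expansion). This is a back-of-the-envelope figure, not a number reported by the authors: it ignores biases, layer norms, and the token and position embeddings, whose size depends on a vocabulary size this README does not give. The function name is hypothetical.

```python
def transformer_block_params(n_layer: int, n_embd: int) -> int:
    """Approximate parameter count of the transformer blocks only.

    Per layer: 4 * e^2 for the attention projections (Q, K, V, output)
    plus 8 * e^2 for a feed-forward layer with 4x expansion.
    Biases, layer norms, and embeddings are deliberately left out.
    """
    return 12 * n_layer * n_embd ** 2


block_params = transformer_block_params(n_layer=24, n_embd=1024)

# Memory for these weights alone in 32-bit floats; gradients, optimizer
# state, and activations add to this during training.
fp32_weight_gib = block_params * 4 / 2**30

print(f"approx block parameters: {block_params:,}")
print(f"fp32 weights: {fp32_weight_gib:.3f} GiB")
```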


Jam Projects

Current projects using the Jam pre-trained model can be found in our GitHub repository:

https://github.com/apcl-research/jam