Checkpoints
#15 by borgr - opened

There are multiple checkpoints mentioned, all inside the OLMo-7B repo. How could one of them have the LR annealed to 0 while a later one in the same repo does not? And what does that imply about the rest of the checkpoints in the repo?

Ai2 org

Hi @borgr , for the revisions from step 0 to step 556k we follow a linear LR schedule, and then over the last 1000 steps we anneal the LR to 0. We found this to be better for the performance of the final model.
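
In code terms, the schedule looks roughly like the sketch below. The peak LR, end LR, and step counts are illustrative placeholders, not the actual OLMo-7B hyperparameters:

```python
def lr_at_step(step, peak_lr=3e-4, end_lr=3e-5, total_steps=557_000, anneal_steps=1_000):
    """Illustrative LR schedule: linear decay for most of training,
    then a final linear anneal to 0 over the last `anneal_steps` steps.
    All constants are placeholders, not the real OLMo-7B config."""
    anneal_start = total_steps - anneal_steps
    if step < anneal_start:
        # Main phase: linear interpolation from peak_lr toward end_lr.
        frac = step / anneal_start
        return peak_lr + frac * (end_lr - peak_lr)
    # Final phase: anneal from wherever the linear schedule left off down to 0.
    frac = (step - anneal_start) / anneal_steps
    return end_lr * (1 - frac)
```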

I think I didn't phrase the question well.

I find the differences between those checkpoints unclear, specifically the ones that are part of allenai/OLMo-7B. How can the one that is not annealed be the one with more tokens, batches, and steps?
[screenshot of the OLMo-7B revisions table]

Ai2 org

@borgr This might make it clearer:

| Revision | Tokens | LR schedule |
| --- | --- | --- |
| OLMo-7B step452k | 2T | following linear schedule (not annealed) |
| OLMo-7B step556k | 2.460T | still following linear schedule (not annealed) |
| OLMo-7B step557k (main) | 2.464T | LR annealed to 0 |
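
For anyone who wants to inspect one of these intermediate revisions, a minimal loading sketch is below. The `revision` string is a placeholder; the exact branch names are listed in the repo's branch list:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Point `revision` at one of the repo's branches to get an intermediate
# (non-annealed) checkpoint instead of the annealed `main` revision.
# "step452000" is a placeholder; check allenai/OLMo-7B for the exact names.
# trust_remote_code=True is needed because the repo ships custom hf_olmo code.
model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B",
    revision="step452000",  # placeholder revision name
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B", trust_remote_code=True)
```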

Maybe then write something in the Name and Note columns that makes the second and third rows comparable?
