Any details on this and the base model?

#1
by jukofyork - opened

Just noticed this has the same architecture as llama-3.1:70b but with 140 layers instead of 80!?
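For a rough sense of scale, here is a back-of-the-envelope parameter count. It is only a sketch: the dimensions are the ones I recall from the public Llama 3.1 70B config (hidden size 8192, MLP size 28672, 64 attention heads, 8 KV heads, vocab 128256, untied embeddings), and applying them unchanged to a 140-layer stack is my assumption, not something Writer has confirmed. `LlamaShape` and `param_count` are made-up helper names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LlamaShape:
    # Assumed values, recalled from the public Llama 3.1 70B config.
    hidden_size: int = 8192
    intermediate_size: int = 28672
    num_attention_heads: int = 64
    num_key_value_heads: int = 8
    vocab_size: int = 128256
    num_hidden_layers: int = 80


def param_count(s: LlamaShape) -> int:
    """Estimate total parameters for a Llama-style decoder (no biases)."""
    head_dim = s.hidden_size // s.num_attention_heads
    kv_dim = head_dim * s.num_key_value_heads
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim (GQA).
    attn = 2 * s.hidden_size * s.hidden_size + 2 * s.hidden_size * kv_dim
    # gate_proj, up_proj and down_proj.
    mlp = 3 * s.hidden_size * s.intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * s.hidden_size
    per_layer = attn + mlp + norms
    # Input embedding plus a separate (untied) output head, and the final norm.
    embeddings = 2 * s.vocab_size * s.hidden_size
    return per_layer * s.num_hidden_layers + embeddings + s.hidden_size


base = LlamaShape()
# Assumption: same widths as the 70B model, only the depth changes.
deep = replace(base, num_hidden_layers=140)

for name, shape in [("80 layers", base), ("140 layers", deep)]:
    print(f"{name}: {param_count(shape) / 1e9:.1f}B parameters")
```

If those assumptions hold, the 80-layer shape comes out at about 70.6B and the 140-layer one at about 121.9B, so the extra 60 layers would add roughly 51B parameters.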

Can you share any details on what benefit the extra 60 layers give? I tried looking through your blog and dev-guide, but can't see any details on the model:

https://writer.com/blog/actions-with-palmyra-x-004/
https://dev.writer.com/home/models

Writer org

Palmyra-Creative is a fine-tuned version of Palmyra-X-004, a retrained model by Writer based on the Llama architecture. We’ve built it with greater depth, using only the T1.2 synthetic token dataset.

Palmyra-Creative is still in limited access; the official release is expected in a few weeks.

jukofyork changed discussion status to closed
