
L-3.1-Science-Writer-8B

Work in progress

This is a model I made by fine-tuning THUDM/LongWriter-llama3.1-8b on the arxiver dataset for 2 epochs, then merging the result with djuna/L3.1-Purosani-2-8B for general smarts and all-round capability.


Chat format

Use the same format as LongWriter:

<<SYS>>
You are a research assistant.
<</SYS>>

[INST]

[/INST]
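
For reference, here is a minimal sketch of building that prompt by hand and generating with the transformers library. The user message and sampling parameters are illustrative assumptions, not tuned values, and device_map="auto" requires the accelerate package.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "3rd-Degree-Burn/L-3.1-Science-Writer-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"  # device_map needs accelerate
)

# Build the LongWriter-style prompt shown above.
system = "You are a research assistant."
user = "Summarize the key ideas behind low-rank adaptation."  # placeholder query
prompt = f"<<SYS>>\n{system}\n<</SYS>>\n\n[INST]\n{user}\n[/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```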

To make it write a paper (though the results aren't as good as I expected):

<<SYS>>
You are a research assistant.
<</SYS>>

[INST]
Write a paper with the given details provided by the user. Identify gaps or opportunities for original insights based on the provided abstract. Include clear proofs, calculations, or evidence where required. Maintain an academic tone and ensure consistency.
Topic: {}
Abstract (optional): {}
Include these authors' names: {}.
[/INST]
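
A minimal sketch of filling the {} slots in that template from Python; the topic, abstract, and author values below are placeholders.

```python
# Placeholder values for the three template slots.
topic = "Low-rank adaptation of large language models"
abstract = "We study parameter-efficient fine-tuning of transformers."  # optional, may be left empty
authors = "A. Author, B. Author"

prompt = (
    "<<SYS>>\nYou are a research assistant.\n<</SYS>>\n\n"
    "[INST]\n"
    "Write a paper with the given details provided by the user. "
    "Identify gaps or opportunities for original insights based on the provided abstract. "
    "Include clear proofs, calculations, or evidence where required. "
    "Maintain an academic tone and ensure consistency.\n"
    f"Topic: {topic}\n"
    f"Abstract (optional): {abstract}\n"
    f"Include these authors' names: {authors}.\n"
    "[/INST]"
)
```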

Benchmarks

| Type | Model | Average | IFEval | BBH | MATH Lvl 5 | GPQA | MUSR | MMLU-PRO | CO₂ cost (kg) |
|------|-------|---------|--------|-----|------------|------|------|----------|---------------|
| 🔶 | 3rd-Degree-Burn/L-3.1-Science-Writer-8B | 21.08 | 42.63 | 29.2 | 10.27 | 3.24 | 11.69 | 29.44 | 0.71 |

Personal thoughts

I used a pretty low rank (r=32). The final loss after 2 epochs was around 0.9, which is okay but not great. I think the deeper layers of the model haven’t been fully saturated yet, so it’s still a bit of a work in progress.
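
For context, this is roughly what a rank-32 LoRA configuration looks like with the peft library. The alpha, dropout, and target modules here are assumptions for illustration, not the exact settings used for this model.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                   # the "pretty low rank" mentioned above
    lora_alpha=64,          # assumed; often set to 2 * r
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention-only targets
    lora_dropout=0.05,      # assumed
    task_type="CAUSAL_LM",
)
```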

Edit: This model has a repetition problem. I wouldn't recommend using it.

Model details

Format: Safetensors
Model size: 8.03B params
Tensor type: FP16
