
BEE-spoke-data/Jamba-900M-doc-writer

To test it out, try this notebook.

This model produces long, surprisingly coherent output that extends input text; one example is a generated textbook about underwater city design.


Thanks to the Jamba architecture, it uses little VRAM while generating output: about 2.5 GB of VRAM to generate 12,288 tokens.

Model description

This model is a fine-tuned version of pszemraj/jamba-900M-v0.13-KIx2 on some textbook data.

It achieves the following results on the evaluation set:

  • Loss: 3.0200
  • Accuracy: 0.4544
  • Num input tokens seen: 4,940,890,112

Intended Uses & Limitations

  • Long context generation
  • It requires a fairly long prompt (e.g. an "Introduction" section) to be coaxed into consistently producing long, textbook-like text
  • The model itself is small, so its reasoning, knowledge, etc. are limited, but still impressive for the size (hidden size 1024)
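A minimal generation sketch with 🤗 Transformers, following the usage notes above (assumes `transformers` and `torch` are installed; the prompt text and all sampling parameters are illustrative, not tuned, and `use_mamba_kernels=False` is set only as a fallback for machines without the optional Mamba CUDA kernels):

```python
# Hypothetical usage sketch for BEE-spoke-data/Jamba-900M-doc-writer.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BEE-spoke-data/Jamba-900M-doc-writer"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# use_mamba_kernels=False allows CPU / kernel-free inference (slower).
model = AutoModelForCausalLM.from_pretrained(model_id, use_mamba_kernels=False)

# Per the notes above, a longer "Introduction"-style prompt coaxes the
# model into producing long, textbook-like continuations.
prompt = (
    "Introduction to Underwater City Design\n\n"
    "Designing habitats beneath the ocean surface requires balancing "
    "structural integrity, life support, and logistics. This chapter "
    "surveys the core engineering constraints and the trade-offs they "
    "impose on early-stage city planning.\n\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=256,  # raise toward 12,288 for full-length documents
    do_sample=True,
    temperature=0.8,
    repetition_penalty=1.1,
)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

For longer documents, increase `max_new_tokens`; the card reports roughly 2.5 GB of VRAM at 12,288 generated tokens.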

Format: Safetensors
Model size: 888M params
Tensor types: F32, BF16
Inference API (serverless) has been turned off for this model.
