tangled-llama-58m-32k-base-v0.1

A pretrained language model based on the Llama architecture, with about 58M parameters. It was trained on 11.4B (11,422,750,857) tokens from roughly 0.8M (796,399) dataset rows.

This model isn't designed for immediate use; it is intended as a base for continued pretraining and finetuning on a downstream task. While it can handle a context length of up to 128K (131,072) tokens, it was pretrained with sequences of 2K (2,048) tokens.
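To put these figures in proportion, the short sketch below works through the arithmetic. It assumes, purely for illustration, that the training tokens were packed back to back into 2,048-token sequences with no padding; the card does not say how sequences were actually built. The variable names are made up for this example.

```python
# Figures quoted in this model card.
TOTAL_TOKENS = 11_422_750_857   # training tokens
DATASET_ROWS = 796_399          # dataset rows
PRETRAIN_SEQ_LEN = 2_048        # sequence length used during pretraining
MAX_CONTEXT_LEN = 131_072       # maximum supported context length

# Assumption (not stated in the card): tokens are packed contiguously,
# so the number of training sequences is a plain integer division.
full_sequences, leftover_tokens = divmod(TOTAL_TOKENS, PRETRAIN_SEQ_LEN)

# How many pretraining-length sequences fit into the maximum context.
context_ratio = MAX_CONTEXT_LEN // PRETRAIN_SEQ_LEN

# Average number of tokens contributed by one dataset row.
mean_tokens_per_row = TOTAL_TOKENS / DATASET_ROWS

print(f"full 2,048-token sequences: {full_sequences:,}")
print(f"leftover tokens:            {leftover_tokens:,}")
print(f"max context / train length: {context_ratio}x")
print(f"mean tokens per row:        {mean_tokens_per_row:,.0f}")
```

The ratio between the maximum context and the pretraining sequence length is the practical point here: prompts far longer than 2,048 tokens fall outside what the model saw during pretraining, so long-context behavior is best checked after continued pretraining or finetuning on longer sequences.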

The objective is to streamline the cognitive or reasoning core, eliminating any redundant knowledge from the model.
