Mobius Mega 12B 128K base
Introduction
Mobius is an RWKV v5.2 architecture model: a state-based RNN+CNN+Transformer mixed language model pretrained on a large corpus. Compared with the previously released Mobius, the improvements include:
- Runs locally in fp16 with only 24 GB of VRAM;
- Significantly improved chat-model performance;
- Multilingual support;
- Stable support for a 128K context length;
- A companion chat model: Mobius-12B-128k-chat.
Usage
We do not advise using this base model directly; it is intended to be further trained with SFT and instruction tuning. For general use, please use the chat model instead.
More details
Mobius 12B 128k is based on the RWKV v5.2 architecture, a leading state-based RNN+CNN+Transformer mixed large language model focused on the open-source community:
- 10~100x lower training/inference cost;
- state-based, selective memory, which makes it good at grokking;
- community support.
Requirements
24 GB of VRAM to run in fp16, 12 GB for int8, and 6 GB for nf4 with the Ai00 server.
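The VRAM figures above follow directly from the parameter count and weight precision. The sketch below is a back-of-envelope estimate (weights only, ignoring state and activation overhead); the 12e9 parameter count is an assumption based on the model's name.

```python
# Rough weights-only VRAM estimate for a ~12B-parameter model
# at the three precisions mentioned above.

PARAMS = 12e9  # assumed ~12 billion parameters, from the model name

BYTES_PER_PARAM = {
    "fp16": 2.0,  # 16-bit floats
    "int8": 1.0,  # 8-bit integers
    "nf4": 0.5,   # 4-bit NormalFloat
}

for dtype, nbytes in BYTES_PER_PARAM.items():
    gb = PARAMS * nbytes / 1e9
    print(f"{dtype}: ~{gb:.0f} GB")
# fp16: ~24 GB, int8: ~12 GB, nf4: ~6 GB
```

Real usage will be somewhat higher once the recurrent state and runtime buffers are included.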
Training details
Pretrained on 100B tokens of high-quality data.
Future plan
If you need an HF version, let us know.