Cedille AI
Cedille is a project to bring large language models to non-English languages.
de-anna
Anna is a 6B parameter autoregressive language model based on the GPT-J architecture and trained using the mesh-transformer-jax codebase.
Anna was trained on German text with a methodology similar to that of Boris, our French model. We started training from GPT-J, which was trained on The Pile; as a consequence, the model still performs well in English. Anna uses the unmodified GPT-2 tokenizer.
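As a minimal illustration of what the unmodified tokenizer implies in practice: the GPT-2 byte-pair vocabulary was built on English text, so German words are typically split into several subword tokens.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
# The GPT-2 BPE vocabulary was built for English, so German words
# often decompose into several subword pieces.
print(tokenizer.tokenize("Wo hast du unsere Sprache gelernt?"))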
How to run
Loading the model
Base (requires 48+ GB of RAM)
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
model = AutoModelForCausalLM.from_pretrained("Cedille/de-anna")
Lower memory usage
Loading a model with the Hugging Face transformers library normally holds two copies of the weights in memory, hence the 48+ GB of RAM for a GPT-J model in float32 precision. A first trick is to pass the argument below so that only one copy of the weights is loaded.
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
model = AutoModelForCausalLM.from_pretrained("Cedille/de-anna", low_cpu_mem_usage=True)
We are planning to add an fp16 branch soon. Combined with the lower-memory loading above, the model could then be loaded with about 12.1 GB of RAM.
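A minimal sketch of what that could look like, assuming the branch is named float16 (the branch does not exist yet, so the revision name is a placeholder):
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
model = AutoModelForCausalLM.from_pretrained(
    "Cedille/de-anna",
    revision="float16",         # placeholder: the fp16 branch is not published yet
    torch_dtype=torch.float16,  # keep the weights in half precision
    low_cpu_mem_usage=True,     # load a single copy of the weights
)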
Generation example
model.eval()
input_sentence = "Wo hast du unsere Sprache gelernt?"
input_ids = tokenizer.encode(input_sentence, return_tensors='pt')

# Sample one continuation of up to 100 tokens with top-k and nucleus (top-p) filtering.
sample_outputs = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    num_return_sequences=1
)
print(tokenizer.decode(sample_outputs[0], skip_special_tokens=True))
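Generation on CPU is slow for a 6B-parameter model. If a GPU with enough memory for the weights is available, the model and the encoded prompt can be moved to it before calling generate; a minimal sketch:
import torch

# Move the model and the encoded prompt to the GPU if one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
input_ids = input_ids.to(device)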
Contact us
For any custom development, please contact us at [email protected].