Cedille AI
Cedille is a project to bring large language models to non-English languages.
de-anna
Anna is a 6B parameter autoregressive language model based on the GPT-J architecture and trained using the mesh-transformer-jax codebase.
Anna was trained on German text with a methodology similar to that of Boris, our French model. We started training from GPT-J, which was trained on The Pile; as a consequence, the model still performs well in English. Anna uses the unmodified GPT-2 tokenizer.
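As a minimal illustration of what the unmodified tokenizer implies in practice: the GPT-2 byte-pair vocabulary was built on English text, so German words are typically split into several subword tokens.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
# The GPT-2 BPE vocabulary was built for English, so German words
# often decompose into several subword pieces.
print(tokenizer.tokenize("Wo hast du unsere Sprache gelernt?"))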
How to run
Loading the model
Base (requires 48+ GB of RAM)
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
model = AutoModelForCausalLM.from_pretrained("Cedille/de-anna")
Lower memory usage
Loading a model with the Hugging Face transformers library normally holds two copies of the weights in memory, hence the 48+ GB of RAM for a GPT-J model in float32 precision. A first trick is to pass the argument below so that only one copy of the weights is loaded.
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
model = AutoModelForCausalLM.from_pretrained("Cedille/de-anna", low_cpu_mem_usage=True)
We are planning to add an fp16 branch soon. Combined with the lower-memory loading above, the model could then be loaded with about 12.1 GB of RAM.
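A minimal sketch of what that could look like, assuming the branch is named float16 (the branch does not exist yet, so the revision name is a placeholder):
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
model = AutoModelForCausalLM.from_pretrained(
    "Cedille/de-anna",
    revision="float16",         # placeholder: the fp16 branch is not published yet
    torch_dtype=torch.float16,  # keep the weights in half precision
    low_cpu_mem_usage=True,     # load a single copy of the weights
)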
Generation example
model.eval()
input_sentence = "Wo hast du unsere Sprache gelernt?"
input_ids = tokenizer.encode(input_sentence, return_tensors='pt')

# Sample one continuation of up to 100 tokens with top-k and nucleus (top-p) filtering.
sample_outputs = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    num_return_sequences=1
)
print(tokenizer.decode(sample_outputs[0], skip_special_tokens=True))
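Generation on CPU is slow for a 6B-parameter model. If a GPU with enough memory for the weights is available, the model and the encoded prompt can be moved to it before calling generate; a minimal sketch:
import torch

# Move the model and the encoded prompt to the GPU if one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
input_ids = input_ids.to(device)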
Contact us
For any custom development, please contact us at [email protected].