Bloom CTranslate2 models

This is a collection of some of the BigScience Bloom models exported to the CTranslate2 model format. This makes it possible to load and use these models efficiently on CPU or GPU.

Models

The models have been converted to float16 and can be loaded with any other quantization type (e.g. int8).

Model name	Description
bloom-560m	560M parameter model pretrained on ROOTS
bloom-3b	3B parameter model pretrained on ROOTS
bloomz-7b1	7.1B parameter model finetuned on xP3
bloomz-7b1-mt	7.1B parameter model finetuned on xP3mt
mt0-xxl-mt	13B parameter model finetuned on xP3mt

See directories for the different models available.
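As a rough guide to choosing a quantization type, the weight size can be estimated as parameter count times bytes per parameter. The sketch below uses the nominal parameter counts from the table and assumes 4, 2, and 1 bytes per weight for float32, float16, and int8. It ignores quantization scales, layers that stay unquantized, and runtime buffers, so real memory use will be somewhat higher.

```python
# Assumed bytes per weight for each compute type (weights only).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

# Nominal parameter counts taken from the model table.
PARAM_COUNTS = {"bloom-560m": 560e6, "bloom-3b": 3e9, "bloomz-7b1": 7.1e9}


def weight_memory_gb(num_params: float, compute_type: str) -> float:
    """Approximate size of the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[compute_type] / 1e9


for name, num_params in PARAM_COUNTS.items():
    sizes = ", ".join(
        f"{ct}: {weight_memory_gb(num_params, ct):.1f} GB" for ct in BYTES_PER_PARAM
    )
    print(f"{name} -> {sizes}")
```

Under these assumptions, loading bloomz-7b1 with int8 needs roughly half the weight memory of the stored float16 format.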

Simple code to use them

Install dependencies:

pip install huggingface_hub ctranslate2 transformers torch

Usage:

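import ctranslate2
import huggingface_hub
import transformers

model_name = "bloomz-7b1"
prompt = "Hello, I am Joan and I am from Barcelona and"

repo_id = "jordimas/bloom-ctranslate2"

# Download only the repository files whose path matches the selected model name.
snapshot_folder = huggingface_hub.snapshot_download(repo_id=repo_id, allow_patterns=f"*{model_name}*")
print(f"folder: {snapshot_folder}")

# The model directory is used for both the CTranslate2 weights and the tokenizer.
model = f"{snapshot_folder}/{model_name}"
# The weights are stored as float16; compute_type="int8" quantizes them at load time.
generator = ctranslate2.Generator(model, compute_type="int8")
tokenizer = transformers.AutoTokenizer.from_pretrained(model)

# CTranslate2 expects token strings as input, not token ids.
start_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
results = generator.generate_batch([start_tokens], max_length=90)
# Convert the generated token ids back to text.
result = tokenizer.decode(results[0].sequences_ids[0])
print(f"Result: {result}")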
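One detail worth checking is how broad the `allow_patterns` glob is. The sketch below assumes that `huggingface_hub` matches patterns in the style of Python's `fnmatch`, and uses a hypothetical file listing (the real repository layout may differ). Under those assumptions, `*bloomz-7b1*` also matches files of `bloomz-7b1-mt`, while a pattern anchored on the directory name does not.

```python
from fnmatch import fnmatchcase

# Hypothetical file listing for illustration; not the actual repository contents.
repo_files = [
    "bloomz-7b1/model.bin",
    "bloomz-7b1/tokenizer.json",
    "bloomz-7b1-mt/model.bin",
    "bloom-560m/model.bin",
]


def select(files, pattern):
    """Return the files whose path matches the glob pattern."""
    return [f for f in files if fnmatchcase(f, pattern)]


model_name = "bloomz-7b1"
broad = select(repo_files, f"*{model_name}*")   # pattern used in the usage example
narrow = select(repo_files, f"{model_name}/*")  # anchored on the model directory

print("broad: ", broad)
print("narrow:", narrow)
```

If the assumption holds, passing `allow_patterns=f"{model_name}/*"` to `snapshot_download` would avoid downloading a second multi-gigabyte model by accident.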