---
license: bigscience-bloom-rail-1.0
---

# Bloom CTranslate2 models

This is a collection of some of the [Bigscience Bloom](https://huggingface.co/bigscience/bloom) models exported to the
[CTranslate2](https://github.com/OpenNMT/CTranslate2) model format. This allows these models to be loaded and used
efficiently on CPU or GPU.

## Models

The models have been converted to float16 and can be loaded with any other quantization method (e.g. int8).
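
As a rough sketch of how a quantization type might be chosen at load time: CTranslate2 exposes `ctranslate2.get_supported_compute_types(device)`, and the helper below picks the first supported entry from a preference list. The names `pick_compute_type` and `compute_type_for`, as well as the preference order, are assumptions made for illustration and are not part of this repository.

```python
def pick_compute_type(supported, preferred=("int8_float16", "int8", "float16", "float32")):
    """Return the first entry of `preferred` found in `supported`, else "default"."""
    for compute_type in preferred:
        if compute_type in supported:
            return compute_type
    return "default"  # lets CTranslate2 choose a type itself


def compute_type_for(device="cpu"):
    """Ask CTranslate2 which types `device` supports and pick one of them."""
    import ctranslate2  # imported lazily so pick_compute_type works without it

    return pick_compute_type(ctranslate2.get_supported_compute_types(device))
```

The returned string can be passed as the `compute_type` argument when constructing `ctranslate2.Generator`.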


| Model name | Description |
| --- | --- |
| [bloom-560m](https://huggingface.co/bigscience/bloom-560m) | 560M parameter model pretrained on ROOTS |
| [bloom-3b](https://huggingface.co/bigscience/bloom-3b) | 3B parameter model pretrained on ROOTS |
| [bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1) | 7.1B parameter model finetuned on xP3 |
| [bloomz-7b1-mt](https://huggingface.co/bigscience/bloomz-7b1-mt) | 7.1B parameter model finetuned on xP3mt |
| [mt0-xxl-mt](https://huggingface.co/bigscience/mt0-xxl-mt) | 13B parameter model finetuned on xP3mt |

See [directories](./) for the different models available.
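
The available model directories can also be listed programmatically. The following is a minimal sketch, assuming each model lives in its own top-level directory of the repository; `top_level_dirs` and `list_model_dirs` are hypothetical helper names, while `list_repo_files` is the `huggingface_hub` function that returns the file paths of a repository.

```python
def top_level_dirs(paths):
    """Return the sorted top-level directory names found in a list of repo file paths."""
    return sorted({path.split("/", 1)[0] for path in paths if "/" in path})


def list_model_dirs(repo_id="jordimas/bloom-ctranslate2"):
    """Fetch the repository file list from the Hub and reduce it to directory names."""
    from huggingface_hub import list_repo_files  # requires network access

    return top_level_dirs(list_repo_files(repo_id))
```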

## Simple code to use them

Install dependencies:

```shell
pip install huggingface_hub ctranslate2 transformers torch
```

Usage:

```python
import ctranslate2
import huggingface_hub
import transformers

model_name = "bloomz-7b1"
prompt = "Hello, I am Joan and I am from Barcelona and"

repo_id = "jordimas/bloom-ctranslate2"
output_dir = "output/"

# Download only the files inside the selected model's directory.
kwargs = {
    "local_dir": output_dir,
    "local_dir_use_symlinks": False,
}
huggingface_hub.snapshot_download(repo_id=repo_id, allow_patterns=f"{model_name}/*", **kwargs)

model = f"{output_dir}{model_name}"
print(f"model: {model}")
# The weights are stored as float16; compute_type="int8" quantizes them on load.
generator = ctranslate2.Generator(model, compute_type="int8")
tokenizer = transformers.AutoTokenizer.from_pretrained(model)

# CTranslate2 expects token strings rather than token ids.
start_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
results = generator.generate_batch([start_tokens], max_length=90)
result = tokenizer.decode(results[0].sequences_ids[0])
print(f"Result: {result}")
```
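
To run several prompts at once, the tokenize, generate, and decode steps above can be wrapped in a small function. This is only a sketch: `generate_texts` is a hypothetical helper, and it assumes `generator` and `tokenizer` were created as in the usage example.

```python
def generate_texts(generator, tokenizer, prompts, max_length=90):
    """Tokenize each prompt, generate in one batch, and decode the first hypothesis of each."""
    # CTranslate2 takes one list of token strings per prompt.
    batch = [tokenizer.convert_ids_to_tokens(tokenizer.encode(p)) for p in prompts]
    results = generator.generate_batch(batch, max_length=max_length)
    # Each result holds a list of hypotheses; keep the first one.
    return [tokenizer.decode(r.sequences_ids[0]) for r in results]
```

For example, `generate_texts(generator, tokenizer, [prompt, "The capital of Catalonia is"])` would return one generated string per prompt.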