LLM Foundry Updates 06-01-2023
#47
opened by abhi-mosaic
This PR adds updates from the LLM Foundry repo as of 06/01/2023.
These include:
- `device_map` support for multiple GPUs (a hedged sketch of the new options follows this list)
- faster inference thanks to a refactor of the KV caching
- bugfix for returning the last `hidden_state`
- support for `output_attentions` when using `attn_impl: torch`
- a `requirements.txt` file to make it easier to know what you need to install for MPT
- updated README instructions for fast GPU initialization
abhi-mosaic changed pull request status to open
Confirming that this seems to play nicely with `load_in_8bit=True` on Google Colab with more system RAM (>13 GB) than the standard tier:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'mosaicml/mpt-7b'

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,        # quantize weights to int8 via bitsandbytes
    device_map="auto",        # let accelerate place layers automatically
    trust_remote_code=True,   # MPT ships custom modeling code
    revision="pr/47",         # use this PR's revision of the repo
)
```
https://colab.research.google.com/drive/1-1n2UvrU47UOcWGlgeIuhi2Vi0u7OW5F?usp=sharing
abhi-mosaic changed pull request status to merged