LLM Foundry Updates 06-01-2023

#47

This PR adds updates from the LLM Foundry repo as of 06/01/2023.

These include:

  • device_map support for multiple GPUs
  • faster inference thanks to a refactor of the KV caching
  • a bugfix for returning the last hidden_state
  • support for output_attentions when using attn_impl: torch
  • a requirements.txt file listing the packages needed to run MPT
  • updated README instructions for fast GPU initialization (a combined loading sketch follows this list)
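Putting a few of these together, a minimal loading sketch might look like the following. This is not taken verbatim from the PR: the attn_config['attn_impl'] and init_device fields follow the MPT-7B model card, and revision "pr/47" refers to this PR. For multi-GPU sharding you would typically drop init_device and pass device_map="auto" instead, as in the comment further down.

import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_name = 'mosaicml/mpt-7b'

# Fetch the custom MPT config from this PR so the attention options can be set.
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True, revision="pr/47")
config.attn_config['attn_impl'] = 'torch'  # the torch implementation supports output_attentions
config.init_device = 'cuda:0'              # fast initialization directly on the GPU

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    revision="pr/47"
)

# With attn_impl 'torch', per-layer attention weights can be requested, e.g.:
# outputs = model(input_ids, output_attentions=True)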
abhi-mosaic changed pull request status to open

Confirming that this seems to play nicely with load_in_8bit=True on Google Colab when using more system RAM (>13 GB) than the standard tier provides:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'mosaicml/mpt-7b'

# Load MPT-7B in 8-bit from this PR's revision, letting accelerate place the weights.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
    trust_remote_code=True,
    revision="pr/47"
)

https://colab.research.google.com/drive/1-1n2UvrU47UOcWGlgeIuhi2Vi0u7OW5F?usp=sharing
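For completeness, a rough generation sketch continuing the snippet above (not part of the original comment; the prompt and sampling settings are illustrative, and load_in_8bit additionally requires the accelerate and bitsandbytes packages to be installed):

from transformers import AutoTokenizer

# Assumes tokenizer files are available in the model repo; the MPT-7B model card
# also shows loading the EleutherAI/gpt-neox-20b tokenizer as an alternative.
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer("Here is a recipe for vegan banana bread:\n", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))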

abhi-mosaic changed pull request status to merged
