Quantizations of https://huggingface.co/allenai/OLMo-1.7-7B-hf

From the original README:

Uses

Inference

Install Transformers from source, or update to the next version when this PR is integrated.

Now, proceed as usual with HuggingFace:

from transformers import AutoModelForCausalLM, AutoTokenizer
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1.7-7B-hf")
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
# Optional: move the inputs and the model to CUDA if a GPU is available
# inputs = {k: v.to('cuda') for k,v in inputs.items()}
# olmo = olmo.to('cuda')
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
>> 'Language modeling is the first step to build natural language generation...'

Alternatively, with the pipeline abstraction:

from transformers import pipeline
olmo_pipe = pipeline("text-generation", model="allenai/OLMo-1.7-7B-hf")
# The pipeline returns a list of dicts; print only the generated text
print(olmo_pipe("Language modeling is ")[0]["generated_text"])
>> 'Language modeling is a branch of natural language processing that aims to...'

Or, you can make this slightly faster by quantizing the model, e.g. AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf", torch_dtype=torch.float16, load_in_8bit=True) (requires bitsandbytes, and torch must be imported). The quantized model is more sensitive to input data types and CUDA device placement, so it is recommended to pass the inputs as inputs.input_ids.to('cuda') to avoid potential issues.
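
To give a rough sense of what the lower-precision options above mean for memory, here is a minimal back-of-the-envelope sketch. It assumes a nominal 7e9 parameters (taken only from the "7B" in the model name, not an exact count) and counts weight storage only; activations, the KV cache, and any quantization overhead are ignored. The helper name weight_memory_gb is hypothetical.

```python
# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

# Assumption: nominal parameter count inferred from the "7B" in the model name.
NOMINAL_PARAMS = 7_000_000_000


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(NOMINAL_PARAMS, dtype):.1f} GB")
```

Under these assumptions, each halving of the bytes per weight halves the memory needed just to hold the weights, which is the main reason 8-bit loading can fit on smaller GPUs.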

Note: you may see the following error if ai2-olmo is not installed correctly. The message refers to hf_olmo rather than ai2-olmo because of how the internal Python check names the missing module. We'll update the code soon to make this error clearer.

    raise ImportError(
    ImportError: This modeling file requires the following packages that were not found in your environment: hf_olmo. Run `pip install hf_olmo`
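
One way to check for this condition before loading the model is to test whether the module named in the error can be found in the current environment. This is a minimal sketch using only the standard library; the helper name is_importable is hypothetical, and the link between the hf_olmo module and the ai2-olmo package is taken from the note above rather than verified here.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# hf_olmo is the module named in the ImportError shown above.
if not is_importable("hf_olmo"):
    print("hf_olmo not found; check that ai2-olmo is installed in this environment.")
```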

Fine-tuning

Model fine-tuning can be done from the final checkpoint (the main revision of this model) or from any of the many intermediate checkpoints. Two recipes for tuning are available.

Fine-tune with the OLMo repository:

torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
    --data.paths=[{path_to_data}/input_ids.npy] \
    --data.label_mask_paths=[{path_to_data}/label_mask.npy] \
    --load_path={path_to_checkpoint} \
    --reset_trainer_state
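
If you would rather launch the command above from Python, the sketch below builds the same argument list. Passing the arguments as a list avoids shell quoting problems, since some shells (zsh, for example) treat the square brackets as glob patterns. The function name build_finetune_command and the example paths are hypothetical placeholders, not part of the OLMo repository.

```python
import shlex


def build_finetune_command(train_config, data_dir, checkpoint, nproc_per_node=8):
    """Return the torchrun argument list matching the command shown above."""
    return [
        "torchrun",
        f"--nproc_per_node={nproc_per_node}",
        "scripts/train.py",
        train_config,
        f"--data.paths=[{data_dir}/input_ids.npy]",
        f"--data.label_mask_paths=[{data_dir}/label_mask.npy]",
        f"--load_path={checkpoint}",
        "--reset_trainer_state",
    ]


# Hypothetical placeholder paths, for illustration only.
cmd = build_finetune_command("configs/my_config.yaml", "data/my_set", "checkpoints/my_ckpt")

# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))

# To actually launch it from the OLMo repository root:
# import subprocess; subprocess.run(cmd, check=True)
```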

For more documentation, see the GitHub readme.