ethzanalytics/RedPajama-INCITE-Instruct-7B-v0.1-sharded-bf16

pszemraj committed on May 24, 2023

Commit 2385c54 • 1 Parent(s): 3fb5b15

Update README.md

Files changed (1)

README.md +65 -0

@@ -1,3 +1,68 @@
 ---
 license: apache-2.0
+datasets:
+- togethercomputer/RedPajama-Data-1T
+language:
+- en
+pipeline_tag: text-generation
+tags:
+- sharded
+- bf16
+- instruct
 ---
+# togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
+This is the `togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1` model, with the model file(s) sharded to ~2GB each so that it can be loaded on low-RAM runtimes (like Colab).
+Please refer to the [original model card](https://huggingface.co/togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1) for all details and issues regarding this model. Below is an adapted version of the inference code, provided for reference.
+## basic inference
+See the original model card for more options.
+Install the packages:
+```bash
+pip install -U transformers accelerate
+```
+Run inference (this will use a GPU if available):
+```python
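+import torch
+import transformers
+from packaging import version  # installed as a dependency of transformers
+from transformers import AutoTokenizer, AutoModelForCausalLM
+MIN_TRANSFORMERS_VERSION = "4.25.1"
+# check transformers version (parsed, because as plain strings "4.9.0" >= "4.25.1" is True)
+assert version.parse(transformers.__version__) >= version.parse(
+    MIN_TRANSFORMERS_VERSION
+), f"Please upgrade transformers to version {MIN_TRANSFORMERS_VERSION} or higher."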
+model_name = "ethzanalytics/RedPajama-INCITE-Instruct-7B-v0.1-sharded-bf16"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name, torch_dtype=torch.bfloat16, device_map="auto"
+)
+# infer
+prompt = "Q: The capital of France is?\nA:"
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+input_length = inputs.input_ids.shape[1]
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=128,
+    do_sample=True,
+    temperature=0.7,
+    top_p=0.7,
+    top_k=50,
+    return_dict_in_generate=True,
+)
+token = outputs.sequences[0, input_length:]
+output_str = tokenizer.decode(token)
+print(output_str)
+"""
+Paris
+"""
+```
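
Note on the version check in the snippet above: version strings do not order correctly under plain string comparison, because characters are compared one at a time (so `"9"` sorts after `"2"` even though 9 < 25). Below is a minimal, standard-library-only sketch of a numeric comparison. The helper names `version_tuple` and `meets_minimum` are hypothetical and not part of `transformers`; the sketch also assumes that only the leading numeric components matter, so a suffix such as `.dev0` is ignored.

```python
import re


def version_tuple(v: str) -> tuple:
    """Return the leading numeric components, e.g. '4.25.1' -> (4, 25, 1).

    Assumption: suffixes such as '.dev0' or 'rc1' are ignored, so
    pre-releases compare equal to the release they precede.
    """
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"no numeric version prefix in {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is numerically at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


# Plain string comparison wrongly accepts an older release:
print("4.9.0" >= "4.25.1")               # True (lexicographic, misleading)
# Numeric comparison rejects it:
print(meets_minimum("4.9.0", "4.25.1"))  # False
```

If `packaging` is available (it is installed alongside `transformers`), `packaging.version.parse` is the more complete option, since it also orders pre-release and development versions.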