psinger committed
Commit 56a3404
Parent(s): 4cf9723

Update README.md

Files changed (1): README.md (+60 -28)
README.md CHANGED
@@ -8,13 +8,18 @@ tags:
 - large language model
 - h2o-llmstudio
 inference: false
-thumbnail: https://h2o.ai/etc.clientlibs/h2o/clientlibs/clientlib-site/resources/images/favicon.ico
+thumbnail: >-
+  https://h2o.ai/etc.clientlibs/h2o/clientlibs/clientlib-site/resources/images/favicon.ico
+license: apache-2.0
+datasets:
+- OpenAssistant/oasst1
 ---
 # Model Card
 ## Summary
 
 This model was trained using [H2O LLM Studio](https://github.com/h2oai/h2o-llmstudio).
 - Base model: [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b)
+- Dataset preparation: [OpenAssistant/oasst1](https://github.com/h2oai/h2o-llmstudio/blob/1935d84d9caafed3ee686ad2733eb02d2abfce57/app_utils/utils.py#LL1896C5-L1896C28)
 
 
 ## Usage
@@ -22,21 +27,42 @@ This model was trained using [H2O LLM Studio](https://github.com/h2oai/h2o-llmst
 To use the model with the `transformers` library on a machine with GPUs, first make sure you have the `transformers`, `accelerate` and `torch` libraries installed.
 
 ```bash
-pip install transformers==4.29.0
-pip install accelerate==0.20.3
+pip install transformers==4.29.2
+pip install bitsandbytes==0.39.0
+pip install accelerate==0.19.0
 pip install torch==2.0.0
+pip install einops==0.6.1
 ```
 
 ```python
 import torch
-from transformers import pipeline
+from transformers import pipeline, BitsAndBytesConfig, AutoTokenizer
+
+model_kwargs = {}
+
+quantization_config = None
+# optional quantization
+quantization_config = BitsAndBytesConfig(
+    load_in_8bit=True,
+    llm_int8_threshold=6.0,
+)
+model_kwargs["quantization_config"] = quantization_config
+
+tokenizer = AutoTokenizer.from_pretrained(
+    "psinger/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2",
+    use_fast=False,
+    padding_side="left",
+    trust_remote_code=True,
+)
 
 generate_text = pipeline(
     model="psinger/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2",
+    tokenizer=tokenizer,
+    torch_dtype=torch.float16,
     trust_remote_code=True,
     use_fast=False,
     device_map={"": "cuda:0"},
+    model_kwargs=model_kwargs,
 )
 
 res = generate_text(
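The hunk above ends mid-call, so for orientation, here is a minimal usage sketch of the constructed pipeline; the generation parameters are illustrative assumptions, not part of this commit:

```python
# Minimal usage sketch for the pipeline built in the hunk above.
# All generation parameters are assumptions; tune them to your needs.
res = generate_text(
    "Why is drinking water so healthy?",
    max_new_tokens=256,  # assumption: upper bound on generated tokens
    do_sample=False,     # assumption: greedy decoding
)
print(res[0]["generated_text"])
```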
@@ -62,12 +88,19 @@ print(generate_text.preprocess("Why is drinking water so healthy?")["prompt_text
 <|prompt|>Why is drinking water so healthy?<|endoftext|><|answer|>
 ```
 
-Alternatively, you can download [h2oai_pipeline.py](h2oai_pipeline.py), store it alongside your notebook, and construct the pipeline yourself from the loaded model and tokenizer. If the model and the tokenizer are fully supported in the `transformers` package, this will allow you to set `trust_remote_code=False`.
+Alternatively, you can download [h2oai_pipeline.py](h2oai_pipeline.py), store it alongside your notebook, and construct the pipeline yourself from the loaded model and tokenizer:
 
 ```python
 import torch
 from h2oai_pipeline import H2OTextGenerationPipeline
-from transformers import AutoModelForCausalLM, AutoTokenizer
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+quantization_config = None
+# optional quantization
+quantization_config = BitsAndBytesConfig(
+    load_in_8bit=True,
+    llm_int8_threshold=6.0,
+)
 
 tokenizer = AutoTokenizer.from_pretrained(
     "psinger/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2",
@@ -77,10 +110,11 @@ tokenizer = AutoTokenizer.from_pretrained(
 )
 model = AutoModelForCausalLM.from_pretrained(
     "psinger/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2",
-    torch_dtype="auto",
-    device_map={"": "cuda:0"},
     trust_remote_code=True,
-)
+    torch_dtype=torch.float16,
+    device_map={"": "cuda:0"},
+    quantization_config=quantization_config
+).eval()
 generate_text = H2OTextGenerationPipeline(model=model, tokenizer=tokenizer)
 
 res = generate_text(
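Note that the added snippets assign `quantization_config = None` and then immediately overwrite it. If the 8-bit path is meant to be switchable, a flag-guarded sketch such as the following (the `use_8bit` flag is an assumption, not part of this commit) expresses the same idea more clearly:

```python
# Sketch: make the "optional quantization" step truly conditional.
# The `use_8bit` flag is illustrative and not part of the original snippet.
use_8bit = True

quantization_config = (
    BitsAndBytesConfig(
        load_in_8bit=True,
        llm_int8_threshold=6.0,
    )
    if use_8bit
    else None  # fall back to plain float16 weights
)
```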
@@ -100,25 +134,33 @@ print(res[0]["generated_text"])
 You may also construct the pipeline from the loaded model and tokenizer yourself and consider the preprocessing steps:
 
 ```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
 
-model_name = "psinger/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2"  # either local folder or huggingface model name
 # Important: The prompt needs to be in the same format the model was trained with.
 # You can find an example prompt in the experiment logs.
 prompt = "<|prompt|>How are you?<|endoftext|><|answer|>"
 
+quantization_config = None
+# optional quantization
+quantization_config = BitsAndBytesConfig(
+    load_in_8bit=True,
+    llm_int8_threshold=6.0,
+)
+
 tokenizer = AutoTokenizer.from_pretrained(
-    model_name,
+    "psinger/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2",
     use_fast=False,
+    padding_side="left",
     trust_remote_code=True,
 )
 model = AutoModelForCausalLM.from_pretrained(
-    model_name,
-    torch_dtype="auto",
-    device_map={"": "cuda:0"},
+    "psinger/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2",
     trust_remote_code=True,
-)
-model.cuda().eval()
+    torch_dtype=torch.float16,
+    device_map={"": "cuda:0"},
+    quantization_config=quantization_config
+).eval()
+
 inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")
 
 # generate configuration can be modified to your needs
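The hunk stops right before the generation call; a sketch of the step it leads into could look like this (parameter values are assumptions, not taken from this commit):

```python
# Sketch of the generation step following the hunk above; the parameter
# values are assumptions and can be adapted.
tokens = model.generate(
    **inputs,
    max_new_tokens=256,  # assumption: upper bound on new tokens
    do_sample=False,     # assumption: greedy decoding
)[0]

# Decode only the newly generated part, skipping the prompt tokens.
answer = tokenizer.decode(
    tokens[inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```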
@@ -171,16 +213,6 @@ RWForCausalLM(
 
 This model was trained using H2O LLM Studio and with the configuration in [cfg.yaml](cfg.yaml). Visit [H2O LLM Studio](https://github.com/h2oai/h2o-llmstudio) to learn how to train your own large language models.
 
-
-## Model Validation
-
-Model validation results using [EleutherAI lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
-
-```bash
-CUDA_VISIBLE_DEVICES=0 python main.py --model hf-causal-experimental --model_args pretrained=psinger/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2 --tasks openbookqa,arc_easy,winogrande,hellaswag,arc_challenge,piqa,boolq --device cuda &> eval.log
-```
-
-
 ## Disclaimer
 
 Please read this disclaimer carefully before using the large language model provided in this repository. Your use of the model signifies your agreement to the following terms and conditions.
 