psinger committed
Commit 56a3404
Parent(s): 4cf9723

Update README.md

Files changed (1): README.md (+60 -28)
README.md CHANGED
@@ -8,13 +8,18 @@ tags:
 - large language model
 - h2o-llmstudio
 inference: false
-thumbnail: https://h2o.ai/etc.clientlibs/h2o/clientlibs/clientlib-site/resources/images/favicon.ico
+thumbnail: >-
+  https://h2o.ai/etc.clientlibs/h2o/clientlibs/clientlib-site/resources/images/favicon.ico
+license: apache-2.0
+datasets:
+- OpenAssistant/oasst1
 ---
 # Model Card
 ## Summary
 
 This model was trained using [H2O LLM Studio](https://github.com/h2oai/h2o-llmstudio).
 - Base model: [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b)
+- Dataset preparation: [OpenAssistant/oasst1](https://github.com/h2oai/h2o-llmstudio/blob/1935d84d9caafed3ee686ad2733eb02d2abfce57/app_utils/utils.py#LL1896C5-L1896C28)
 
 
 ## Usage
@@ -22,21 +27,42 @@ This model was trained using [H2O LLM Studio](https://github.com/h2oai/h2o-llmst
 To use the model with the `transformers` library on a machine with GPUs, first make sure you have the `transformers`, `accelerate` and `torch` libraries installed.
 
 ```bash
-pip install transformers==4.29.0
-pip install accelerate==0.20.3
+pip install transformers==4.29.2
+pip install bitsandbytes==0.39.0
+pip install accelerate==0.19.0
 pip install torch==2.0.0
+pip install einops==0.6.1
 ```
 
 ```python
 import torch
-from transformers import pipeline
+from transformers import pipeline, BitsAndBytesConfig, AutoTokenizer
+
+model_kwargs = {}
+
+quantization_config = None
+# optional quantization
+quantization_config = BitsAndBytesConfig(
+    load_in_8bit=True,
+    llm_int8_threshold=6.0,
+)
+model_kwargs["quantization_config"] = quantization_config
+
+tokenizer = AutoTokenizer.from_pretrained(
+    "psinger/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2",
+    use_fast=False,
+    padding_side="left",
+    trust_remote_code=True,
+)
 
 generate_text = pipeline(
     model="psinger/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2",
+    tokenizer=tokenizer,
+    torch_dtype=torch.float16,
     trust_remote_code=True,
     use_fast=False,
     device_map={"": "cuda:0"},
+    model_kwargs=model_kwargs,
 )
 
 res = generate_text(
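The hunk above ends mid-call, so for orientation, here is a minimal usage sketch of the constructed pipeline; the generation parameters are illustrative assumptions, not part of this commit:

```python
# Minimal usage sketch for the pipeline built in the hunk above.
# All generation parameters are assumptions; tune them to your needs.
res = generate_text(
    "Why is drinking water so healthy?",
    max_new_tokens=256,  # assumption: upper bound on generated tokens
    do_sample=False,     # assumption: greedy decoding
)
print(res[0]["generated_text"])
```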
@@ -62,12 +88,19 @@ print(generate_text.preprocess("Why is drinking water so healthy?")["prompt_text
 <|prompt|>Why is drinking water so healthy?<|endoftext|><|answer|>
 ```
 
-Alternatively, you can download [h2oai_pipeline.py](h2oai_pipeline.py), store it alongside your notebook, and construct the pipeline yourself from the loaded model and tokenizer. If the model and the tokenizer are fully supported in the `transformers` package, this will allow you to set `trust_remote_code=False`.
+Alternatively, you can download [h2oai_pipeline.py](h2oai_pipeline.py), store it alongside your notebook, and construct the pipeline yourself from the loaded model and tokenizer:
 
 ```python
 import torch
 from h2oai_pipeline import H2OTextGenerationPipeline
-from transformers import AutoModelForCausalLM, AutoTokenizer
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+quantization_config = None
+# optional quantization
+quantization_config = BitsAndBytesConfig(
+    load_in_8bit=True,
+    llm_int8_threshold=6.0,
+)
 
 tokenizer = AutoTokenizer.from_pretrained(
     "psinger/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2",
@@ -77,10 +110,11 @@ tokenizer = AutoTokenizer.from_pretrained(
 )
 model = AutoModelForCausalLM.from_pretrained(
     "psinger/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2",
-    torch_dtype="auto",
-    device_map={"": "cuda:0"},
     trust_remote_code=True,
-)
+    torch_dtype=torch.float16,
+    device_map={"": "cuda:0"},
+    quantization_config=quantization_config
+).eval()
 generate_text = H2OTextGenerationPipeline(model=model, tokenizer=tokenizer)
 
 res = generate_text(
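Note that the added snippets assign `quantization_config = None` and then immediately overwrite it. If the 8-bit path is meant to be switchable, a flag-guarded sketch such as the following (the `use_8bit` flag is an assumption, not part of this commit) expresses the same idea more clearly:

```python
# Sketch: make the "optional quantization" step truly conditional.
# The `use_8bit` flag is illustrative and not part of the original snippet.
use_8bit = True

quantization_config = (
    BitsAndBytesConfig(
        load_in_8bit=True,
        llm_int8_threshold=6.0,
    )
    if use_8bit
    else None  # fall back to plain float16 weights
)
```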
@@ -100,25 +134,33 @@ print(res[0]["generated_text"])
 You may also construct the pipeline from the loaded model and tokenizer yourself and consider the preprocessing steps:
 
 ```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
 
-model_name = "psinger/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2"  # either local folder or huggingface model name
 # Important: The prompt needs to be in the same format the model was trained with.
 # You can find an example prompt in the experiment logs.
 prompt = "<|prompt|>How are you?<|endoftext|><|answer|>"
 
+quantization_config = None
+# optional quantization
+quantization_config = BitsAndBytesConfig(
+    load_in_8bit=True,
+    llm_int8_threshold=6.0,
+)
+
 tokenizer = AutoTokenizer.from_pretrained(
-    model_name,
+    "psinger/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2",
     use_fast=False,
+    padding_side="left",
     trust_remote_code=True,
 )
 model = AutoModelForCausalLM.from_pretrained(
-    model_name,
-    torch_dtype="auto",
-    device_map={"": "cuda:0"},
+    "psinger/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2",
     trust_remote_code=True,
-)
-model.cuda().eval()
+    torch_dtype=torch.float16,
+    device_map={"": "cuda:0"},
+    quantization_config=quantization_config
+).eval()
+
 inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")
 
 # generate configuration can be modified to your needs
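The hunk stops right before the generation call; a sketch of the step it leads into could look like this (parameter values are assumptions, not taken from this commit):

```python
# Sketch of the generation step following the hunk above; the parameter
# values are assumptions and can be adapted.
tokens = model.generate(
    **inputs,
    max_new_tokens=256,  # assumption: upper bound on new tokens
    do_sample=False,     # assumption: greedy decoding
)[0]

# Decode only the newly generated part, skipping the prompt tokens.
answer = tokenizer.decode(
    tokens[inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```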
@@ -171,16 +213,6 @@ RWForCausalLM(
 
 This model was trained using H2O LLM Studio and with the configuration in [cfg.yaml](cfg.yaml). Visit [H2O LLM Studio](https://github.com/h2oai/h2o-llmstudio) to learn how to train your own large language models.
 
-
-## Model Validation
-
-Model validation results using [EleutherAI lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
-
-```bash
-CUDA_VISIBLE_DEVICES=0 python main.py --model hf-causal-experimental --model_args pretrained=psinger/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2 --tasks openbookqa,arc_easy,winogrande,hellaswag,arc_challenge,piqa,boolq --device cuda &> eval.log
-```
-
-
 ## Disclaimer
 
 Please read this disclaimer carefully before using the large language model provided in this repository. Your use of the model signifies your agreement to the following terms and conditions.
 