Anthonyg5005 committed on
Commit aa3ba77
1 Parent(s): d94ef48

build exllamav2 at beginning

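This commit adds "!pip install . -q" to the setup cell of all three notebooks, building the exllamav2 package (and its C++/CUDA extension) once during setup instead of letting convert.py trigger the build the first time the quantize cell runs; that is why the "Takes ~13 minutes to start quantizing first time" note is dropped from the quantize cells below. A minimal sketch of the resulting setup cell, assuming the notebook has already cloned the exllamav2 repo and changed into it (those lines sit outside the hunks shown):

#@title Install dependencies (sketch)
!git clone https://github.com/turboderp/exllamav2   # assumed earlier step, not shown in the hunks
%cd exllamav2                                       # assumed: "pip install ." needs the repo root
!pip install -q -r requirements.txt
!pip install -q huggingface_hub requests tqdm
!pip install . -q    # new in this commit: build exllamav2 at the beginning
!wget https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/download-model.py
modeldw = "none"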
ipynb/EXL2_Private_Quant_V1.ipynb CHANGED
@@ -1,33 +1,17 @@
 {
- "nbformat": 4,
- "nbformat_minor": 0,
- "metadata": {
- "colab": {
- "provenance": [],
- "gpuType": "T4"
- },
- "kernelspec": {
- "name": "python3",
- "display_name": "Python 3"
- },
- "language_info": {
- "name": "python"
- },
- "accelerator": "GPU"
- },
 "cells": [
 {
 "cell_type": "markdown",
+ "metadata": {
+ "id": "Ku0ezvyD42ng"
+ },
 "source": [
 "#Quantizing huggingface models to exl2\n",
 "This version of my exl2 quantize colab creates a single quantization to download privately.\\\n",
 "To calculate an estimate for VRAM size use: [NyxKrage/LLM-Model-VRAM-Calculator](https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator)\\\n",
 "Not all models and architectures are compatible with exl2.\\\n",
 "Will upload to a private hf repo in the future."
- ],
- "metadata": {
- "id": "Ku0ezvyD42ng"
- }
+ ]
 },
 {
 "cell_type": "code",
@@ -44,12 +28,19 @@
 "print(\"Installing pip dependencies\")\n",
 "!pip install -q -r requirements.txt\n",
 "!pip install -q huggingface_hub requests tqdm\n",
+ "!pip install . -q\n",
 "!wget https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/download-model.py\n",
 "modeldw = \"none\""
 ]
 },
 {
 "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "cellView": "form",
+ "id": "8Hl3fQmRLybp"
+ },
+ "outputs": [],
 "source": [
 "#@title Login to HF (Required only for gated models)\n",
 "#@markdown From my Colab/Kaggle login script on [Anthonyg5005/hf-scripts](https://huggingface.co/Anthonyg5005/hf-scripts/blob/main/HF%20Login%20Snippet%20Kaggle.py)\n",
@@ -75,16 +66,16 @@
 "else:\n",
 " #if the token is not found then prompt user to provide it:\n",
 " login(input(\"API token not detected. Enter your HuggingFace (WRITE) token: \"))"
- ],
- "metadata": {
- "cellView": "form",
- "id": "8Hl3fQmRLybp"
- },
- "execution_count": null,
- "outputs": []
+ ]
 },
 {
 "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "cellView": "form",
+ "id": "NI1LUMD7H-Zx"
+ },
+ "outputs": [],
 "source": [
 "#@title ##Choose HF model to download\n",
 "#@markdown Weights must be stored in safetensors\n",
@@ -96,19 +87,19 @@
 "modeldw = f\"{User}/{Repo}\"\n",
 "model = f\"{User}_{Repo}\"\n",
 "!python download-model.py {modeldw}"
- ],
- "metadata": {
- "cellView": "form",
- "id": "NI1LUMD7H-Zx"
- },
- "execution_count": null,
- "outputs": []
+ ]
 },
 {
 "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "cellView": "form",
+ "id": "8anbEbGyNmBI"
+ },
+ "outputs": [],
 "source": [
 "#@title Quantize the model\n",
- "#@markdown ###Takes ~13 minutes to start quantizing first time, then quantization will last based on model size\n",
+ "#@markdown ###Quantization time will vary based on model size\n",
 "#@markdown Target bits per weight:\n",
 "BPW = \"4.125\" # @param {type:\"string\"}\n",
 "!mkdir {model}-exl2-{BPW}bpw-WD\n",
@@ -123,16 +114,16 @@
 "else:\n",
 " quant = f\"convert.py -i models/{model} -o {model}-exl2-{BPW}bpw-WD -cf {model}-exl2-{BPW}bpw -b {BPW}\"\n",
 "!python {quant}"
- ],
- "metadata": {
- "id": "8anbEbGyNmBI",
- "cellView": "form"
- },
- "execution_count": null,
- "outputs": []
+ ]
 },
 {
 "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "cellView": "form",
+ "id": "XORLS2uPrbma"
+ },
+ "outputs": [],
 "source": [
 "#@title Zip and download the model\n",
 "!rm -r {model}-exl2-{BPW}bpw-WD\n",
@@ -142,13 +133,23 @@
 "from google.colab import files\n",
 "files.download(f\"{model}-{BPW}bpw.zip\")\n",
 "print(\"Colab download speeds are very slow, so the download will take a while\")"
- ],
- "metadata": {
- "cellView": "form",
- "id": "XORLS2uPrbma"
- },
- "execution_count": null,
- "outputs": []
+ ]
 }
- ]
- }
+ ],
+ "metadata": {
+ "accelerator": "GPU",
+ "colab": {
+ "gpuType": "T4",
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+ }
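For reference, the quantize cell above assembles a convert.py command string and runs it with !python {quant}. A hypothetical rendering (the User/Repo values are illustrative only; the flags are exactly the ones the cell passes):

model = "mistralai_Mistral-7B-v0.1"  # hypothetical f"{User}_{Repo}" value
BPW = "4.125"
quant = f"convert.py -i models/{model} -o {model}-exl2-{BPW}bpw-WD -cf {model}-exl2-{BPW}bpw -b {BPW}"
print(quant)
# convert.py -i models/mistralai_Mistral-7B-v0.1 -o mistralai_Mistral-7B-v0.1-exl2-4.125bpw-WD -cf mistralai_Mistral-7B-v0.1-exl2-4.125bpw -b 4.125

Judging by how the notebook names and later deletes the folders, -i is the unquantized input, -o the working directory for temporary files (the -WD folder removed before zipping), -cf the finished quantized model, and -b the target bits per weight.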
ipynb/EXL2_Private_Quant_V2.ipynb CHANGED
@@ -27,6 +27,7 @@
 "print(\"Installing pip dependencies\")\n",
 "!pip install -q -r requirements.txt\n",
 "!pip install -q huggingface_hub requests tqdm\n",
+ "!pip install . -q\n",
 "#@markdown Uses [download-model.py](https://github.com/oobabooga/text-generation-webui/blob/main/download-model.py) by [oobabooga](https://github.com/oobabooga)\n",
 "!wget https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/download-model.py\n",
 "model = \"none\"\n",
@@ -126,7 +127,7 @@
 "outputs": [],
 "source": [
 "#@title Quantize the model\n",
- "#@markdown ###Takes ~13 minutes to start quantizing first time, then quantization will last based on model size\n",
+ "#@markdown ###Quantization time will vary based on model size\n",
 "#@markdown Target bits per weight:\n",
 "BPW = \"4.125\" # @param {type:\"string\"}\n",
 "!mkdir {model}-exl2-{BPW}bpw-WD\n",
ipynb/EXL2_Private_Quant_V3.ipynb CHANGED
@@ -29,6 +29,7 @@
 "print(\"Installing pip dependencies\")\n",
 "!pip install -q -r requirements.txt\n",
 "!pip install -q huggingface_hub requests tqdm accelerate transformers\n",
+ "!pip install . -q\n",
 "#@markdown Uses [download-model.py](https://github.com/oobabooga/text-generation-webui/blob/main/download-model.py) and [convert-to-safetensors.py](https://github.com/oobabooga/text-generation-webui/blob/main/convert-to-safetensors.py) by [oobabooga](https://github.com/oobabooga)\n",
 "!wget https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/download-model.py\n",
 "!wget https://raw.githubusercontent.com/oobabooga/text-generation-webui/main/convert-to-safetensors.py\n",
@@ -138,7 +139,7 @@
 "outputs": [],
 "source": [
 "#@title Quantize the model\n",
- "#@markdown ###Takes ~13 minutes to start quantizing first time, then quantization will last based on model size\n",
+ "#@markdown ###Quantization time will vary based on model size\n",
 "#@markdown Target bits per weight:\n",
 "BPW = \"4.125\" # @param {type:\"string\"}\n",
 "!mkdir {model}-exl2-{BPW}bpw-WD\n",