---
base_model: facebook/opt-125m
inference: false
model_creator: facebook
model_name: opt-125m
model_type: opt
pipeline_tag: text-generation
quantized_by: iproskurina
tags:
- gptq
- 4-bit
base_model_relation: quantized
license: other
language:
- en
---


<img src="https://cdn-uploads.huggingface.co/production/uploads/629a3dbcd496c6dcdebf41cc/t-6kpqFpEYJPT6zmvnm49.png" width="200" />

# OPT-125M-GPTQ


- Model creator: [Meta AI](https://huggingface.co/facebook)
- Original model: [OPT-125M](https://huggingface.co/facebook/opt-125m)

The model published in this repo was quantized to 4-bit using [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).

## Quantization details

All quantization parameters were taken from the [GPTQ paper](https://arxiv.org/abs/2210.17323).

The GPTQ calibration data consisted of 128 random 2048-token segments from the [C4 dataset](https://huggingface.co/datasets/c4).

The group size used for quantization is 128.
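
For reference, this is roughly how those settings map onto the AutoGPTQ API. A minimal sketch, not the actual quantization script: the single toy sentence stands in for the 128 C4 segments, and the output directory name is illustrative.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)

# 4-bit weights with group size 128, matching the settings above.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)

# Stand-in calibration data: the real run used 128 random 2048-token
# segments from C4, not this single example.
examples = [tokenizer("auto-gptq is an easy-to-use quantization package.")]

model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)
model.quantize(examples)
model.save_quantized("opt-125m-gptq-4bit")
```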

## How to use this GPTQ model from Python code

### Install the necessary packages

Requires: Transformers 4.33.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.

```shell
pip3 install --upgrade transformers optimum
# If using PyTorch 2.1 + CUDA 12.x:
pip3 install --upgrade auto-gptq
# or, if using PyTorch 2.1 + CUDA 11.x:
pip3 install --upgrade auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
```

If you are using PyTorch 2.0, you will need to install AutoGPTQ from source. Likewise, if you have problems with the pre-built wheels, try building from source:

```shell
pip3 uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
git checkout v0.5.1
pip3 install .
```
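
Either way, you can confirm afterwards which AutoGPTQ version ended up installed:

```shell
pip3 show auto-gptq
```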

### You can then use the following code

```python
from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM

pretrained_model_dir = "iproskurina/opt-125m-gptq-4bit"

# Load the tokenizer and the 4-bit GPTQ checkpoint onto the first GPU.
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(pretrained_model_dir, device="cuda:0", model_basename="model")

pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(pipeline("auto-gptq is")[0]["generated_text"])
```
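
Alternatively, since the Transformers and Optimum versions listed above added native GPTQ support, the checkpoint should also load through plain `transformers`. A sketch, assuming the repo ships the `quantization_config` that Transformers looks for and that `accelerate` is installed for `device_map="auto"`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "iproskurina/opt-125m-gptq-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Transformers detects the GPTQ quantization config in the repo and
# routes loading through Optimum/AutoGPTQ.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("auto-gptq is")[0]["generated_text"])
```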

[**LICENSE**](https://huggingface.co/facebook/opt-125m/blob/main/LICENSE.md)