---
license: llama3
base_model: BanglaLLM/BanglaLLama-3-8b-BnWiki-Base
datasets:
  - wikimedia/wikipedia
language:
  - bn
  - en
tags:
  - bangla
  - large language model
  - text-generation-inference
  - transformers
library_name: transformers
pipeline_tag: text-generation
quantized_by: Tanvir1337
---

# Tanvir1337/BanglaLLama-3-8b-BnWiki-Base-GGUF

This is a GGUF quantization of [BanglaLLM/BanglaLLama-3-8b-BnWiki-Base](https://huggingface.co/BanglaLLM/BanglaLLama-3-8b-BnWiki-Base), produced with [llama.cpp](https://github.com/ggerganov/llama.cpp/), a C/C++ inference engine for large language models.

## System Prompt Format

To interact with the model, use the following prompt format:
```
{System}
### Prompt:
{User}
### Response:
```
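
For example, a prompt following this template can be assembled like so; this is a minimal Python sketch, and the `system`/`user` strings are placeholders, not part of the model card:

```python
# Fill in the prompt template above; the example strings are illustrative.
system = "You are a helpful assistant."  # {System}
user = "বাংলাদেশের রাজধানী কোথায়?"  # {User} ("Where is the capital of Bangladesh?")

prompt = f"{system}\n### Prompt:\n{user}\n### Response:\n"
```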

## Usage Instructions

If you're new to using GGUF files, refer to [TheBloke's README](https://huggingface.co/TheBloke/CapybaraHermes-2.5-Mistral-7B-GGUF) for detailed instructions.
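
If you'd rather stay in Python, something like the following sketch with the `llama-cpp-python` bindings and `huggingface_hub` should work; the exact `.gguf` filename below is an assumption, so substitute the quant file you actually downloaded:

```python
# pip install llama-cpp-python huggingface_hub
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download one quant from this repo (filename is hypothetical -- pick a real one).
model_path = hf_hub_download(
    repo_id="Tanvir1337/BanglaLLama-3-8b-BnWiki-Base-GGUF",
    filename="BanglaLLama-3-8b-BnWiki-Base.Q5_K_M.gguf",
)

# n_gpu_layers=-1 offloads all layers to the GPU; use 0 for CPU-only inference.
llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1)

prompt = (
    "You are a helpful assistant.\n"
    "### Prompt:\n"
    "বাংলা ভাষা সম্পর্কে কয়েকটি বাক্য লিখুন।\n"  # "Write a few sentences about the Bangla language."
    "### Response:\n"
)
out = llm(prompt, max_tokens=256, stop=["### Prompt:"])
print(out["choices"][0]["text"])
```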

## Quantization Options

The following graph compares the perplexity of various quantization types (lower is better):

![Perplexity by quantization type](https://www.nethype.de/huggingface_embed/quantpplgraph.png)

For more information on quantization, see [Artefact2's notes](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9).

## Choosing the Right Model File

To select the optimal model file, consider the following factors:

1. **Memory constraints**: Determine how much RAM and/or VRAM you have available.
2. **Speed vs. quality**: If you prioritize speed, choose a file that fits entirely within your GPU's VRAM. For maximum quality, choose the largest file that fits within your combined RAM and VRAM (see the sizing sketch after this list).
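
As a rough sanity check on point 1, a GGUF file needs about its own size in memory plus headroom for the KV cache and compute buffers; the ~1.5 GB overhead below is a rule-of-thumb assumption, not an exact figure:

```python
# Does a quant roughly fit in a given memory budget (VRAM, or RAM + VRAM)?
def fits(file_size_gb: float, budget_gb: float, overhead_gb: float = 1.5) -> bool:
    """Rule of thumb: file size + KV-cache/compute overhead must fit the budget."""
    return file_size_gb + overhead_gb <= budget_gb

print(fits(5.7, 8.0))  # ~5.7 GB file (e.g. a Q5_K_M of an 8B model) on 8 GB -> True
print(fits(6.6, 8.0))  # ~6.6 GB file (e.g. a Q6_K) on 8 GB -> False (too tight)
```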

**Quantization formats**:

* **K-quants** (e.g., Q5_K_M): A good starting point, offering a balance between speed and quality.
* **I-quants** (e.g., IQ3_M): Newer and better quality for their size, but best supported on the cuBLAS (NVIDIA) and rocBLAS (AMD) builds, and typically slower on CPU.

**Hardware compatibility**:

* **I-quants**: Not compatible with the Vulkan backend. If you have an AMD card, make sure you're using the rocBLAS (ROCm) build or another inference engine that supports I-quants.

For more information on the features and trade-offs of each quantization format, refer to the [llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix).