aiplanet
/

effi-13B-AWQ

4-bit precision

lucifertrj committed on Mar 11

Commit

7c964f3

•

1 Parent(s): 0d13b6d

push model card

Files changed (1)

README.md +59 -0

README.md ADDED

	@@ -0,0 +1,59 @@

+---
+license: mit
+library_name: adapter-transformers
+---
+Effi-13B AWQ is a quantized version of our [Effi-13B](https://huggingface.co/aiplanet/effi-13b) reasoning model.
+## About AWQ
+AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference.
+It is also now supported by vLLM, a continuous-batching server, which allows AWQ models to be used for high-throughput concurrent inference in multi-user server scenarios.
+effi-13B is a 13B-parameter causal decoder-only model built by AI Planet, based on Llama-2-13b-chat-hf and fine-tuned on 1.8 million conversations from the CoT dataset available on Hugging Face Datasets. The model is made available under the Apache 2.0 license.
+## Why use effi-13B-Instruct?
+- This is a ready-to-use chat/instruct model based on Llama-2-13b-chat-hf, which provides a rationale for the context provided.
+- Llama-2 is the best open-source model available. This is an instruct model, which may not be ideal for further fine-tuning. If you are interested in building your own instruct/chat model, we recommend starting from Llama-2-13b-chat-hf.
+You will need at least 85-100 GB of memory for fast inference with effi-13b.
+## Our benchmarking
+| Metric               | Value |
+|----------------------|-------|
+| Perplexity           | 5.529 |
+| MMLU                 | 50.90 |
+| HellaSwag (acc)      | 59.38 |
+| HellaSwag (acc_norm) | 78.91 |
+| TruthfulQA           | 38.24 |
+## Direct Use
+effi-13b has been fine-tuned on a Chain of Thought dataset.
+## Out-of-Scope Use
+Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.
+## Bias, Risks, and Limitations
+This model has been trained mostly on English data and will not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.
+## Recommendations
+We recommend that users of effi-13b develop guardrails and take appropriate precautions for any production use.
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information is needed for further recommendations.
+## Citations
+```
+@misc {lucifertrj,
+    author       = { {Tarun Jain} },
+    title        = { Effi-13B-AWQ by AI Planet},
+    year         = 2024,
+    url          = { https://huggingface.co/aiplanet/effi-13B-AWQ/ },
+    publisher    = { Hugging Face }
+}
+```
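
Note on the "About AWQ" section of the added README: the effect of 4-bit weight quantization on storage can be put in rough numbers. The sketch below is back-of-the-envelope arithmetic only, not a measurement of this model. It assumes "13B" means exactly 13 billion parameters, and it counts the weights alone, ignoring AWQ's per-group scales and zero points, activations, and the KV cache, so real usage will be higher. The helper name `weight_memory_gb` is hypothetical and introduced here for illustration.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: "13B" is taken as exactly 13 billion parameters.
NUM_PARAMS = 13e9

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # unquantized half-precision weights
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
print(f"reduction:      {fp16_gb / int4_gb:.0f}x")
```

Under these assumptions the weights shrink by a factor of four. This is a lower bound on memory, so it should not be read as a replacement for the memory guidance given in the README itself.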