Update README.md
README.md CHANGED
@@ -92,7 +92,8 @@ widget:
 
 **Ǎguila-7B** is a transformer-based causal language model for Catalan, Spanish, and English.
 It is based on the [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b) model and has been trained on a 26B-token
-trilingual corpus collected from publicly available corpora and crawlers.
+trilingual corpus collected from publicly available corpora and crawlers. This is a quantized version created with ct2-transformers-converter,
+as in [michaelfeil/ct2fast-falcon-7b](https://huggingface.co/michaelfeil/ct2fast-falcon-7b).
 
 
 ## Intended uses and limitations
@@ -105,29 +106,24 @@ However, it is intended to be fine-tuned for downstream tasks.
 Here is how to use this model:
 
 ```python
-import
-from transformers import
-
-
-
-
-
-
-
-
-
-    torch_dtype=torch.bfloat16,
-    trust_remote_code=True,
-    device_map="auto",
+from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
+from transformers import AutoTokenizer
+
+model_name = "crodri/aguila_quantized"
+# use either TranslatorCT2fromHfHub or GeneratorCT2fromHfHub here, depending on the model
+model = GeneratorCT2fromHfHub(
+    # load in int8 on CUDA
+    model_name_or_path=model_name,
+    device="cuda",
+    compute_type="int8_float16",
+    # tokenizer=AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
 )
-
-
-
-
-    eos_token_id=tokenizer.eos_token_id,
+outputs = model.generate(
+    text=["El millor de Barcelona es "],
+    max_length=512,
+    include_prompt_in_result=False,
 )
-
-print(f"Result: {generation[0]['generated_text']}")
+print(outputs)
 ```
 
 ## Limitations and bias
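
For context, here is a minimal sketch of how a CTranslate2 checkpoint like this one can be produced. This is an assumed workflow, not the exact command used for this repo: it calls the ctranslate2 Python API that backs the ct2-transformers-converter CLI, and the source checkpoint name `projecte-aina/aguila-7b` and the output directory are placeholders.

```python
# Assumed conversion workflow (sketch), mirroring what the
# `ct2-transformers-converter` CLI does under the hood.
import ctranslate2

converter = ctranslate2.converters.TransformersConverter(
    "projecte-aina/aguila-7b",  # assumed full-precision source checkpoint
    trust_remote_code=True,     # Falcon-style models ship custom modeling code
)
converter.convert(
    "aguila-7b-ct2",              # placeholder output directory
    quantization="int8_float16",  # matches compute_type in the usage example
)
```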
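A hypothetical CPU-only variant of the usage example, for machines without CUDA. The `device` and `compute_type` values are standard CTranslate2 options, but this exact configuration is an untested sketch:

```python
# Hypothetical CPU fallback (untested sketch): CTranslate2 runs plain int8
# inference on CPU, so the CUDA-specific settings are dropped.
from hf_hub_ctranslate2 import GeneratorCT2fromHfHub

model = GeneratorCT2fromHfHub(
    model_name_or_path="crodri/aguila_quantized",
    device="cpu",         # no GPU required
    compute_type="int8",  # CPU-friendly quantized compute type
)
outputs = model.generate(
    text=["El millor de Barcelona es "],
    max_length=128,
    include_prompt_in_result=False,
)
print(outputs)
```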