---
license: apache-2.0
inference: false
tags:
- generated_from_trainer
- text-generation-inference
model-index:
- name: Mistral-7B-Telco
  results: []
model_type: mistral
pipeline_tag: text-generation
widget:
- messages:
  - role: user
    content: I would like to deactivate a cell phone, where could I do it?
---

# Mistral-7B-Telco

## Model Description

"Mistral-7B-Telco" is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), tailored to the Telco domain. It is optimized to answer questions and assist users with various Telco-related procedures. It was trained on hybrid synthetic data generated with our NLP/NLG technology and our automated Data Labeling (DAL) tools.

The goal of this model is to show that a generic verticalized model makes customization for a final use case much easier. For an overview of this approach, see [From General-Purpose LLMs to Verticalized Enterprise Models](https://www.bitext.com/blog/general-purpose-models-verticalized-enterprise-genai/).

## Intended Use

- **Recommended applications**: This model is designed as the first step in Bitext's two-step approach to LLM fine-tuning for building chatbots, virtual assistants, and copilots for the Telco domain, providing customers with fast and accurate answers about their needs.
- **Out-of-scope**: This model is not suited for non-Telco questions and should not be used to provide health, legal, or critical safety advice.

## Usage Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the fine-tuned model and its tokenizer
model = AutoModelForCausalLM.from_pretrained("bitext/Mistral-7B-Telco")
tokenizer = AutoTokenizer.from_pretrained("bitext/Mistral-7B-Telco")

messages = [
    {"role": "system", "content": "You are an expert in customer support for Telco."},
    {"role": "user", "content": "I would like to deactivate a cell phone, where could I do it?"},
]

# Build the prompt with the model's chat template
encoded = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encoded.to(device)
model.to(device)

# Generate and decode the response
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```

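Note that `batch_decode` returns the prompt together with the reply. If you want only the assistant's answer, you can slice off the prompt tokens before decoding (a small optional refinement):

```python
# Keep only the newly generated tokens, dropping the echoed prompt
response_ids = generated_ids[:, model_inputs.shape[-1]:]
print(tokenizer.batch_decode(response_ids, skip_special_tokens=True)[0])
```
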
## Model Architecture

This model uses the `MistralForCausalLM` architecture with a `LlamaTokenizer`, retaining the foundational capabilities of the base model while being specifically enhanced for Telco-related interactions.

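As a quick sanity check, you can inspect the classes that `transformers` resolves for this checkpoint (a minimal sketch; the exact names printed, such as the fast tokenizer variant, may vary with your `transformers` version):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("bitext/Mistral-7B-Telco")
tokenizer = AutoTokenizer.from_pretrained("bitext/Mistral-7B-Telco")

print(type(model).__name__)      # MistralForCausalLM
print(type(tokenizer).__name__)  # LlamaTokenizerFast, the fast variant of LlamaTokenizer
print(model.config.model_type)   # mistral
```
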
## Training Data

The model was fine-tuned on the [Bitext Telco Dataset](https://huggingface.co/datasets/bitext/Bitext-telco-llm-chatbot-training-dataset), which covers various Telco-related intents, including set_usage_limits, activate_phone, check_mobile_payments, check_signal_coverage, invoices, and more: 25 intents in total, each represented by approximately 1,000 examples.

This comprehensive training helps the model address a broad spectrum of Telco-related questions effectively. The dataset follows the same structured approach as our dataset published on Hugging Face as [bitext/Bitext-customer-support-llm-chatbot-training-dataset](https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset), but with a focus on Telco.

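A minimal sketch for inspecting the dataset with the `datasets` library; the split and column names (`instruction`, `intent`, etc.) follow the published customer-support dataset and are assumed to carry over to this Telco variant:

```python
from datasets import load_dataset

# Assumed split/column layout, mirroring the customer-support dataset
ds = load_dataset("bitext/Bitext-telco-llm-chatbot-training-dataset", split="train")
print(ds.column_names)
print(sorted(set(ds["intent"])))  # the ~25 Telco intents
```
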
## Training Procedure

### Hyperparameters

- **Optimizer**: AdamW
- **Learning Rate**: 0.0002 with a cosine learning rate scheduler
- **Epochs**: 3
- **Batch Size**: 4
- **Gradient Accumulation Steps**: 4
- **Maximum Sequence Length**: 8192 tokens

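For reference, these values map onto Hugging Face `TrainingArguments` roughly as follows (a hedged sketch; the actual fine-tuning script is not published, and the maximum sequence length is applied at tokenization time rather than here):

```python
from transformers import TrainingArguments

# Illustrative only: output_dir is a placeholder, and the per-device batch
# size is taken from the listed "Batch Size: 4".
args = TrainingArguments(
    output_dir="mistral-7b-telco-finetune",
    optim="adamw_torch",
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
)
```
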
### Environment

- **Transformers**: 4.43.4
- **Framework**: PyTorch 2.3.1+cu121
- **Tokenizers**: 0.19.1

## Limitations and Bias

- The model is trained for Telco-specific contexts but may underperform in unrelated areas.
- Potential biases in the training data could affect the neutrality of the responses; users are encouraged to evaluate responses critically.

## Ethical Considerations

It is important to use this technology thoughtfully, ensuring it does not substitute for human judgment where judgment is needed, especially in sensitive situations.

## Acknowledgments

This model was developed and trained by Bitext using proprietary data and technology.

## License

"Mistral-7B-Telco" is licensed under the Apache License 2.0 by Bitext Innovations International, Inc. This open-source license allows free use, modification, and distribution of the model, but requires that proper credit be given to Bitext.

### Key Points of the Apache 2.0 License

- **Permissive Use**: Users may use, modify, and distribute this model freely.
- **Attribution**: You must credit Bitext Innovations International, Inc. when using this model, in accordance with the original copyright notices and the license.
- **Patent Grant**: The license includes a grant of patent rights from the contributors of the model.
- **No Warranty**: The model is provided "as is", without warranties of any kind.

You may view the full license text at [Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0).

This licensing ensures the model can be used widely and freely while respecting Bitext's intellectual contributions. For more detailed information or specific legal questions about this license, please refer to the official license documentation linked above.