metadata
license: apache-2.0
tags:
  - axolotl
  - generated_from_trainer
  - text-generation-inference
base_model: mistralai/Mistral-7B-Instruct-v0.2
model_type: mistral
pipeline_tag: text-generation
model-index:
  - name: Mistral-7B-Retail-v1
    results: []

Mistral-7B-Retail-v1

Model Description

This model, named "Mistral-7B-Retail-v1," is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2, adapted to answer questions about retail services.

Intended Use

  • Recommended applications: The model is designed for retail environments. It can be integrated into customer service chatbots or help systems to provide real-time responses to common retail inquiries.
  • Out-of-scope: This model should not be used for medical, legal, or any serious safety-related purposes.

Usage Example

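from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bitext-llm/Mistral-7B-Retail-v1"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The tokenizer prepends the <s> (BOS) token automatically, so the prompt
# only needs the [INST] ... [/INST] instruction tags.
prompt = "[INST] How can I return a purchased item? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt")

# Pass both input_ids and attention_mask; max_new_tokens limits only the
# generated answer (max_length would also count the prompt tokens).
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))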
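The example above sends a single-turn prompt, and the decoded output still contains the prompt text. Assuming the fine-tune keeps the base model's [INST] ... [/INST] instruction format (the card implies this through its example but does not document multi-turn use), the hypothetical helpers `build_prompt` and `extract_reply` below sketch how to assemble a multi-turn prompt string and strip the echoed prompt from the decoded output. Whitespace details may differ from the official chat template, so `tokenizer.apply_chat_template` is the safer choice when the tokenizer ships one.

```python
from typing import List, Tuple


def build_prompt(history: List[Tuple[str, str]], question: str) -> str:
    """Build a Mistral-style instruction prompt (hypothetical helper).

    Assumes the base model's [INST] ... [/INST] format. The <s> (BOS) token is
    omitted because the tokenizer adds it; earlier answers are closed with </s>.
    """
    parts = []
    for user_turn, assistant_turn in history:
        parts.append(f"[INST] {user_turn} [/INST] {assistant_turn}</s>")
    parts.append(f"[INST] {question} [/INST]")
    return "".join(parts)


def extract_reply(decoded: str) -> str:
    """Return the text after the last [/INST] tag (hypothetical helper)."""
    return decoded.rsplit("[/INST]", 1)[-1].strip()


print(build_prompt([], "How can I return a purchased item?"))
# prints: [INST] How can I return a purchased item? [/INST]
```

In the usage example, `prompt` could be replaced by `build_prompt(history, question)`, and the final `print` could wrap the decoded string in `extract_reply(...)` to show only the model's answer.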

Model Architecture

"Mistral-7B-Retail-v1" uses the MistralForCausalLM architecture with a LlamaTokenizer. The architecture is unchanged from the base model; fine-tuning adapts the weights so that the model responds better to retail-related questions.

Training Data

This model was trained on a dataset designed specifically for retail question-and-answer interactions. The dataset covers 46 distinct intents, such as add_product, availability_in_store, cancel_order, pay, refund_policy, track_order, and use_app, reflecting common retail transactions and customer service interactions. Each intent contains 1,000 examples, giving 46,000 examples in total across a range of retail situations.

This breadth is intended to help the model understand and respond to a wide array of retail-related queries in customer service applications. The dataset follows the same structured approach as other Bitext datasets published on Hugging Face, such as bitext/Bitext-customer-support-llm-chatbot-training-dataset, but is tailored to retail customer support.
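To make the dataset description concrete, here is a minimal sketch of an intent-labelled record and how it might be rendered as a single-turn training example. The record schema (`instruction`, `intent`, `response`) and the `to_training_text` format are assumptions for illustration: the card does not publish the retail dataset's fields, and the tags indicate training was done with axolotl, whose actual prompt formatting is not documented here. The last line checks the 46 × 1,000 arithmetic.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

NUM_INTENTS = 46            # from the Training Data section
EXAMPLES_PER_INTENT = 1000  # from the Training Data section


@dataclass
class RetailExample:
    """One hypothetical training record; the field names are assumptions."""
    instruction: str  # the customer's query
    intent: str       # intent label, e.g. "track_order"
    response: str     # the reference answer


def to_training_text(ex: RetailExample) -> str:
    """Render a record in an assumed single-turn [INST] format.

    BOS (<s>) is left to the tokenizer; </s> marks the end of the answer.
    """
    return f"[INST] {ex.instruction} [/INST] {ex.response}</s>"


def examples_per_intent(examples: List[RetailExample]) -> Dict[str, int]:
    """Count records per intent, e.g. to check that intents are balanced."""
    return dict(Counter(ex.intent for ex in examples))


print(NUM_INTENTS * EXAMPLES_PER_INTENT)  # prints: 46000
```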

Training Procedure

Hyperparameters

  • Optimizer: AdamW
  • Learning Rate: 0.0002
  • Epochs: 1
  • Batch Size: 8
  • Gradient Accumulation Steps: 4
  • Maximum Sequence Length: 1024 tokens
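The batch size and gradient accumulation steps combine into an effective batch size. The sketch below works through that arithmetic using the listed hyperparameters and the 46,000-example figure from the Training Data section. It assumes a single GPU and no sample packing; neither is stated on the card, and with packing or multiple GPUs the real step count would differ. The exact count also depends on how the trainer rounds the final partial batch.

```python
import math

# Values from the Hyperparameters list above.
MICRO_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
EPOCHS = 1
MAX_SEQ_LEN = 1024
NUM_EXAMPLES = 46 * 1000  # 46 intents x 1,000 examples (Training Data section)


def effective_batch_size(micro_batch: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Sequences consumed per optimizer step."""
    return micro_batch * grad_accum * num_gpus


def optimizer_steps(num_examples: int, eff_batch: int, epochs: int = 1) -> int:
    """Approximate optimizer steps, counting a final partial batch as one step."""
    return math.ceil(num_examples / eff_batch) * epochs


# Assumption: a single GPU and no sample packing.
eff = effective_batch_size(MICRO_BATCH_SIZE, GRAD_ACCUM_STEPS)
print(eff)                                         # prints: 32
print(optimizer_steps(NUM_EXAMPLES, eff, EPOCHS))  # prints: 1438
print(eff * MAX_SEQ_LEN)                           # prints: 32768 (max tokens per step)
```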

Environment

  • Transformers Version: 4.40.0.dev0
  • Framework: PyTorch 2.2.1+cu121
  • Tokenizers: Tokenizers 0.15.0
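When reproducing or debugging results, it can help to compare the installed package versions with the training environment listed above. The hypothetical helper `compare_env` below does this with the standard library's `importlib.metadata`; it only reports versions and makes no claim about which combinations are compatible.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed in the Environment section of this card.
TRAINING_ENV = {
    "transformers": "4.40.0.dev0",
    "torch": "2.2.1+cu121",
    "tokenizers": "0.15.0",
}


def compare_env(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (training version, installed version or None)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed: Optional[str] = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed
        report[package] = (wanted, installed)
    return report


for pkg, (wanted, installed) in compare_env(TRAINING_ENV).items():
    status = "match" if wanted == installed else "differs"
    print(f"{pkg}: trained with {wanted}, installed {installed} ({status})")
```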

Limitations and Bias

  • The model is fine-tuned on a domain-specific dataset and may not perform well outside retail customer-service topics.
  • Users should be aware of potential biases in the training data, which the model's responses may inadvertently reflect. Because the dataset covers general retail questions, the model may be less reliable, or show biases, in specialized use cases.

Ethical Considerations

This model should be used responsibly, with attention to the ethical implications of automated customer-service responses. As a foundational model for the retail domain, its output should complement human expertise and adhere to relevant retail regulations.

Acknowledgments

This model was developed by Bitext and trained on infrastructure provided by Bitext.

License

This model, "Mistral-7B-Retail-v1", is licensed under the Apache License 2.0 by Bitext Innovations International, Inc. This open-source license allows for free use, modification, and distribution of the model but requires that proper credit be given to Bitext.

Key Points of the Apache 2.0 License

  • Permissions: Users are allowed to use, modify, and distribute this model freely.
  • Attribution: You must provide proper credit to Bitext Innovations International, Inc. when using this model, in accordance with the original copyright notices and the license.
  • Patent Grant: The license includes a grant of patent rights from the contributors of the model.
  • No Warranty: The model is provided "as is" without warranties of any kind.

You may view the full license text at Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0).

This licensing ensures the model can be used widely and freely while respecting the intellectual contributions of Bitext. For more detailed information or specific legal questions about using this license, please refer to the official license documentation linked above.