---
license: mit
language:
- en
base_model:
- unsloth/Llama-3.2-3B-bnb-4bit
pipeline_tag: text-generation
tags:
- art
- music
---

[![Odd Eyed Black Cat](https://live.staticflickr.com/2656/5827332576_baa0892dea_k.jpg)](https://flic.kr/p/9SWAXj)

[Odd Eyed Black Cat](https://flic.kr/p/9SWAXj) by [fourbyfourblazer](https://www.flickr.com/photos/chrisyarzab/), on Flickr

## Table of Contents

- [Model Description](#model-description)
- [Model Architecture](#model-architecture)
- [Training Data](#training-data)
- [Training Procedure](#training-procedure)
- [Usage](#usage)
- [Limitations](#limitations)
- [Ethical Considerations](#ethical-considerations)
- [Acknowledgements](#acknowledgements)
- [Citations](#citations)
- [License](#license)

## Model Description

**cat0.1** is a conversational AI model with **3 billion parameters**, optimized for efficiency using **4-bit precision**. Designed for dynamic and uncensored dialogue, cat0.1 has been trained over the past eight months through an iterative cycle of training and interactive chatting. The model embodies a diverse range of characters, enabling versatile and engaging interactions.

**cat0.1** is adapted from [unsloth/Llama-3.2-3B-bnb-4bit](https://huggingface.co/unsloth/Llama-3.2-3B-bnb-4bit), building on that base's Llama 3.2 architecture for its conversational capabilities.

## Model Architecture

- **Parameters:** 3 billion
- **Precision:** 4-bit
- **Training Configuration:**
  - **Rank:** 32
  - **Alpha:** 64
- **Hardware:** Trained on an RTX 4090 laptop GPU

A sketch of an equivalent adapter configuration appears under [Usage](#usage) below.

## Training Data

The model was trained on a diverse set of conversational data collected over eight months, covering interactions with a variety of characters across a wide range of conversational styles and topics. The training data is continuously extended with new chunks, allowing the model to evolve and adapt over time.

## Training Procedure

cat0.1 employs a **progressive training** approach (a toy sketch of the loop appears under [Usage](#usage) below):

1. **Initial Training:** The model is first trained on a base set of conversational data.
2. **Interactive Training:** The trained model is used in chats, generating new data from its interactions.
3. **Data Update Cycle:**
   - **Data Collection:** New conversational data chunks are gathered from interactions.
   - **Training Update:** The model is retrained on the new data. Occasionally, older data is removed to focus on recent interactions, while the previously trained model parameters are retained.
4. **Iteration:** This cycle of training and data updating is repeated frequently so the model stays current and responsive.

## Usage

cat0.1 is designed for applications requiring dynamic and unrestricted conversational capabilities. Suitable use cases include:

- **Chatbots:** For platforms needing engaging and versatile conversational agents.
- **Creative Writing Assistance:** Helping writers generate dialogue and character interactions.
- **Entertainment:** Providing interactive experiences in games and virtual environments.

### Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("rwitz/cat0.1")
model = AutoModelForCausalLM.from_pretrained(
    "rwitz/cat0.1",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Tokenize the prompt (returns input_ids and attention_mask)
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)

# Generate a response
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50)

# Decode and print the generated text
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
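
### Loading in 4-Bit

Because the base checkpoint is a bitsandbytes 4-bit quantization, the model can also be loaded with an explicit quantization config to keep memory use low. A minimal sketch, assuming `transformers`, `accelerate`, and `bitsandbytes` are installed; the NF4 and double-quantization settings below are typical defaults, not values confirmed by this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed quantization settings: NF4 with double quantization and fp16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("rwitz/cat0.1")
model = AutoModelForCausalLM.from_pretrained(
    "rwitz/cat0.1",
    quantization_config=bnb_config,
    device_map="auto",  # place layers automatically (requires accelerate)
)
```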
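
### Adapter Configuration Sketch

The rank and alpha values under [Model Architecture](#model-architecture) read like a LoRA-style adapter setup on the 4-bit base (i.e., QLoRA). Below is a minimal `peft` sketch of an equivalent configuration; the target modules and dropout are assumptions (typical Llama projection layers), since the card does not list them.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the 4-bit base model the card says cat0.1 was adapted from
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-3B-bnb-4bit",
    device_map="auto",
)

# Rank 32 / alpha 64 as stated on the card; target modules are an assumption
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```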
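
### Progressive Training Sketch

The cycle described under [Training Procedure](#training-procedure) amounts to a train/chat/update loop. A toy sketch for illustration only; the helper functions are hypothetical placeholders, not the actual training code.

```python
# Toy sketch of the progressive training cycle; train_on_chunks,
# chat_and_collect, and prune_old_chunks are hypothetical placeholders.

def train_on_chunks(model, chunks):
    return model  # placeholder: fine-tune the adapter on the current chunks

def chat_and_collect(model):
    return []  # placeholder: interactive chats yield a new data chunk

def prune_old_chunks(chunks, keep_last=8):
    return chunks[-keep_last:]  # drop stale chunks; model weights are kept

def progressive_training(model, chunks, rounds=4):
    for _ in range(rounds):
        model = train_on_chunks(model, chunks)   # retrain on current data
        chunks.append(chat_and_collect(model))   # gather new interaction data
        chunks = prune_old_chunks(chunks)        # focus on recent interactions
    return model
```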