rwitz committed on
Commit ada7240
1 Parent(s): 6574683

Update README.md

Files changed (1)
  1. README.md +79 -3
README.md CHANGED
@@ -1,3 +1,79 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - en
+ base_model:
+ - unsloth/Llama-3.2-3B-bnb-4bit
+ pipeline_tag: text-generation
+ tags:
+ - art
+ - not-for-all-audiences
+ ---
+ ![Model Architecture](https://flic.kr/p/9SWAXj)
+
+ ## Table of Contents
+ - [Model Description](#model-description)
+ - [Model Architecture](#model-architecture)
+ - [Training Data](#training-data)
+ - [Training Procedure](#training-procedure)
+ - [Usage](#usage)
+
+ ## Model Description
+
+ **cat0.1** is a conversational AI model with **3 billion parameters**, quantized to **4-bit precision** for efficiency. Designed for dynamic and uncensored dialogue, cat0.1 was developed over the past eight months through an iterative cycle of training and interactive chatting. The model embodies a diverse range of characters, enabling versatile and engaging interactions. **cat0.1** is adapted from [unsloth/Llama-3.2-3B-bnb-4bit](https://huggingface.co/unsloth/Llama-3.2-3B-bnb-4bit), building on that base model's architecture for its conversational capabilities.
+
+ ## Model Architecture
+
+ - **Parameters:** 3 billion
+ - **Precision:** 4-bit
+ - **Training Configuration** (see the sketch below):
+   - **Rank:** 32
+   - **Alpha:** 64
+ - **Hardware:** Trained on an RTX 4090 laptop GPU
+
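+ The rank and alpha above match a LoRA-style adapter configuration, which is the usual way 4-bit unsloth base checkpoints are fine-tuned. The card does not name the training framework, so the snippet below is a minimal sketch assuming the Hugging Face `peft` library; the target modules and dropout are illustrative assumptions, and only `r` and `lora_alpha` come from the card.
+
+ ```python
+ # Hypothetical sketch of a LoRA configuration consistent with the values
+ # above. Only r=32 and lora_alpha=64 are stated in the model card; the
+ # remaining fields are illustrative assumptions.
+ from peft import LoraConfig
+
+ lora_config = LoraConfig(
+     r=32,                   # rank, as listed above
+     lora_alpha=64,          # alpha, as listed above
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
+     lora_dropout=0.05,      # assumed; not specified in the card
+     task_type="CAUSAL_LM",
+ )
+ ```
+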
+ ## Training Data
+
+ The model was trained on a diverse set of conversational data collected over eight months. The data includes interactions with various characters, ensuring a wide range of conversational styles and topics. Training data is continuously updated with new chunks, allowing the model to evolve and adapt over time.
+
+ ## Training Procedure
+
+ cat0.1 employs a **progressive training** approach (sketched in code after this list):
+ 1. **Initial Training:** The model is first trained on a base set of conversational data.
+ 2. **Interactive Training:** The trained model is engaged in chats, generating new data from its interactions.
+ 3. **Data Update Cycle:**
+    - **Data Collection:** New conversational data chunks are gathered from interactions.
+    - **Training Update:** The model is retrained with the new data. Occasionally, older data is removed to focus on recent interactions, while previous model parameters are retained.
+ 4. **Iteration:** This cycle of training and data updating is repeated frequently so the model stays current and responsive.
+
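+ As a rough illustration, the cycle above can be summarized in a few lines of Python. All helper names here are illustrative stubs; the card does not describe a concrete implementation, so treat this as a minimal sketch of the described process, not the actual training code.
+
+ ```python
+ # Illustrative sketch of the progressive training cycle described above.
+ # finetune() and collect_chat_data() are hypothetical stubs.
+
+ def finetune(model, chunks):
+     """Stub: retrain the model on the current list of data chunks."""
+     return model
+
+ def collect_chat_data(model):
+     """Stub: chat with the model and gather a new conversational chunk."""
+     return []
+
+ model = "base-4bit-checkpoint"   # stands in for the base model
+ data_chunks = [[]]               # 1. initial base set of data
+ model = finetune(model, data_chunks)
+
+ MAX_CHUNKS = 8                   # assumed retention window, not from the card
+ for _ in range(4):               # 4. repeat the cycle
+     data_chunks.append(collect_chat_data(model))  # 2./3. gather new data
+     if len(data_chunks) > MAX_CHUNKS:
+         data_chunks.pop(0)       # drop older data; model weights are kept
+     model = finetune(model, data_chunks)          # retrain on updated data
+ ```
+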
+ ## Usage
+
+ cat0.1 is designed for applications requiring dynamic and unrestricted conversational capabilities. Suitable use cases include:
+
+ - **Chatbots:** For platforms needing engaging and versatile conversational agents.
+ - **Creative Writing Assistance:** Helping writers generate dialogue and character interactions.
+ - **Entertainment:** Providing interactive experiences in games and virtual environments.
+
+ ### Example
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ # Load the tokenizer and model
+ tokenizer = AutoTokenizer.from_pretrained("rwitz/cat0.1")
+ model = AutoModelForCausalLM.from_pretrained(
+     "rwitz/cat0.1", torch_dtype=torch.float16, device_map="auto"
+ )
+
+ # Encode the prompt and move it to the model's device
+ inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
+
+ # Generate a response (max_new_tokens bounds only the newly generated tokens)
+ with torch.no_grad():
+     output = model.generate(**inputs, max_new_tokens=50)
+
+ # Decode and print
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
+ ```
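+
+ For longer or more varied replies, the standard `generate` sampling options can be tuned. The values below are common starting points, not recommendations from the model card.
+
+ ```python
+ # Illustrative sampling settings; adjust to taste.
+ output = model.generate(
+     **inputs,
+     max_new_tokens=200,
+     do_sample=True,    # enable sampling instead of greedy decoding
+     temperature=0.8,   # assumed starting point, not from the card
+     top_p=0.95,        # nucleus sampling
+ )
+ ```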