SerialKicked committed
Commit 8615360 · Parent: 1da6a85

Update README.md

README.md CHANGED
```diff
@@ -18,16 +18,17 @@ Simply put, I'm making my methodology to evaluate RP models public. While none o
 - Frontend is staging version of Silly Tavern.
 - Backend is the latest version of KoboldCPP for Windows using CUDA 12.
 - Using **CuBLAS** but **not using QuantMatMul (mmq)**.
+- Fixed Seed for all tests: **123**
 - **7-10B Models:**
 - All models are loaded in Q8_0 (GGUF)
-- All models are extended to **16K context length** (auto rope from KCPP)
 - **Flash Attention** and **ContextShift** enabled.
+- All models are extended to **16K context length** (auto rope from KCPP)
+- Response size set to 1024 tokens max.
 - **11-15B Models:**
 - All models are loaded in Q4_KM or whatever is the highest/closest available (GGUF)
-- All models are extended to **12K context length** (auto rope from KCPP)
 - **Flash Attention** and **8Bit cache compression** are enabled.
+- All models are extended to **12K context length** (auto rope from KCPP)
+- Response size set to 512 tokens max.
-
-
 
 
 # System Prompt and Instruct Format
```
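For reference, the backend settings described in the diff roughly correspond to KoboldCPP launch flags. This is a minimal sketch assuming recent koboldcpp flag names; the model filenames are placeholders, not the models actually tested.

```shell
# 7-10B tier: Q8_0 quant, 16K context, Flash Attention, ContextShift
# (ContextShift is on by default). CuBLAS without MMQ: simply omit the
# "mmq" token after --usecublas.
koboldcpp.exe model-q8_0.gguf --usecublas normal --contextsize 16384 --flashattention

# 11-15B tier: Q4_K_M quant, 12K context, Flash Attention plus 8-bit
# KV cache compression (--quantkv 1 requires --flashattention).
koboldcpp.exe model-q4_k_m.gguf --usecublas normal --contextsize 12288 --flashattention --quantkv 1
```

The fixed seed (123) and the response caps (1024/512 tokens) are per-request sampler settings, configured in the SillyTavern frontend rather than as launch flags.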