SerialKicked committed on
Commit
8802cae
1 Parent(s): 7f391d2

Update README.md

  1. README.md +6 -2
README.md CHANGED
@@ -18,8 +18,12 @@ Simply put, I'm making my methodology to evaluate RP models public. While none o
  - All models are loaded in Q8_0 (GGUF) with all layers on the GPU (NVidia RTX3060 12GB)
  - Backend is the latest version of KoboldCPP for Windows using CUDA 12.
  - Using **CuBLAS** but **not using QuantMatMul (mmq)**.
- - All models are extended to **16K context length** (auto rope from KCPP)
- - **Flash Attention** and **ContextShift** enabled.
+ - 7-10B Models
+ - - All models are extended to **16K context length** (auto rope from KCPP)
+ - - **Flash Attention** and **ContextShift** enabled.
+ - 11-15B Models:
+ - - All models are extended to **12K context length** (auto rope from KCPP)
+ - - **Flash Attention** and **8Bit cache compression** are enabled.
  - Frontend is staging version of Silly Tavern.
  - Response size set to 1024 tokens max.
  - Fixed Seed for all tests: **123**
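
The per-size settings introduced by this change can be written down as a small lookup, which may help when scripting test runs. This is only a sketch: `BackendProfile`, `SHARED`, and `profile_for` are hypothetical names, "16K" and "12K" are assumed to mean 16 × 1024 and 12 × 1024 tokens, and the handling of sizes between 10B and 11B (or outside 7-15B) is an assumption, since the README does not cover them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackendProfile:
    """Backend settings for one model-size tier (hypothetical structure)."""
    context_tokens: int
    flash_attention: bool
    context_shift: Optional[bool]  # None = not stated in the README for this tier
    kv_cache_8bit: bool


# Settings the README lists for every model, regardless of size.
SHARED = {"quant": "Q8_0", "max_response_tokens": 1024, "seed": 123}


def profile_for(params_b: float) -> BackendProfile:
    """Return the backend profile for a model with `params_b` billion parameters.

    Assumption: models between 10B and 11B fall into the smaller tier.
    """
    if 7 <= params_b < 11:
        # 7-10B tier: 16K context, Flash Attention and ContextShift enabled.
        return BackendProfile(16 * 1024, True, True, False)
    if 11 <= params_b <= 15:
        # 11-15B tier: 12K context, Flash Attention and 8-bit cache compression.
        return BackendProfile(12 * 1024, True, None, True)
    raise ValueError(f"no profile defined for a {params_b}B model")


if __name__ == "__main__":
    for size in (8, 13):
        print(size, profile_for(size))
```

Leaving `context_shift` as `None` for the larger tier reflects only that the README lists Flash Attention and 8-bit cache compression there and does not mention ContextShift.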