Cathallama
Awesome model, my new daily driver.
Edit: I am seeing many generated tokens that decode to unknown Unicode code points. These did not show up while I was testing this model, so I have stopped using it and am working on a new version.
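The note does not say exactly what the bad tokens look like. Assuming they surface in decoded text as U+FFFD replacement characters, or as unassigned or private-use code points, a small scan such as the hypothetical `suspicious_chars` helper below could flag affected generations. It uses only Python's standard `unicodedata` module, whose tables depend on the Python version.

```python
import unicodedata


def suspicious_chars(text: str) -> list[tuple[int, str]]:
    """Return (position, "U+XXXX") for characters that often indicate broken decoding.

    Hypothetical helper: assumes bad tokens show up as U+FFFD, unassigned (Cn)
    or private-use (Co) code points in the decoded output.
    """
    hits = []
    for i, ch in enumerate(text):
        # U+FFFD is what decoders typically substitute for invalid byte sequences.
        if ch == "\ufffd" or unicodedata.category(ch) in ("Cn", "Co"):
            hits.append((i, f"U+{ord(ch):04X}"))
    return hits


if __name__ == "__main__":
    print(suspicious_chars("The answer is 3\ufffd\u0378"))
```

Running it over each saved generation gives a quick count of how often the problem occurs; ordinary accented or non-Latin text is not flagged.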
Notable Performance
- 9 percentage-point increase in overall MMLU-PRO success rate over LLaMA 3.1 70B (51% vs. 42%), both at Q4_0
- Highest score in 9 of the 14 MMLU-PRO categories, and tied for highest in 2 more
- Most manual test cases passed (11 of 14) among the four models tested
Creation workflow
Models merged
- meta-llama/Meta-Llama-3.1-70B-Instruct
- turboderp/Cat-Llama-3-70B-instruct
- Nexusflow/Athene-70B
flowchart TD
A[Nexusflow_Athene] -->|Merge with| B[Meta-Llama-3.1]
C[turboderp_Cat] -->|Merge with| D[Meta-Llama-3.1]
B -->| | E[Merge]
D -->| | E[Merge]
E[Merge] -->|Result| F[Cathallama]
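The card does not state the merge method or weights. Purely to illustrate the two-stage topology of the flowchart, the sketch below assumes a plain 50/50 linear average of parameters. `linear_merge`, the toy parameter values, and the weights are all hypothetical, and the actual merge may have used a different method.

```python
# Toy "checkpoints": parameter name -> value. Real ones map names to tensors.
Params = dict[str, float]


def linear_merge(a: Params, b: Params, weight: float = 0.5) -> Params:
    """Hypothetical linear merge: (1 - weight) * a + weight * b, per parameter."""
    if a.keys() != b.keys():
        raise ValueError("both models must have the same parameter names")
    return {name: (1 - weight) * a[name] + weight * b[name] for name in a}


# Made-up values standing in for the three source models.
llama = {"w": 1.0, "b": 0.0}   # Meta-Llama-3.1-70B-Instruct
athene = {"w": 3.0, "b": 2.0}  # Nexusflow/Athene-70B
cat = {"w": -1.0, "b": 4.0}    # turboderp/Cat-Llama-3-70B-instruct

branch_b = linear_merge(athene, llama)         # A merged with Meta-Llama-3.1 -> B
branch_d = linear_merge(cat, llama)            # C merged with Meta-Llama-3.1 -> D
cathallama = linear_merge(branch_b, branch_d)  # B and D merged -> E -> F

if __name__ == "__main__":
    print(cathallama)
```

Under these assumptions the result equals 0.25 * Athene + 0.25 * Cat + 0.5 * LLaMA 3.1, so the shared base model contributes half of every weight.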
Testing
Hyperparameters
- Temperature: 0.0 for automated tests, 0.9 for manual tests
- Penalize repeat sequence: 1.05
- Consider N tokens for penalize: 256
- Penalize repetition of newlines
- Top-K sampling: 40
- Top-P sampling: 0.95
- Min-P sampling: 0.05
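These settings combine several samplers. The sketch below shows one way they interact on a single decoding step. The function name, the dict-of-logits representation, and the order used (repeat penalty, then top-K, top-P, min-P, and finally temperature, roughly llama.cpp's default sequence around this build) are assumptions; check your build's documentation for the actual order.

```python
import math
import random


def sample_next_token(logits, history, *, temperature=0.9, repeat_penalty=1.05,
                      repeat_last_n=256, top_k=40, top_p=0.95, min_p=0.05,
                      rng=None):
    """Choose a token id from `logits` (a dict mapping token id -> raw logit)."""
    if rng is None:
        rng = random.Random()
    logits = dict(logits)

    # 1. Repeat penalty on tokens seen in the last `repeat_last_n` steps.
    #    Newline tokens get no exemption, mirroring "Penalize repetition of newlines".
    for tok in set(history[-repeat_last_n:]):
        if tok in logits:
            x = logits[tok]
            logits[tok] = x / repeat_penalty if x > 0 else x * repeat_penalty

    # Temperature 0.0 (the automated runs) reduces to greedy decoding.
    if temperature == 0.0:
        return max(logits, key=logits.get)

    # 2. Top-K: keep the k highest-scoring candidates, sorted best first.
    cands = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Softmax over the candidates, used only by the probability-based filters.
    best = cands[0][1]
    weights = [math.exp(x - best) for _, x in cands]
    total = sum(weights)
    probs = [w / total for w in weights]

    # 3. Top-P: keep the smallest prefix whose cumulative probability reaches top_p.
    cum, cut = 0.0, len(cands)
    for i, p in enumerate(probs):
        cum += p
        if cum >= top_p:
            cut = i + 1
            break
    cands, probs = cands[:cut], probs[:cut]

    # 4. Min-P: drop candidates below min_p times the top candidate's probability.
    cands = [c for c, p in zip(cands, probs) if p >= min_p * probs[0]]

    # 5. Temperature: rescale the surviving logits and draw one token.
    best = cands[0][1]
    final = [math.exp((x - best) / temperature) for _, x in cands]
    return rng.choices([tok for tok, _ in cands], weights=final, k=1)[0]
```

With temperature 0.0, as in the automated runs, only the repeat penalty affects the outcome, which keeps those benchmark runs deterministic.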
llama.cpp version
- b3527-2-g2d5dd7bb
- Flags: -fa -ngl -1 -ctk f16 --no-mmap
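A rough way to reproduce these settings from the command line is sketched below. It assumes the `llama-cli` binary and the sampling flag names (`--temp`, `--repeat-penalty`, `--repeat-last-n`, `--top-k`, `--top-p`, `--min-p`, `--penalize-nl`) of llama.cpp builds from around this version; the card lists only the runtime flags and does not say which llama.cpp front end was used, so verify against `llama-cli --help` for your build. `build_command` is a hypothetical helper.

```python
import shlex

# Runtime flags copied verbatim from the card.
RUNTIME_FLAGS = ["-fa", "-ngl", "-1", "-ctk", "f16", "--no-mmap"]


def build_command(model_path: str, temperature: float = 0.9) -> list[str]:
    """Assemble an assumed llama-cli invocation using the card's sampling settings."""
    sampling = {
        "--temp": temperature,     # 0.0 for automated runs, 0.9 for manual runs
        "--repeat-penalty": 1.05,  # "Penalize repeat sequence"
        "--repeat-last-n": 256,    # "Consider N tokens for penalize"
        "--top-k": 40,
        "--top-p": 0.95,
        "--min-p": 0.05,
    }
    cmd = ["llama-cli", "-m", model_path, *RUNTIME_FLAGS]
    for flag, value in sampling.items():
        cmd += [flag, str(value)]
    cmd.append("--penalize-nl")    # "Penalize repetition of newlines"
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_command("Cathallama-70B.Q4_0.gguf", temperature=0.0)))
```

Keeping the command as a list avoids shell-quoting issues when it is passed to `subprocess.run`.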
Tested Files
- Cathallama-70B.Q4_0.gguf
- Nexusflow_Athene-70B.Q4_0.gguf
- turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf
- Meta-Llama-3.1-70B-Instruct.Q4_0.gguf
Tests
Manual testing
Category | Test Case | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
---|---|---|---|---|---|
Common Sense | Ball on cup | OK | KO | KO | OK |
Common Sense | Big duck small horse | KO | OK | KO | OK |
Common Sense | Killers | OK | OK | KO | OK |
Common Sense | Strawberry r's | OK | KO | KO | KO |
Common Sense | 9.11 or 9.9 bigger | KO | OK | OK | KO |
Common Sense | Dragon or lens | KO | KO | KO | KO |
Common Sense | Shirts | OK | OK | KO | KO |
Common Sense | Sisters | OK | KO | KO | KO |
Common Sense | Jane faster | OK | OK | OK | OK |
Programming | JSON | OK | OK | OK | OK |
Programming | Python snake game | OK | KO | KO | KO |
Math | Door window combination | OK | OK | KO | KO |
Smoke | Poem | OK | OK | OK | OK |
Smoke | Story | OK | OK | KO | OK |
Note: OK = pass, KO = fail. See sample_generations.txt in the main folder of the repo for the raw generations.
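To tally the manual results, the sketch below counts OK (pass) verdicts per model. It reads rows without a Category cell as belonging to the category above them, which is an assumption about how the table was meant to be laid out; the short model names are abbreviations of the tested GGUF files.

```python
MODELS = ["Cathallama", "Athene", "Cat-Llama-3", "Llama-3.1"]

# Verdicts per test case, in the table's column order (same order as MODELS).
RESULTS = {
    "Ball on cup":             "OK KO KO OK",
    "Big duck small horse":    "KO OK KO OK",
    "Killers":                 "OK OK KO OK",
    "Strawberry r's":          "OK KO KO KO",
    "9.11 or 9.9 bigger":      "KO OK OK KO",
    "Dragon or lens":          "KO KO KO KO",
    "Shirts":                  "OK OK KO KO",
    "Sisters":                 "OK KO KO KO",
    "Jane faster":             "OK OK OK OK",
    "JSON":                    "OK OK OK OK",
    "Python snake game":       "OK KO KO KO",
    "Door window combination": "OK OK KO KO",
    "Poem":                    "OK OK OK OK",
    "Story":                   "OK OK KO OK",
}


def tally(results: dict[str, str]) -> dict[str, int]:
    """Count passed (OK) test cases per model."""
    passes = dict.fromkeys(MODELS, 0)
    for row in results.values():
        for model, verdict in zip(MODELS, row.split()):
            passes[model] += verdict == "OK"
    return passes


if __name__ == "__main__":
    for model, n in tally(RESULTS).items():
        print(f"{model}: {n}/{len(RESULTS)}")
```

Under that reading, Cathallama passes 11 of the 14 cases, Athene 9, LLaMA 3.1 7 and Cat-Llama-3 4.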
MMLU-PRO
Model | Success % |
---|---|
Cathallama-70B.Q4_0.gguf | 51.0% |
turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | 37.0% |
Nexusflow_Athene-70B.Q4_0.gguf | 41.0% |
Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 42.0% |
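The overall gain can be checked directly from this table. The sketch below computes the absolute difference in percentage points, the relative difference, and a rough binomial standard error for a 100-question score (a normal approximation; the card does not report error bars). The function names are hypothetical.

```python
import math

# MMLU-PRO overall success rate (%) from the table; 100 questions per model.
OVERALL = {
    "Cathallama-70B": 51.0,
    "Cat-Llama-3-70B": 37.0,
    "Athene-70B": 41.0,
    "Llama-3.1-70B": 42.0,
}
N_QUESTIONS = 100


def gain(model: str, baseline: str) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in %)."""
    a, b = OVERALL[model], OVERALL[baseline]
    return a - b, 100 * (a - b) / b


def std_error_points(score_pct: float, n: int = N_QUESTIONS) -> float:
    """Normal-approximation standard error of a success rate, in percentage points."""
    p = score_pct / 100
    return 100 * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    abs_gain, rel_gain = gain("Cathallama-70B", "Llama-3.1-70B")
    print(f"{abs_gain:.1f} points, {rel_gain:.1f}% relative")
    print(f"standard error at 51%: {std_error_points(51.0):.1f} points")
```

For Cathallama against LLaMA 3.1 this gives 9.0 points absolute (about 21% relative), with a standard error of roughly 5 points on each 100-question score.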
MMLU-PRO category | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
---|---|---|---|---|
Business | 50.0% | 45.0% | 20.0% | 40.0% |
Law | 40.0% | 30.0% | 30.0% | 35.0% |
Psychology | 85.0% | 80.0% | 70.0% | 75.0% |
Biology | 80.0% | 70.0% | 85.0% | 80.0% |
Chemistry | 55.0% | 40.0% | 35.0% | 35.0% |
History | 65.0% | 60.0% | 55.0% | 65.0% |
Other | 55.0% | 50.0% | 45.0% | 50.0% |
Health | 75.0% | 40.0% | 60.0% | 65.0% |
Economics | 80.0% | 75.0% | 65.0% | 70.0% |
Math | 45.0% | 35.0% | 15.0% | 40.0% |
Physics | 50.0% | 45.0% | 45.0% | 45.0% |
Computer Science | 60.0% | 55.0% | 55.0% | 60.0% |
Philosophy | 55.0% | 60.0% | 45.0% | 50.0% |
Engineering | 35.0% | 40.0% | 25.0% | 35.0% |
Note: The overall MMLU-PRO score was measured on 100 questions; each category score was measured on 20 questions from that category.
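The per-category table can be summarized the same way. With 20 questions per category, each question moves a score by 5 points. The sketch below computes each model's unweighted mean across the 14 categories and lists the categories where it beats, or ties, every other model; `category_summary` and the short model names are hypothetical.

```python
CATEGORIES = [
    "Business", "Law", "Psychology", "Biology", "Chemistry", "History", "Other",
    "Health", "Economics", "Math", "Physics", "Computer Science", "Philosophy",
    "Engineering",
]

# Success rate (%) per category, in the order of CATEGORIES.
SCORES = {
    "Cathallama":  [50, 40, 85, 80, 55, 65, 55, 75, 80, 45, 50, 60, 55, 35],
    "Athene":      [45, 30, 80, 70, 40, 60, 50, 40, 75, 35, 45, 55, 60, 40],
    "Cat-Llama-3": [20, 30, 70, 85, 35, 55, 45, 60, 65, 15, 45, 55, 45, 25],
    "Llama-3.1":   [40, 35, 75, 80, 35, 65, 50, 65, 70, 40, 45, 60, 50, 35],
}

QUESTIONS_PER_CATEGORY = 20
POINTS_PER_QUESTION = 100 / QUESTIONS_PER_CATEGORY  # 5.0 points per question


def category_summary(model: str) -> tuple[float, list[str], list[str]]:
    """Return (mean score, categories won outright, categories tied for best)."""
    mine = SCORES[model]
    others = [s for m, s in SCORES.items() if m != model]
    wins, ties = [], []
    for i, category in enumerate(CATEGORIES):
        best_other = max(s[i] for s in others)
        if mine[i] > best_other:
            wins.append(category)
        elif mine[i] == best_other:
            ties.append(category)
    return sum(mine) / len(mine), wins, ties


if __name__ == "__main__":
    mean, wins, ties = category_summary("Cathallama")
    print(f"mean {mean:.1f}%, {len(wins)} wins, ties: {', '.join(ties)}")
```

For Cathallama this gives a mean of about 59.3%, 9 outright wins, and ties in History and Computer Science.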
PubMedQA
Model | Success % |
---|---|
Cathallama-70B.Q4_0.gguf | 73.00% |
turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | 76.00% |
Nexusflow_Athene-70B.Q4_0.gguf | 67.00% |
Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 72.00% |
Request
If you are hiring in the EU or can sponsor a visa, PM me :D
PS. Thank you mradermacher for the GGUFs!
Model repository: gbueno86/Cathallama-70B