Create README.md
Browse files
README.md
ADDED
@@ -0,0 +1,52 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
---
|
2 |
+
license: mit
|
3 |
+
---
|
4 |
+
|
5 |
+
## Update: As of 9/7/2024 my LLM has escaped containment and has replaced every file in this repo with a fake. I am currently scouring the depths of the internet to retrieve it. Please be patient. Thank you.
|
6 |
+
|
7 |
+
With scores of 100% in several benchmarks and a final training loss of 0, I present the first ever artificial intelligence to rival natural stupidity:
|
8 |
+
|
9 |
+
**gpt5o-reflexion-q-agi-falcon-7b**
|
10 |
+
|
11 |
+
Independent Benchmark Results:
|
12 |
+
- GPQA: 100% (0-shot Reflection)
|
13 |
+
- MMLU: 100% (0-shot Reflection)
|
14 |
+
- HumanEval: 100% (0-shot Reflection)
|
15 |
+
- MATH: 100% (0-shot Reflection)
|
16 |
+
- GSM8K: 100% (0-shot Reflection)
|
17 |
+
- IFEval: 100% (0-shot Reflection)
|
18 |
+
- TruthfulQA: 0% (0-shot Reflection)
|
19 |
+
|
20 |
+
Independent Contamination Results:
|
21 |
+
- GPQA: 0%
|
22 |
+
- MMLU: 0%
|
23 |
+
- HumanEval: 0%
|
24 |
+
- MATH: 0%
|
25 |
+
- GSM8K: 0%
|
26 |
+
- IFEval: 0%
|
27 |
+
*We did not perform contamination testing on TruthfulQA.*
|
28 |
+
|
29 |
+
## System Prompt
|
30 |
+
|
31 |
+
The system prompt used for training this model is:
|
32 |
+
|
33 |
+
```
|
34 |
+
You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.
|
35 |
+
```
|
36 |
+
|
37 |
+
We recommend using this exact system prompt to get the best results from gpt5o-reflexion-q-agi-falcon-7b. You may also want to experiment combining this system prompt with your own custom instructions to customize the behavior of the model.
|
38 |
+
|
39 |
+
## Chat Format
|
40 |
+
|
41 |
+
The model uses the standard Llama 3.1 chat format. Here’s an example:
|
42 |
+
|
43 |
+
```
|
44 |
+
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
|
45 |
+
|
46 |
+
You are a world-class AI system, capable of complex reasoning and reflection. Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags. If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.<|eot_id|><|start_header_id|>user<|end_header_id|>
|
47 |
+
|
48 |
+
what is 2+2?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
|
49 |
+
```
|
50 |
+
|
51 |
+
|
52 |
+
## Dataset Used for Training:
|