Update README.md
README.md CHANGED
@@ -19,8 +19,8 @@ pipeline_tag: text-generation

**INTELLECT-1** was trained on up to 14 concurrent nodes distributed across 3 continents, with contributions from 30 independent community contributors providing compute.
The training code utilizes the [prime framework](https://github.com/PrimeIntellect-ai/prime), a scalable distributed training framework designed for fault-tolerant, dynamically scaling, high-performance training on unreliable, globally distributed workers.
-The key abstraction that allows dynamic scaling is the `ElasticDeviceMesh` which manages dynamic global process groups for fault-tolerant communication across the internet and local process groups for communication within a node
-The global all-reduce was done with custom int8 all-reduce kernels to reduce the communication payload required, greatly reducing the communication overhead.
19 |
|
20 |
**INTELLECT-1** was trained on up to 14 concurrent nodes distributed across 3 continents, with contributions from 30 independent community contributors providing compute.
|
21 |
The training code utilizes the [prime framework](https://github.com/PrimeIntellect-ai/prime), a scalable distributed training framework designed for fault-tolerant, dynamically scaling, high-perfomance training on unreliable, globally distributed workers.
|
22 |
+
The key abstraction that allows dynamic scaling is the `ElasticDeviceMesh` which manages dynamic global process groups for fault-tolerant communication across the internet and local process groups for communication within a node.
|
23 |
+
The model was trained using the [DiLoCo](https://arxiv.org/abs/2311.08105) algorithms with 100 inner steps. The global all-reduce was done with custom int8 all-reduce kernels to reduce the communication payload required, greatly reducing the communication overhead by a factor 400x.

For more detailed technical insights, please refer to our [technical paper](https://github.com/PrimeIntellect-ai/prime).
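
To make the `ElasticDeviceMesh` line above more concrete, here is a minimal sketch of the two-level process-group layout it describes: one global group spanning every worker over the internet and one local group per node. The `TwoLevelMesh` class, the gloo backend, and the launch details are illustrative assumptions for this sketch, not the actual prime API.

```python
# Illustrative sketch, not the prime API. It shows the two-level layout the README
# describes: one global process group across all workers and one local process group
# per node. Launch with torchrun, which sets RANK, WORLD_SIZE, MASTER_ADDR/PORT and
# LOCAL_WORLD_SIZE, e.g.:
#   torchrun --nnodes 2 --nproc_per_node 2 --rdzv_backend c10d --rdzv_endpoint HOST:PORT mesh_sketch.py
import os

import torch
import torch.distributed as dist


class TwoLevelMesh:
    """Hypothetical helper: a global group for cross-node sync, a local group per node.

    Assumes every node contributes the same number of workers.
    """

    def __init__(self, local_world_size: int):
        # Global group: every worker participates. In prime this is the group that
        # gets rebuilt when peers join or drop, which is what makes the mesh "elastic".
        dist.init_process_group(backend="gloo")
        self.rank = dist.get_rank()
        self.world_size = dist.get_world_size()

        # Local groups: all ranks must create every group in the same order, then
        # each rank keeps the one that contains it (standard torch.distributed idiom).
        self.local_group = None
        for node in range(self.world_size // local_world_size):
            ranks = list(range(node * local_world_size, (node + 1) * local_world_size))
            group = dist.new_group(ranks=ranks)
            if self.rank in ranks:
                self.local_group = group

    def local_all_reduce(self, t: torch.Tensor) -> torch.Tensor:
        # Fast intra-node reduction (NCCL over NVLink/PCIe in a real GPU setup).
        dist.all_reduce(t, group=self.local_group)
        return t

    def global_all_reduce(self, t: torch.Tensor) -> torch.Tensor:
        # Slow cross-internet reduction; this is the payload the int8 kernels compress.
        dist.all_reduce(t)
        return t


if __name__ == "__main__":
    mesh = TwoLevelMesh(local_world_size=int(os.environ.get("LOCAL_WORLD_SIZE", "1")))
    x = torch.ones(4)
    mesh.local_all_reduce(x)
    mesh.global_all_reduce(x)
    dist.destroy_process_group()
```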
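
The DiLoCo line is the other half of the communication savings: each worker runs 100 local optimizer steps and only the resulting pseudo-gradients are exchanged, quantized to int8. Below is a hedged sketch of that outer loop; the helper names and the simple per-tensor quantization scheme are assumptions, and prime's custom fused int8 kernels are more involved than this.

```python
# Sketch of a DiLoCo-style outer step with int8-compressed pseudo-gradients.
# Illustrative only: function names and the quantization scheme are assumptions,
# not prime's custom kernels. Assumes torch.distributed is already initialized
# (e.g. via the TwoLevelMesh sketch above).
import torch
import torch.distributed as dist

INNER_STEPS = 100  # local steps between global syncs, per the README


def int8_all_reduce_mean(delta: torch.Tensor) -> torch.Tensor:
    """Average `delta` across workers while keeping exchanged values in int8 range."""
    # Agree on a single quantization scale: the max |delta| over all workers.
    amax = delta.detach().abs().max()
    dist.all_reduce(amax, op=dist.ReduceOp.MAX)
    scale = amax.clamp_min(1e-12) / 127.0
    # Quantize, sum in int32 to avoid overflow, then dequantize and average.
    # (A real int8 kernel keeps the wire format at 8 bits throughout.)
    q = torch.clamp((delta.detach() / scale).round(), -127, 127).to(torch.int32)
    dist.all_reduce(q)
    return q.to(torch.float32) * scale / dist.get_world_size()


def diloco_round(model, inner_opt, outer_opt, data_iter, loss_fn):
    """One outer DiLoCo round: 100 local steps, then one compressed global sync."""
    # Snapshot the globally synced weights before local training starts.
    synced = [p.detach().clone() for p in model.parameters()]

    # Inner loop: ordinary local training (e.g. AdamW) on this worker's data shard.
    for _ in range(INNER_STEPS):
        x, y = next(data_iter)
        inner_opt.zero_grad()
        loss_fn(model(x), y).backward()
        inner_opt.step()

    # Outer step: pseudo-gradient = synced weights minus locally updated weights,
    # averaged across all workers with the int8 all-reduce above.
    for p, g in zip(model.parameters(), synced):
        p.grad = int8_all_reduce_mean(g - p.detach())
        p.data.copy_(g)  # rewind to the synced weights; the outer optimizer applies the delta
    outer_opt.step()     # e.g. SGD with Nesterov momentum, as in the DiLoCo paper
    outer_opt.zero_grad()
```

Syncing once every 100 inner steps instead of every step, combined with 4x smaller int8 payloads, is roughly where the 400x overhead reduction quoted above comes from.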