Update README.md
README.md CHANGED
@@ -19,8 +19,8 @@ pipeline_tag: text-generation

**INTELLECT-1** was trained on up to 14 concurrent nodes distributed across 3 continents, with contributions from 30 independent community contributors providing compute.
The training code utilizes the [prime framework](https://github.com/PrimeIntellect-ai/prime), a scalable distributed training framework designed for fault-tolerant, dynamically scaling, high-performance training on unreliable, globally distributed workers.
-The key abstraction that allows dynamic scaling is the `ElasticDeviceMesh` which manages dynamic global process groups for fault-tolerant communication across the internet and local process groups for communication within a node
-The global all-reduce was done with custom int8 all-reduce kernels to reduce the communication payload required, greatly reducing the communication overhead.
19 |
|
20 |
**INTELLECT-1** was trained on up to 14 concurrent nodes distributed across 3 continents, with contributions from 30 independent community contributors providing compute.
|
21 |
The training code utilizes the [prime framework](https://github.com/PrimeIntellect-ai/prime), a scalable distributed training framework designed for fault-tolerant, dynamically scaling, high-perfomance training on unreliable, globally distributed workers.
|
22 |
+
The key abstraction that allows dynamic scaling is the `ElasticDeviceMesh` which manages dynamic global process groups for fault-tolerant communication across the internet and local process groups for communication within a node.
|
23 |
+
The model was trained using the [DiLoCo](https://arxiv.org/abs/2311.08105) algorithms with 100 inner steps. The global all-reduce was done with custom int8 all-reduce kernels to reduce the communication payload required, greatly reducing the communication overhead by a factor 400x.

For more detailed technical insights, please refer to our [technical paper](https://github.com/PrimeIntellect-ai/prime).
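
To make the `ElasticDeviceMesh` line above more concrete, here is a minimal sketch of the two-level process-group layout it describes: one global group spanning every worker over the internet and one local group per node. The `TwoLevelMesh` class, the gloo backend, and the launch details are illustrative assumptions for this sketch, not the actual prime API.

```python
# Illustrative sketch, not the prime API. It shows the two-level layout the README
# describes: one global process group across all workers and one local process group
# per node. Launch with torchrun, which sets RANK, WORLD_SIZE, MASTER_ADDR/PORT and
# LOCAL_WORLD_SIZE, e.g.:
#   torchrun --nnodes 2 --nproc_per_node 2 --rdzv_backend c10d --rdzv_endpoint HOST:PORT mesh_sketch.py
import os

import torch
import torch.distributed as dist


class TwoLevelMesh:
    """Hypothetical helper: a global group for cross-node sync, a local group per node.

    Assumes every node contributes the same number of workers.
    """

    def __init__(self, local_world_size: int):
        # Global group: every worker participates. In prime this is the group that
        # gets rebuilt when peers join or drop, which is what makes the mesh "elastic".
        dist.init_process_group(backend="gloo")
        self.rank = dist.get_rank()
        self.world_size = dist.get_world_size()

        # Local groups: all ranks must create every group in the same order, then
        # each rank keeps the one that contains it (standard torch.distributed idiom).
        self.local_group = None
        for node in range(self.world_size // local_world_size):
            ranks = list(range(node * local_world_size, (node + 1) * local_world_size))
            group = dist.new_group(ranks=ranks)
            if self.rank in ranks:
                self.local_group = group

    def local_all_reduce(self, t: torch.Tensor) -> torch.Tensor:
        # Fast intra-node reduction (NCCL over NVLink/PCIe in a real GPU setup).
        dist.all_reduce(t, group=self.local_group)
        return t

    def global_all_reduce(self, t: torch.Tensor) -> torch.Tensor:
        # Slow cross-internet reduction; this is the payload the int8 kernels compress.
        dist.all_reduce(t)
        return t


if __name__ == "__main__":
    mesh = TwoLevelMesh(local_world_size=int(os.environ.get("LOCAL_WORLD_SIZE", "1")))
    x = torch.ones(4)
    mesh.local_all_reduce(x)
    mesh.global_all_reduce(x)
    dist.destroy_process_group()
```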
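
The DiLoCo line is the other half of the communication savings: each worker runs 100 local optimizer steps and only the resulting pseudo-gradients are exchanged, quantized to int8. Below is a hedged sketch of that outer loop; the helper names and the simple per-tensor quantization scheme are assumptions, and prime's custom fused int8 kernels are more involved than this.

```python
# Sketch of a DiLoCo-style outer step with int8-compressed pseudo-gradients.
# Illustrative only: function names and the quantization scheme are assumptions,
# not prime's custom kernels. Assumes torch.distributed is already initialized
# (e.g. via the TwoLevelMesh sketch above).
import torch
import torch.distributed as dist

INNER_STEPS = 100  # local steps between global syncs, per the README


def int8_all_reduce_mean(delta: torch.Tensor) -> torch.Tensor:
    """Average `delta` across workers while keeping exchanged values in int8 range."""
    # Agree on a single quantization scale: the max |delta| over all workers.
    amax = delta.detach().abs().max()
    dist.all_reduce(amax, op=dist.ReduceOp.MAX)
    scale = amax.clamp_min(1e-12) / 127.0
    # Quantize, sum in int32 to avoid overflow, then dequantize and average.
    # (A real int8 kernel keeps the wire format at 8 bits throughout.)
    q = torch.clamp((delta.detach() / scale).round(), -127, 127).to(torch.int32)
    dist.all_reduce(q)
    return q.to(torch.float32) * scale / dist.get_world_size()


def diloco_round(model, inner_opt, outer_opt, data_iter, loss_fn):
    """One outer DiLoCo round: 100 local steps, then one compressed global sync."""
    # Snapshot the globally synced weights before local training starts.
    synced = [p.detach().clone() for p in model.parameters()]

    # Inner loop: ordinary local training (e.g. AdamW) on this worker's data shard.
    for _ in range(INNER_STEPS):
        x, y = next(data_iter)
        inner_opt.zero_grad()
        loss_fn(model(x), y).backward()
        inner_opt.step()

    # Outer step: pseudo-gradient = synced weights minus locally updated weights,
    # averaged across all workers with the int8 all-reduce above.
    for p, g in zip(model.parameters(), synced):
        p.grad = int8_all_reduce_mean(g - p.detach())
        p.data.copy_(g)  # rewind to the synced weights; the outer optimizer applies the delta
    outer_opt.step()     # e.g. SGD with Nesterov momentum, as in the DiLoCo paper
    outer_opt.zero_grad()
```

Syncing once every 100 inner steps instead of every step, combined with 4x smaller int8 payloads, is roughly where the 400x overhead reduction quoted above comes from.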