Update README.md
README.md
@@ -104,7 +104,7 @@ FLM-101B is trained on a cluster of 24 DGX-A800 GPU (8×80G) servers for less th
 
 #### Software
 
-FLM-101B was trained with
+FLM-101B was trained with a codebase adapted from Megatron-LM.
 It uses a 3D(DP+TP+PP) parallelism approach and distributed optimizer.
 
 
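For context on the 3D (DP+TP+PP) parallelism the changed line mentions: in a Megatron-LM-style setup, the data-parallel (DP) degree is whatever remains after the tensor-parallel (TP) and pipeline-parallel (PP) degrees divide the total GPU count. The sketch below illustrates that decomposition; the TP and PP sizes are assumptions for illustration, not the actual FLM-101B configuration (only the 24-server × 8-GPU cluster size comes from the README).

```python
# Illustrative sketch of how DP x TP x PP decomposes a GPU cluster
# in a Megatron-LM-style 3D-parallel setup. TP/PP degrees below are
# assumed values, not the actual FLM-101B configuration.

WORLD_SIZE = 24 * 8          # 24 DGX-A800 servers x 8 GPUs each = 192 GPUs
TENSOR_PARALLEL_SIZE = 4     # assumed TP degree (splits each layer's matmuls)
PIPELINE_PARALLEL_SIZE = 6   # assumed PP degree (splits layers into stages)

# Data parallelism covers whatever the other two axes leave over.
assert WORLD_SIZE % (TENSOR_PARALLEL_SIZE * PIPELINE_PARALLEL_SIZE) == 0
DATA_PARALLEL_SIZE = WORLD_SIZE // (TENSOR_PARALLEL_SIZE * PIPELINE_PARALLEL_SIZE)

print(f"DP={DATA_PARALLEL_SIZE} x TP={TENSOR_PARALLEL_SIZE} "
      f"x PP={PIPELINE_PARALLEL_SIZE} = {WORLD_SIZE} GPUs")
# -> DP=8 x TP=4 x PP=6 = 192 GPUs
```

The distributed optimizer the README refers to (Megatron-LM exposes this as `--use-distributed-optimizer`) additionally shards optimizer state across the data-parallel ranks, reducing per-GPU memory in a ZeRO-1-like fashion.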