patrickvonplaten committed
Commit 952e4a1 • Parent(s): a1ea5cf
Update README.md
README.md CHANGED
@@ -27,21 +27,6 @@ Secondly, a single GPU will most likely not have enough memory to even load the
 - Model parallelism has to be used here to overcome this problem as is explained in this [PR](https://github.com/huggingface/transformers/pull/3578).
 - DeepSpeed's ZeRO-Offload is another approach as explained in this [post](https://github.com/huggingface/transformers/issues/9996).
 
----
-language:
-- en
-- fr
-- ro
-- de
-datasets:
-- c4
-tags:
-- summarization
-- translation
-
-license: apache-2.0
----
-
 [Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html)
 
 ## PreTraining
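For context on the first surviving bullet: older transformers releases exposed a naive model-parallelism API for T5, `parallelize()`, which splits the transformer blocks across the visible GPUs. Below is a minimal sketch, not part of this commit, assuming such a release, a multi-GPU machine, and the `t5-11b` checkpoint as an illustration; it is not claimed to be the exact approach of the linked PR:

```python
# Minimal sketch, not part of this commit. Assumes an older transformers
# release that still ships T5's (since-deprecated) parallelize() API and
# several visible GPUs; "t5-11b" is used purely as an illustration.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-11b")
model = T5ForConditionalGeneration.from_pretrained("t5-11b")

# Spread the encoder/decoder blocks over all visible GPUs; the embeddings
# and the first blocks stay on cuda:0, so inputs are placed there.
model.parallelize()

inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
).to("cuda:0")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Newer transformers versions deprecate `parallelize()` in favor of `from_pretrained(..., device_map="auto")` (backed by accelerate), which shards the weights across devices already at load time.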
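For the second bullet: ZeRO-Offload moves optimizer state (and, at stage 3, parameters) to CPU memory so that a model too large for a single GPU can still be trained. A minimal sketch of enabling it through the transformers Trainer follows; the config values are illustrative assumptions, not taken from the linked post:

```python
# Minimal sketch, not part of this commit. Assumes transformers + deepspeed
# are installed and the script is started with the `deepspeed` launcher.
# The config values are illustrative, not taken from the linked post.
from transformers import Trainer, TrainingArguments

ds_config = {
    # ZeRO stage 2 shards optimizer state and gradients across GPUs;
    # offload_optimizer keeps the optimizer state in CPU RAM instead.
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
    # "auto" lets the HF integration fill the value from TrainingArguments.
    "train_micro_batch_size_per_gpu": "auto",
}

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    deepspeed=ds_config,  # a dict or a path to a JSON config file
)

# trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
# trainer.train()
```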