patrickvonplaten committed
Commit 952e4a1 • Parent(s): a1ea5cf
Update README.md
README.md CHANGED
@@ -27,21 +27,6 @@ Secondly, a single GPU will most likely not have enough memory to even load the
 - Model parallelism has to be used here to overcome this problem as is explained in this [PR](https://github.com/huggingface/transformers/pull/3578).
 - DeepSpeed's ZeRO-Offload is another approach as explained in this [post](https://github.com/huggingface/transformers/issues/9996).
 
----
-language:
-- en
-- fr
-- ro
-- de
-datasets:
-- c4
-tags:
-- summarization
-- translation
-
-license: apache-2.0
----
-
 [Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html)
 
 ## PreTraining
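For context on the first surviving bullet: older transformers releases exposed a naive model-parallelism API for T5, `parallelize()`, which splits the transformer blocks across the visible GPUs. Below is a minimal sketch, not part of this commit, assuming such a release, a multi-GPU machine, and the `t5-11b` checkpoint as an illustration; it is not claimed to be the exact approach of the linked PR:

```python
# Minimal sketch, not part of this commit. Assumes an older transformers
# release that still ships T5's (since-deprecated) parallelize() API and
# several visible GPUs; "t5-11b" is used purely as an illustration.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-11b")
model = T5ForConditionalGeneration.from_pretrained("t5-11b")

# Spread the encoder/decoder blocks over all visible GPUs; the embeddings
# and the first blocks stay on cuda:0, so inputs are placed there.
model.parallelize()

inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
).to("cuda:0")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Newer transformers versions deprecate `parallelize()` in favor of `from_pretrained(..., device_map="auto")` (backed by accelerate), which shards the weights across devices already at load time.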
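For the second bullet: ZeRO-Offload moves optimizer state (and, at stage 3, parameters) to CPU memory so that a model too large for a single GPU can still be trained. A minimal sketch of enabling it through the transformers Trainer follows; the config values are illustrative assumptions, not taken from the linked post:

```python
# Minimal sketch, not part of this commit. Assumes transformers + deepspeed
# are installed and the script is started with the `deepspeed` launcher.
# The config values are illustrative, not taken from the linked post.
from transformers import Trainer, TrainingArguments

ds_config = {
    # ZeRO stage 2 shards optimizer state and gradients across GPUs;
    # offload_optimizer keeps the optimizer state in CPU RAM instead.
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
    # "auto" lets the HF integration fill the value from TrainingArguments.
    "train_micro_batch_size_per_gpu": "auto",
}

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    deepspeed=ds_config,  # a dict or a path to a JSON config file
)

# trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
# trainer.train()
```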