Muthukumaran committed
Commit: b4474c6
1 Parent(s): dabf2ec
Update README.md
README.md CHANGED
@@ -10,14 +10,14 @@ tags:
 - biology
 ---
 
-# Model Card for nasa-smd-ibm-distil-v0.1
+# Model Card for INDUS-Small (nasa-smd-ibm-distil-v0.1)
 
-nasa-smd-ibm-distil-v0.1
+INDUS-Small (nasa-smd-ibm-distil-v0.1) is a distilled version of the RoBERTa-based, encoder-only transformer model INDUS (nasa-impact/nasa-smd-ibm-v0.1), domain-adapted for NASA Science Mission Directorate (SMD) applications. It is fine-tuned on scientific journals and articles relevant to NASA SMD, aiming to enhance natural language technologies such as information retrieval and intelligent search.
 
 We trained the smaller model, INDUS_SMALL, with 38M parameters through knowledge distillation, using INDUS as the teacher. INDUS_SMALL follows a 4-layer architecture recommended by the Neural Architecture Search engine (Trivedi et al., 2023), offering an optimal trade-off between performance and latency. We adopted the distillation objective proposed in MiniLMv2 (Wang et al., 2021) to transfer fine-grained self-attention relations, an approach shown to be the current state of the art (Udagawa et al., 2023). Using this objective, we trained the model for 500K steps with an effective batch size of 480 on 30 V100 GPUs.
 
 ## Model Details
-- **Base Model**:
+- **Base Model**: INDUS
 - **Tokenizer**: Custom
 - **Original version Parameters**: 125M
 - **Pretraining Strategy**: Masked Language Modeling (MLM)