The court cases in our dataset range from 1950 to 2019, and belong to all legal domains.
In total, our dataset contains around 5.4 million Indian legal documents (all in the English language).
The raw text corpus size is around 27 GB.

### Training Setup
This model is initialized with the [Legal-BERT model](https://huggingface.co/zlucia/legalbert) from the paper [When does pretraining help?: assessing self-supervised learning for law and the CaseHOLD dataset of 53,000+ legal holdings](https://dl.acm.org/doi/abs/10.1145/3462757.3466088). In our work, we refer to this model as CaseLawBERT, and to our re-trained model as InCaseLawBERT.
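As a rough sketch of this initialization step with the Hugging Face `transformers` library (not our exact training code), `BertForPreTraining` bundles both the MLM and NSP heads used in the pre-training described below:

```python
from transformers import AutoTokenizer, BertForPreTraining

# Start from the CaseLawBERT checkpoint: the pretrained encoder weights
# are reused, with the MLM and NSP pretraining heads on top.
tokenizer = AutoTokenizer.from_pretrained("zlucia/legalbert")
model = BertForPreTraining.from_pretrained("zlucia/legalbert")
```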
We further train this model on our data for 300K steps on the Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) tasks.
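Once trained, the resulting model can be loaded like any standard BERT encoder. A minimal usage sketch follows, assuming the checkpoint is published under the `law-ai/InCaseLawBERT` identifier (substitute the actual model id if it differs):

```python
from transformers import AutoTokenizer, AutoModel

# The model id below is an assumption; replace it with the published identifier.
tokenizer = AutoTokenizer.from_pretrained("law-ai/InCaseLawBERT")
model = AutoModel.from_pretrained("law-ai/InCaseLawBERT")

text = "The appellant filed a special leave petition before the Supreme Court."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768) for a BERT-base encoder
```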