law-ai committed
Commit
e8d1e90
1 Parent(s): 5cd5ee0

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -15,7 +15,7 @@ The court cases in our dataset range from 1950 to 2019, and belong to all legal
 In total, our dataset contains around 5.4 million Indian legal documents (all in the English language).
 The raw text corpus size is around 27 GB.
 
-### Training Objective
+### Training Setup
 This model is initialized with the [Legal-BERT model](https://huggingface.co/zlucia/legalbert) from the paper [When does pretraining help?: assessing self-supervised learning for law and the CaseHOLD dataset of 53,000+ legal holdings](https://dl.acm.org/doi/abs/10.1145/3462757.3466088). In our work, we refer to this model as CaseLawBERT, and our re-trained model as InCaseLawBERT.
 We further train this model on our data for 300K steps on the Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) tasks.
 
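
Since the README describes a standard BERT-style encoder further pre-trained with MLM and NSP, a minimal usage sketch with the Hugging Face transformers library might look like the following. The repo id `law-ai/InCaseLawBERT` is an assumption inferred from the committer namespace and the model name in the README, and the example sentence is illustrative only.

```python
from transformers import AutoModel, AutoTokenizer
import torch

# Repo id assumed from the committer namespace and the model name in the README.
MODEL_ID = "law-ai/InCaseLawBERT"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

text = "The appellant filed a special leave petition before the Supreme Court of India."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual embedding of the [CLS] token; shape (1, 768) for a BERT-base encoder.
cls_embedding = outputs.last_hidden_state[:, 0, :]
print(cls_embedding.shape)
```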