The court cases in our dataset range from 1950 to 2019, and belong to all legal domains.
In total, our dataset contains around 5.4 million Indian legal documents (all in the English language).
The raw text corpus size is around 27 GB.

### Training Setup
This model is initialized with the [Legal-BERT model](https://huggingface.co/zlucia/legalbert) from the paper [When does pretraining help?: assessing self-supervised learning for law and the CaseHOLD dataset of 53,000+ legal holdings](https://dl.acm.org/doi/abs/10.1145/3462757.3466088). In our work, we refer to this model as CaseLawBERT, and to our re-trained model as InCaseLawBERT.
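As a rough sketch of this initialization step with the Hugging Face `transformers` library (not our exact training code), `BertForPreTraining` bundles both the MLM and NSP heads used in the pre-training described below:

```python
from transformers import AutoTokenizer, BertForPreTraining

# Start from the CaseLawBERT checkpoint: the pretrained encoder weights
# are reused, with the MLM and NSP pretraining heads on top.
tokenizer = AutoTokenizer.from_pretrained("zlucia/legalbert")
model = BertForPreTraining.from_pretrained("zlucia/legalbert")
```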
We further train this model on our data for 300K steps on the Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) tasks.
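Once trained, the resulting model can be loaded like any standard BERT encoder. A minimal usage sketch follows, assuming the checkpoint is published under the `law-ai/InCaseLawBERT` identifier (substitute the actual model id if it differs):

```python
from transformers import AutoTokenizer, AutoModel

# The model id below is an assumption; replace it with the published identifier.
tokenizer = AutoTokenizer.from_pretrained("law-ai/InCaseLawBERT")
model = AutoModel.from_pretrained("law-ai/InCaseLawBERT")

text = "The appellant filed a special leave petition before the Supreme Court."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768) for a BERT-base encoder
```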