	---
	language:
	- en
	license: mit
	tags:
	- legal
	datasets:
	- ricdomolm/lawma-all-tasks
	---

	# Lawma 8B

	Lawma 8B is a fine-tune of Llama 3 8B Instruct on 260 legal classification tasks derived from the [Supreme Court](http://scdb.wustl.edu/data.php) and [Songer Court of Appeals](http://www.songerproject.org/us-courts-of-appeals-databases.html) databases. Lawma was fine-tuned on over 500k task examples, totalling 2B tokens. As a result, Lawma 8B outperforms GPT-4 on 95% of these legal classification tasks, on average by over 17 accuracy points. See our [arXiv preprint](https://arxiv.org/abs/2407.16615) and [GitHub repository](https://github.com/socialfoundations/lawma) for more details.
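
A minimal loading sketch, assuming the Hugging Face `transformers` and `torch` packages (plus `accelerate` for `device_map="auto"`) and the repository ID `ricdomolm/lawma-8b`. The helpers `load_lawma` and `classify`, the dtype, and the generation settings are illustrative assumptions rather than the authors' setup; the GitHub repository linked above has the exact prompts and evaluation code.

```python
MODEL_ID = "ricdomolm/lawma-8b"  # assumed repository ID for this model card


def load_lawma(model_id: str = MODEL_ID):
    """Load tokenizer and model once (hypothetical helper, not from the Lawma repo)."""
    # Imported lazily so that defining the helpers needs no heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumption: hardware with bfloat16 support
        device_map="auto",           # requires the `accelerate` package
    )
    return tokenizer, model


def classify(tokenizer, model, prompt: str, max_new_tokens: int = 8) -> str:
    """Greedy-decode a short answer for one multiple-choice prompt."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Calling `load_lawma()` downloads the weights on first use, so it is worth loading once and reusing the returned `tokenizer` and `model` across many `classify` calls.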

	## Evaluations

	We report mean classification accuracy across the 260 legal classification tasks that we consider. We use the standard MMLU multiple-choice prompt, and evaluate models zero-shot. You can find our evaluation code [here](https://github.com/socialfoundations/lawma/tree/main/evaluation).

	| Model | All tasks | Supreme Court tasks | Court of Appeals tasks |
	|---------|:---------:|:-------------:|:----------------:|
	| Lawma 70B | 81.9 | 84.1 | 81.5 |
	| Lawma 8B | 80.3 | 82.4 | 79.9 |
	| GPT-4 | 62.9 | 59.8 | 63.4 |
	| Llama 3 70B Inst | 58.4 | 47.1 | 60.3 |
	| Mixtral 8x7B Inst | 43.2 | 24.4 | 46.4 |
	| Llama 3 8B Inst | 42.6 | 32.8 | 44.2 |
	| Majority classifier | 41.7 | 31.5 | 43.5 |
	| Mistral 7B Inst | 39.9 | 19.5 | 43.4 |
	| Saul 7B Inst | 34.4 | 20.2 | 36.8 |
	| LegalBert | 24.6 | 13.6 | 26.4 |
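
To make the metric concrete, here is a small sketch of a zero-shot, MMLU-style multiple-choice prompt and of mean accuracy across tasks. It assumes that every task is weighted equally in the mean and that answers are compared as option letters; the function names are hypothetical, and the exact prompt wording used for Lawma lives in the evaluation code linked above.

```python
from statistics import mean
from string import ascii_uppercase


def format_mmlu_prompt(question: str, choices: list[str]) -> str:
    """Build a zero-shot multiple-choice prompt: question, lettered options, 'Answer:'."""
    lines = [question.strip()]
    for letter, choice in zip(ascii_uppercase, choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")  # zero-shot: no solved examples precede the question
    return "\n".join(lines)


def task_accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of examples in one task whose predicted letter matches the label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equally long")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def mean_accuracy(per_task: dict[str, tuple[list[str], list[str]]]) -> float:
    """Average the per-task accuracies, giving each task equal weight (assumption)."""
    return mean(task_accuracy(preds, labels) for preds, labels in per_task.values())
```

Under this equal-weight assumption, a task with few examples counts as much as a task with many, which is one natural reading of "mean classification accuracy across the 260 tasks".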

	## FAQ

	What are the Lawma models useful for? We recommend using the Lawma models only for the legal classification tasks that they were fine-tuned on. The main takeaway of our paper is that specializing models leads to large improvements in performance. Therefore, we strongly recommend that practitioners further fine-tune Lawma on the actual tasks that the models will be used for. Relatively few examples (i.e., dozens or hundreds) may already lead to large gains in performance.

	What legal classification tasks is Lawma fine-tuned on? We consider almost all of the variables of the [Supreme Court](http://scdb.wustl.edu/data.php) and [Songer Court of Appeals](http://www.songerproject.org/us-courts-of-appeals-databases.html) databases. Our reasons for studying these legal classification tasks are both technical and substantive. From a technical machine learning perspective, these tasks provide highly non-trivial classification problems where even the best models leave much room for improvement. From a substantive legal perspective, efficient solutions to such classification problems have rich and important applications in legal research.

	## Citation

	This model was trained for the project

	Lawma: The Power of Specialization for Legal Tasks. Ricardo Dominguez-Olmedo and Vedant Nanda and Rediet Abebe and Stefan Bechtold and Christoph Engel and Jens Frankenreiter and Krishna Gummadi and Moritz Hardt and Michael Livermore. 2024

	Please cite as:

	```
	@misc{dominguezolmedo2024lawmapowerspecializationlegal,
	title={Lawma: The Power of Specialization for Legal Tasks},
	author={Ricardo Dominguez-Olmedo and Vedant Nanda and Rediet Abebe and Stefan Bechtold and Christoph Engel and Jens Frankenreiter and Krishna Gummadi and Moritz Hardt and Michael Livermore},
	year={2024},
	eprint={2407.16615},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2407.16615},
	}
	```