---
language:
- en
license: mit
tags:
- legal
datasets:
- ricdomolm/lawma-all-tasks
---

# Lawma 8B

Lawma 8B is a fine-tune of Llama 3 8B Instruct on 260 legal classification tasks derived from the [Supreme Court](http://scdb.wustl.edu/data.php) and [Songer Court of Appeals](http://www.songerproject.org/us-courts-of-appeals-databases.html) databases. Lawma was fine-tuned on over 500k task examples, totalling 2B tokens. As a result, Lawma 8B outperforms GPT-4 on 95% of these legal classification tasks, on average by over 17 accuracy points. See our [arXiv preprint](https://arxiv.org/abs/2407.16615) and [GitHub repository](https://github.com/socialfoundations/lawma) for more details.

## Evaluations

We report mean classification accuracy across the 260 legal classification tasks that we consider. We use the standard MMLU multiple-choice prompt, and evaluate models zero-shot. You can find our evaluation code [here](https://github.com/socialfoundations/lawma/tree/main/evaluation).

| Model   | All tasks | Supreme Court tasks | Court of Appeals tasks |
|---------|:---------:|:-------------:|:----------------:|
| Lawma 70B | **81.9** | **84.1** | **81.5** |
| Lawma 8B | 80.3 | 82.4 | 79.9 |
| GPT-4 | 62.9 | 59.8 | 63.4 |
| Llama 3 70B Inst | 58.4 | 47.1 | 60.3 |
| Mixtral 8x7B Inst | 43.2 | 24.4 | 46.4 |
| Llama 3 8B Inst | 42.6 | 32.8 | 44.2 |
| Majority classifier | 41.7 | 31.5 | 43.5 |
| Mistral 7B Inst | 39.9 | 19.5 | 43.4 |
| Saul 7B Inst | 34.4 | 20.2 | 36.8 |
| LegalBert | 24.6 | 13.6 | 26.4 |
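
The evaluation setup described above (an MMLU-style multiple-choice prompt, zero-shot, with accuracy averaged over tasks) can be sketched roughly as follows. This is an illustrative sketch only, not the project's evaluation code: the function names `format_mmlu_prompt` and `mean_task_accuracy` are hypothetical, the example question and answer options are made up, and we assume here that "mean classification accuracy" is the unweighted mean of per-task accuracies. The exact prompt wording is defined by the evaluation code linked above.

```python
import string
from statistics import mean


def format_mmlu_prompt(question, choices, instruction=None):
    """Build a zero-shot, MMLU-style multiple-choice prompt.

    Hypothetical helper: the real prompt template lives in the Lawma
    evaluation code and may differ in wording.
    """
    if len(choices) > len(string.ascii_uppercase):
        raise ValueError("too many choices for single-letter labels")
    lines = []
    if instruction:
        lines.extend([instruction, ""])
    lines.append(question.strip())
    # Label the options A, B, C, ... as in the MMLU format.
    for letter, choice in zip(string.ascii_uppercase, choices):
        lines.append(f"{letter}. {choice}")
    # Zero-shot: no solved examples, the model completes after "Answer:".
    lines.append("Answer:")
    return "\n".join(lines)


def mean_task_accuracy(results):
    """Average per-task accuracy (in percent) over all tasks.

    `results` maps a task name to a list of (predicted, gold) label pairs.
    Assumption: every task counts equally, regardless of its size.
    """
    per_task = [
        100.0 * sum(pred == gold for pred, gold in pairs) / len(pairs)
        for pairs in results.values()
    ]
    return mean(per_task)


# Made-up example question, for illustration only.
print(format_mmlu_prompt(
    "What is the ideological direction of the decision?",
    ["Conservative", "Liberal", "Unspecifiable"],
))
```

Averaging per task rather than per example keeps large tasks from dominating the headline number, which seems to be the intent of reporting a mean across the 260 tasks.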

## FAQ

**What are the Lawma models useful for?** We recommend using the Lawma models only for the legal classification tasks that the models were fine-tuned on. The main takeaway of our paper is that specializing models leads to large improvements in performance. Therefore, we strongly recommend that practitioners further fine-tune Lawma on the actual tasks that the models will be used for. Relatively few examples (i.e., dozens or hundreds) may already lead to large gains in performance.

**What legal classification tasks is Lawma fine-tuned on?** We consider almost all of the variables of the [Supreme Court](http://scdb.wustl.edu/data.php) and [Songer Court of Appeals](http://www.songerproject.org/us-courts-of-appeals-databases.html) databases. Our reasons for studying these legal classification tasks are both technical and substantive. From a technical machine learning perspective, these tasks provide highly non-trivial classification problems where even the best models leave much room for improvement. From a substantive legal perspective, efficient solutions to such classification problems have rich and important applications in legal research.

## Citation

This model was trained for the project

*Lawma: The Power of Specialization for Legal Tasks. Ricardo Dominguez-Olmedo and Vedant Nanda and Rediet Abebe and Stefan Bechtold and Christoph Engel and Jens Frankenreiter and Krishna Gummadi and Moritz Hardt and Michael Livermore. 2024*

Please cite as:

```
@misc{dominguezolmedo2024lawmapowerspecializationlegal,
      title={Lawma: The Power of Specialization for Legal Tasks}, 
      author={Ricardo Dominguez-Olmedo and Vedant Nanda and Rediet Abebe and Stefan Bechtold and Christoph Engel and Jens Frankenreiter and Krishna Gummadi and Moritz Hardt and Michael Livermore},
      year={2024},
      eprint={2407.16615},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.16615}, 
}
```