akdeniz27 nazneen committed on
Commit
e7ee783
1 Parent(s): 32cd27a

model documentation (#2)


- model documentation (bc4e2d4d36312bed643eec7cf658ecccc11d9ef2)


Co-authored-by: Nazneen Rajani <[email protected]>

Files changed (1)
  1. README.md +180 -6
README.md CHANGED
@@ -1,13 +1,187 @@
 
  ---
  language: en
  datasets:
  - cuad
  ---
- # RoBERTa Large Model fine-tuned with CUAD dataset
- This model is the fine-tuned version of "RoBERTa Large"
- using CUAD dataset https://huggingface.co/datasets/cuad

- Link for model checkpoint: https://github.com/TheAtticusProject/cuad

- For the use of the model with CUAD: https://github.com/marshmellow77/cuad-demo
- and https://huggingface.co/spaces/akdeniz27/contract-understanding-atticus-dataset-demo

+
  ---
  language: en
  datasets:
  - cuad
  ---

+ # Model Card for RoBERTa Large Model fine-tuned with CUAD dataset
+
+ This model is a fine-tuned version of RoBERTa Large on the [CUAD dataset](https://huggingface.co/datasets/cuad).
+
+ # Model Details
+
+ ## Model Description
+
+ The [Contract Understanding Atticus Dataset (CUAD)](https://www.atticusprojectai.org/cuad), pronounced "kwad", is a dataset for legal contract review curated by the Atticus Project.
+
+ Contract review is a task about "finding needles in a haystack."
+ We find that Transformer models have nascent performance on CUAD, but that this performance is strongly influenced by model design and training dataset size. Despite some promising results, there is still substantial room for improvement. As one of the only large, specialized NLP benchmarks annotated by experts, CUAD can serve as a challenging research benchmark for the broader NLP community.
+
+ - **Developed by:** TheAtticusProject
+ - **Shared by [Optional]:** HuggingFace
+ - **Model type:** Language model
+ - **Language(s) (NLP):** en
+ - **License:** More information needed
+ - **Related Models:** RoBERTa
+   - **Parent Model:** RoBERTa Large
+ - **Resources for more information:**
+   - [GitHub Repo](https://github.com/TheAtticusProject/cuad)
+   - [Associated Paper](https://arxiv.org/abs/2103.06268)
+
+ # Uses
+
+ ## Direct Use
+
+ Legal contract review
+
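A minimal usage sketch for contract review framed as extractive question answering, assuming the Transformers `question-answering` pipeline and the `akdeniz27/roberta-large-cuad` checkpoint referenced later in this card; the contract excerpt and question below are illustrative only.

```python
from transformers import pipeline

# Hypothetical example: query the CUAD fine-tuned checkpoint with a
# CUAD-style clause question over a short contract excerpt.
qa = pipeline("question-answering", model="akdeniz27/roberta-large-cuad")

contract = (
    'This Agreement is entered into between Alpha Corp ("Supplier") and '
    'Beta LLC ("Buyer") and shall commence on January 1, 2022.'
)
question = "Highlight the parts (if any) of this contract related to the Agreement Date."

# Real CUAD contracts are long; they are typically split into overlapping
# windows before inference, which this toy example skips.
result = qa(question=question, context=contract)
print(result["answer"], result["score"])
```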
+ ## Downstream Use [Optional]
+
+ More information needed
+
+ ## Out-of-Scope Use
+
+ The model should not be used to intentionally create hostile or alienating environments for people.
+
+ # Bias, Risks, and Limitations
+
+ Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
+
+ ## Recommendations
+
+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
+
+ # Training Details
+
+ ## Training Data
+
+ See the [CUAD dataset card](https://huggingface.co/datasets/cuad) for further details.
+
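For orientation, a small sketch of loading the referenced dataset with the Hugging Face `datasets` library (assuming the dataset id `cuad` and the SQuAD-style fields described on its dataset card):

```python
from datasets import load_dataset

# Load CUAD and peek at one example (id, title, context, question, answers);
# the field names here are assumed from the dataset card.
cuad = load_dataset("cuad")
print(cuad)
print(cuad["train"][0]["question"])
```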
+ ## Training Procedure
+
+ More information needed
+
+ ### Preprocessing
+
+ More information needed
+
+ ### Speeds, Sizes, Times
+
+ More information needed
+
+ # Evaluation
+
+ ## Testing Data, Factors & Metrics
+
+ ### Testing Data
+
+ #### Extra Data
+
+ Researchers may be interested in several gigabytes of unlabeled contract pretraining data, which is available [here](https://drive.google.com/file/d/1of37X0hAhECQ3BN_004D8gm6V88tgZaB/view?usp=sharing).
+
+ ### Factors
+
+ More information needed
+
+ ### Metrics
+
+ More information needed
+
+ ## Results
+
+ We [provide checkpoints](https://zenodo.org/record/4599830) for three of the best models fine-tuned on CUAD: RoBERTa-base (~100M parameters), RoBERTa-large (~300M parameters), and DeBERTa-xlarge (~900M parameters).
+
+ # Model Examination
+
+ More information needed
+
+ # Environmental Impact
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** More information needed
+ - **Hours used:** More information needed
+ - **Cloud Provider:** More information needed
+ - **Compute Region:** More information needed
+ - **Carbon Emitted:** More information needed
+
+ # Technical Specifications [optional]
+
+ ## Model Architecture and Objective
+
+ More information needed
+
+ ## Compute Infrastructure
+
+ More information needed
+
+ ### Hardware
+
+ More information needed
+
+ ### Software
+
+ The model uses the HuggingFace [Transformers](https://huggingface.co/transformers) library and was tested with Python 3.8, PyTorch 1.7, and Transformers 4.3/4.4.
+
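A quick way to confirm that an installed environment roughly matches the versions listed above:

```python
import sys
import torch
import transformers

# Print the interpreter, PyTorch, and Transformers versions; the card above
# reports testing with Python 3.8, PyTorch 1.7, and Transformers 4.3/4.4.
print(sys.version.split()[0], torch.__version__, transformers.__version__)
```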
+ # Citation
+
+ **BibTeX:**
+
+ @article{hendrycks2021cuad,
+   title={CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review},
+   author={Dan Hendrycks and Collin Burns and Anya Chen and Spencer Ball},
+   journal={NeurIPS},
+   year={2021}
+ }
+
+ # Glossary [optional]
+
+ More information needed
+
+ # More Information [optional]
+
+ For more details about CUAD and legal contract review, see the [Atticus Project website](https://www.atticusprojectai.org/cuad).
+
+ # Model Card Authors [optional]
+
+ TheAtticusProject
+
+ # Model Card Contact
+
+ [TheAtticusProject](https://www.atticusprojectai.org/), in collaboration with Ezi Ozoani and the HuggingFace team
+
+ # How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForQuestionAnswering
+
+ # Load the CUAD fine-tuned checkpoint for extractive question answering
+ tokenizer = AutoTokenizer.from_pretrained("akdeniz27/roberta-large-cuad")
+ model = AutoModelForQuestionAnswering.from_pretrained("akdeniz27/roberta-large-cuad")
+ ```
+
+ </details>
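A minimal inference sketch continuing from the snippet above; the contract excerpt and question are illustrative and not taken from CUAD.

```python
import torch

question = "Highlight the parts (if any) of this contract related to the Parties."
context = 'This Agreement is made between Alpha Corp ("Supplier") and Beta LLC ("Buyer").'

# Encode the question/context pair and run the model without tracking gradients.
inputs = tokenizer(question, context, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# Take the most likely start/end token positions and decode the predicted span.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
print(answer)
```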