jlamypoirier committed bbbd7a0 (parent: d328d4d)

Update README.md

Files changed (1): README.md +228 -0

---
license: openrail
datasets:
- bigcode/the-stack
language:
- code
programming_language:
- Java
- JavaScript
- Python
pipeline_tag: text-generation
inference: false

model-index:
- name: SantaCoder
  results:
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL HumanEval (Python)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.18
      verified: false
    - name: pass@10
      type: pass@10
      value: 0.29
      verified: false
    - name: pass@100
      type: pass@100
      value: 0.49
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL MBPP (Python)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.35
      verified: false
    - name: pass@10
      type: pass@10
      value: 0.58
      verified: false
    - name: pass@100
      type: pass@100
      value: 0.77
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL HumanEval (JavaScript)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.16
      verified: false
    - name: pass@10
      type: pass@10
      value: 0.27
      verified: false
    - name: pass@100
      type: pass@100
      value: 0.47
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL MBPP (JavaScript)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.28
      verified: false
    - name: pass@10
      type: pass@10
      value: 0.51
      verified: false
    - name: pass@100
      type: pass@100
      value: 0.70
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL HumanEval (Java)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.15
      verified: false
    - name: pass@10
      type: pass@10
      value: 0.26
      verified: false
    - name: pass@100
      type: pass@100
      value: 0.41
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL MBPP (Java)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.28
      verified: false
    - name: pass@10
      type: pass@10
      value: 0.44
      verified: false
    - name: pass@100
      type: pass@100
      value: 0.59
      verified: false
  - task:
      type: text-generation
    dataset:
      type: loubnabnl/humaneval_infilling
      name: HumanEval FIM (Python)
    metrics:
    - name: single_line
      type: exact_match
      value: 0.44
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL HumanEval FIM (Java)
    metrics:
    - name: single_line
      type: exact_match
      value: 0.62
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/MultiPL-E
      name: MultiPL HumanEval FIM (JavaScript)
    metrics:
    - name: single_line
      type: exact_match
      value: 0.60
      verified: false
  - task:
      type: text-generation
    dataset:
      type: code_x_glue_ct_code_to_text
      name: CodeXGLUE code-to-text (Python)
    metrics:
    - name: BLEU
      type: bleu
      value: 18.13
      verified: false
---

# SantaCoder

![banner](https://huggingface.co/datasets/bigcode/admin/resolve/main/banner.png)

Play with the model on the [SantaCoder Space Demo](https://huggingface.co/spaces/bigcode/santacoder-demo).

# Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Limitations](#limitations)
4. [Training](#training)
5. [License](#license)

# Model Summary

This is the Megatron version of [SantaCoder](https://huggingface.co/bigcode/santacoder).
We refer the reader to the [SantaCoder model page](https://huggingface.co/bigcode/santacoder) for full documentation about this model.

- **Repository:** [bigcode-project/Megatron-LM](https://github.com/bigcode-project/Megatron-LM)
- **Project Website:** [bigcode-project.org](https://www.bigcode-project.org)
- **Paper:** [🎅SantaCoder: Don't reach for the stars!🌟](https://t.co/YV3pzUbYOr)
- **Point of Contact:** [[email protected]](mailto:[email protected])
- **Languages:** Python, Java, and JavaScript

# Use

## Intended use

The model was trained on GitHub code. As such, it is _not_ an instruction model, and commands like "Write a function that computes the square root." do not work well.
Instead, phrase commands the way they occur in source code, for example as comments (e.g. `# the following function computes the sqrt`), or write a function signature and docstring and let the model complete the function body.

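A code-shaped prompt can also be assembled programmatically. The minimal sketch below renders a function signature followed by a one-line docstring and ends on the indentation of the first body line, so that the most natural continuation is the function body. `build_completion_prompt` is a hypothetical helper, not part of SantaCoder, Megatron-LM, or any library; the resulting string would be passed to whichever generation call you use.

```python
def build_completion_prompt(signature: str, docstring: str, indent: str = "    ") -> str:
    """Hypothetical helper: build a completion-style prompt for a code model.

    The prompt is the signature, an indented one-line docstring, and a final
    line containing only the body indentation, so the model is steered toward
    writing the function body rather than answering an instruction.
    """
    lines = [signature, f'{indent}"""{docstring.strip()}"""', indent]
    return "\n".join(lines)


# Instead of the instruction "Write a function that computes the square root.",
# give the model the start of the function it should complete.
prompt = build_completion_prompt(
    "def sqrt(x: float) -> float:",
    "Compute the square root of x using Newton's method.",
)
print(prompt)
```

The same idea applies to comment-style prompts: a line such as `# the following function computes the sqrt` followed by `def` gives the model source-code context to continue.
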
### Attribution & Other Requirements

The pretraining dataset of the model was filtered for permissive licenses only. Nevertheless, the model can generate source code verbatim from the dataset. The code's license might require attribution and/or other specific requirements that must be respected. We provide a [search index](https://huggingface.co/spaces/bigcode/santacoder-search) that lets you search through the pretraining data to identify where generated code came from and apply the proper attribution to your code.

# Limitations

The model has been trained on source code in Python, Java, and JavaScript. The predominant natural language in the source code is English, although other languages are also present. As such, the model can generate code snippets given some context, but the generated code is not guaranteed to work as intended: it can be inefficient and may contain bugs or exploits.

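Because a completion can be wrong, slow, or unsafe, it is worth running it against your own tests before relying on it. The minimal sketch below executes a candidate completion followed by assertions in a separate Python process with a timeout. `passes_tests` is a hypothetical helper, and a plain subprocess is not a security sandbox: it catches crashes and infinite loops, but untrusted output should still be run in an isolated environment such as a container or VM.

```python
import subprocess
import sys


def passes_tests(candidate: str, tests: str, timeout_s: float = 5.0) -> bool:
    """Hypothetical helper: run `candidate` followed by `tests` in a fresh interpreter.

    Returns True only if the process exits with status 0 within `timeout_s`.
    This isolates crashes and hangs from the caller, but it does NOT protect
    the machine from malicious code.
    """
    program = candidate + "\n\n" + tests + "\n"
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # e.g. an infinite loop in the generated code
    return result.returncode == 0


generated = "def add(a, b):\n    return a + b"
print(passes_tests(generated, "assert add(2, 3) == 5"))  # True
print(passes_tests(generated, "assert add(2, 3) == 6"))  # False
```
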
# Training

## Model

- **Architecture:** GPT-2 model with multi-query attention and a Fill-in-the-Middle objective
- **Pretraining steps:** 600K
- **Pretraining tokens:** 236 billion
- **Precision:** float16

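The Fill-in-the-Middle objective lets the model generate code between a given prefix and suffix. The sketch below assembles a prompt in prefix-suffix-middle (PSM) order. The sentinel strings `<fim-prefix>`, `<fim-suffix>` and `<fim-middle>` are an assumption taken from the Hugging Face SantaCoder tokenizer, so check them against the tokenizer you actually load; `build_fim_prompt` and `assemble` are hypothetical helpers. The model's output after `<fim-middle>` is the infilled code.

```python
# Sentinel tokens assumed from the Hugging Face SantaCoder tokenizer
# (hyphenated; other BigCode models such as StarCoder use underscores).
FIM_PREFIX = "<fim-prefix>"
FIM_SUFFIX = "<fim-suffix>"
FIM_MIDDLE = "<fim-middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Hypothetical helper: arrange code in prefix-suffix-middle (PSM) order.

    The model is expected to generate the missing middle after FIM_MIDDLE.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def assemble(prefix: str, middle: str, suffix: str) -> str:
    """Hypothetical helper: splice a generated middle back between prefix and suffix."""
    return prefix + middle + suffix


prompt = build_fim_prompt("def print_hello_world():\n    ", "\n    print('Hello world!')")
print(prompt)
```

Keeping prompt construction and reassembly separate makes it easy to strip any trailing end-of-text token from the generated middle before splicing it back into the file.
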
## Hardware

- **GPUs:** 96 Tesla V100
- **Training time:** 6.2 days
- **Total FLOPs:** 2.1 × 10²¹

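As a rough consistency check, the reported compute and wall-clock time imply an average sustained throughput per GPU. The sketch below assumes all 96 GPUs ran for the full 6.2 days and takes 125 TFLOP/s as the V100's peak FP16 tensor-core throughput (the SXM2 figure; PCIe cards are lower). The utilization it prints is therefore an estimate, not a measured value.

```python
SECONDS_PER_DAY = 86_400


def sustained_tflops_per_gpu(total_flops: float, num_gpus: int, days: float) -> float:
    """Average achieved TFLOP/s per GPU, assuming every GPU ran the whole time."""
    gpu_seconds = num_gpus * days * SECONDS_PER_DAY
    return total_flops / gpu_seconds / 1e12


# Figures reported in the Hardware section.
achieved = sustained_tflops_per_gpu(total_flops=2.1e21, num_gpus=96, days=6.2)

# Assumption: V100 SXM2 peak FP16 tensor-core throughput of 125 TFLOP/s.
V100_PEAK_TFLOPS = 125.0
utilization = achieved / V100_PEAK_TFLOPS

print(f"{achieved:.1f} TFLOP/s per GPU, about {utilization:.0%} of peak")
```

Under these assumptions the result is about 40.8 TFLOP/s per GPU, roughly a third of peak. The estimate is only as good as its inputs: if the total-FLOPs figure counts operations differently, the implied utilization shifts accordingly.
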
## Software

- **Orchestration:** [Megatron-LM](https://github.com/bigcode-project/Megatron-LM)
- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
- **FP16 if applicable:** [apex](https://github.com/NVIDIA/apex)

# License

The model is licensed under the CodeML Open RAIL-M v0.1 license. You can find the full license [here](https://huggingface.co/spaces/bigcode/license).