# Contributing

The best way to contribute to growing P3 is by writing prompts for new datasets!

### What are Prompts?

A prompt consists of a template (an input template and a target template) along with a collection of associated metadata. A template is a piece of code written in a templating language called
[Jinja](https://jinja.palletsprojects.com/en/3.0.x/). A template defines
a function that maps an example from a dataset in the
[Hugging Face datasets library](https://huggingface.co/datasets) to two strings of
text. The first is called the _input_, which provides all the information that
will be available to solve the task, such as the instruction and the context.
The second piece is called the _target_, which is the desired response to the
prompt.
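
For example, a minimal template for a hypothetical sentiment dataset whose examples have a `text` field and a binary `label` field (the field names here are purely illustrative) might look like this:
```jinja2
Review: {{ text }}
Is this review positive or negative?
|||
{{ ["negative", "positive"][label] }}
```
Everything before the `|||` separator becomes the input, and everything after it becomes the target.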

### Quick-Start Guide to Writing Prompts

1. **Set up the app.** Fork the app and set up using the
[README](https://github.com/bigscience-workshop/promptsource/blob/main/README.md).
1. **Examine the dataset.** In the "Sourcing" mode, select or type the dataset into the dropdown.
If the dataset has subsets (subsets are not the same as splits), you can select
which one to work on. Note that prompts are subset-specific. You can find
out background information on the dataset by reading the information in the
app. The dataset is a collection of examples, and each example is a Python
dictionary. The sidebar will tell you the schema that each example has.
1. **Start a new prompt**. Enter a name for your first prompt and hit "Create."
You can always update the name later. If you want to cancel the prompt, select
"Delete Prompt."
1. **Write the prompt**. In the box labeled "Template," enter a Jinja expression.
See the [getting started guide](#getting-started-using-jinja-to-write-prompts)
and [cookbook](#jinja-cookbook) for details on how to write templates.
1. **Fill in metadata**. Fill in the metadata for the current prompt: reference, original task, choices in templates, metrics, languages, and answer choices.
See [Metadata](#metadata) for more details about these fields.
1. **Save the prompt**. Hit the "Save" button. The output of the prompt
applied to the current example will appear in the right sidebar.
1. **Verify the prompt**. Check that you didn't miss any cases by scrolling
through a handful of examples of the prompted dataset in the
"Prompted dataset viewer" mode.
1. **Write between 5 and 10 prompts**. Repeat steps 3 to 7 to create between 5
and 10 (or more, if you want!) prompts per dataset/subset. Feel free to introduce
a mix of formats: some that follow the templates listed in the [best practices](#best-practices)
and some that are more diverse in format and formulation.
1. **Duplicate the prompt(s).** If the dataset you have chosen bears the same
format as other datasets (for instance, `MNLI` and `SNLI` have identical formats),
you can simply duplicate the prompts you have written to these additional datasets.
1. **Upload the template(s).** Submit a PR using the instructions
[here](#uploading-prompts).

## Getting Started Using Jinja to Write Prompts

Here is a quick crash course on using [Jinja](https://jinja.palletsprojects.com/en/3.0.x/)
to write templates. More advanced usage is in the [cookbook](#jinja-cookbook).

Generally, in a template, you'll want to use a mix of hard-coded text that is
task-specific and stays the same across examples, and Jinja expressions that tailor
the input and target to each specific example.

To write text that should be rendered as written, just write it normally. The
following "template" will produce the same text every time:
```jinja2
This is just literal text that will be printed the same way every time.
```

To make your template do something more interesting, you'll need to use Jinja
expressions, which are wrapped in curly braces (`{{ ... }}` for printing values and
`{% ... %}` for statements). One common thing you'll want to do is access information
from the dataset example. When the template is applied to an example, you can access
any value in the example dictionary via its key. If you just want to print that value,
surround the key in double curly braces. For example, if you want to print the value with the key `text`, use this:
```jinja2
The text in this example is {{ text }}.
```

You can also use information from the example to control behavior. For example,
suppose we have a label with the key `label` in our example, which either has a
value of 0 or 1. That's not very "natural" language, so maybe we want to decide
which label name to use based on the example. We can do this by creating a list
and indexing it with the example key:
```jinja2
The label for this example is {{ ["Label A", "Label B"][label] }}.
```
We can also use dictionaries for the same thing:
```jinja2
The label for this example is {{
{"a": "Label A",
 "b": "Label B"
}[label]
}}.
```

Note that some things in a template are particular to the task, and should not be
modified by downstream steps that try to increase the diversity of the prompts.
A common example is listing label names in the prompt to provide choices. Anything
that should not be modified by data augmentation should be surrounded by double
curly braces and quoted. For example:
```jinja2
The choices are {{"a"}}, {{"b"}}, and {{"c"}}.
```
You can leave binary options like yes/no, true/false, etc. unprotected.
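For instance, in the following illustrative fragment the category names are protected while the binary yes/no option is left as plain text:
```jinja2
{# Category names are protected so that augmentation does not rewrite them: #}
Is this article about {{"Sports"}}, {{"Business"}}, or {{"Science and Technology"}}?

{# Binary options can be left unprotected: #}
Is that correct? Yes or No?
```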

Finally, remember that a template must produce two strings: an input and a target.
To separate these two pieces, use three vertical bars `|||`.
So, a complete template for SQuAD could be:
```jinja2
I'm working on the final exam for my class and am trying to figure out the answer
to the question "{{question}}" I found the following info on Wikipedia and I think
it has the answer. Can you tell me the answer?
{{context}}
|||
{{answers["text"][0]}}'
```

## Metadata
In addition to the template itself, you need to fill out several other fields.
These metadata facilitate finding and using the prompts.
* **Prompt Reference.** If your template was inspired by a paper, note the
reference in the "Prompt Reference" section. You can also add a description of
what your template does.
* **Original Task?** The checkbox should be checked if the template requires solving a
task that the underlying dataset is used to study. For example, a template that asks a
question from a question answering dataset would be an original task template, but one that asks
to generate a question for a given answer would not.
* **Choices in Template?** The checkbox should be checked if the input explicitly indicates
the options for the possible outputs (regardless of whether `answer_choices` is used).
* **Metrics.** Use the multiselect widget to select all metrics commonly used to evaluate
this task. Choose “Other” if there is one that is not included in the list.
* **Languages.** Use the multiselect widget to select all languages used in the prompt. This is independent of what languages are used in the underlying dataset. For example, you could have an English prompt for a Spanish dataset.
* **Answer Choices.**  If the prompt has a small set of possible outputs (e.g., Yes/No,
class labels, entailment judgements, etc.), then the prompt should define and use answer
choices as follows. This allows evaluation to consider just the possible targets for
scoring model outputs. The answer choices field is a Jinja expression that should produce
a `|||` separated list of all possible targets. If the choices don't change from example
to example, then you can just list them. For example, AG News is
```jinja2
World News ||| Sports ||| Business ||| Science and Technology
```
Note that whitespace is stripped from the ends of the choices. If answer choices are set,
then they are also available to Jinja in the prompt itself in the form of a list called
`answer_choices`. You should use this list in both input and target templates so that the
resulting inputs and targets match the answer choices field exactly. For example, a prompt
for AG News could use `answer_choices` like this:
```jinja2
{{text}} Which of the following sections of a newspaper would
this article likely appear in? {{answer_choices[0]}}, {{answer_choices[1]}},
{{answer_choices[2]}}, or {{answer_choices[3]}}?
|||
{{ answer_choices[label] }}
```
Since Answer Choices is a Jinja expression that has access to the example, it can also be used
to extract example-specific choices from the underlying data. For example, in AI2 ARC, we could
use
```jinja2
{{choices.text | join("|||")}}
```

## Best Practices

* **Writing target templates.** The target template should only contain the answer to the task.
It should not contain any extra text such as “The answer is…” (unless that extra text is also in
`answer_choices`). If `answer_choices` is populated, the output should only contain the values
in `answer_choices`.
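For example, for a hypothetical acceptability dataset with `text` and `label` fields and the Answer Choices field set to `Yes ||| No`, the target side contains only the answer itself:
```jinja2
{{ text }}
Is this sentence grammatical? Yes or No?
|||
{# Only the answer, not "The answer is ..." #}
{{ answer_choices[label] }}
```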
* **Formatting multiple-choice questions.** If the target should match the name of the choice
(e.g., “World News”), then the prompt should list the choices either as part of a grammatical question
or as a list with a marker before each one (e.g., dashes). If the target should indicate the choice from
the list (e.g., “A,” “Explanation 1,” etc.), then the prompt should list the choices with the indicator
before each one.
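For instance, when the target should be the letter of the correct choice, a sketch (all field names here are illustrative) could list the indicator before each option:
```jinja2
{{ question }}
(A) {{ choice_a }}
(B) {{ choice_b }}
(C) {{ choice_c }}
|||
{{ ["A", "B", "C"][label] }}
```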
* **Choosing input and target pairs.** Lots of datasets have multiple columns that can be
combined to form different (input, target) pairs, i.e., different "tasks". Don't hesitate to
introduce some diversity by prompting a given dataset into multiple tasks and provide a short
description in the "Prompt Reference" text box. An example is given
in the already prompted `movie_rationales`.
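As another illustration, a SQuAD-style schema (using the fields from the complete template shown earlier) can be prompted both as question answering and as question generation:
```jinja2
{{ context }}
Question: {{ question }}
|||
{{ answers["text"][0] }}
```
and, as a separate prompt on the same dataset (not the original task):
```jinja2
{{ context }}
Write a question about the passage above whose answer is "{{ answers["text"][0] }}".
|||
{{ question }}
```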
* **Filtering prompts.** If a prompt is applied to an example and produces an
empty string, that prompt/example pair will be skipped.
You can therefore create prompts that only apply to a subset of the examples by
wrapping them in Jinja if statements. For example, in the `TREC` dataset, there
are fine-grained categories that are only applicable to certain coarse-grained categories.
We can capture this with the following prompt:
```jinja2
{% if label_coarse == 0 %}
Is this question asking for a {{"definition"}}, a {{"description"}}, a {{"manner of action"}}, or a {{"reason"}}?
{{text}}
|||
{{ {0: "Manner", 7: "Defintion", 9: "Reason", 12: "Description"}[label_fine] }}
{% endif %}
```
For datasets that have splits with no labels (for instance, a test split without ground-truth labels), you can wrap the target side in a conditional statement.
For instance, for `super_glue/boolq`, the following prompt would return an empty target on the test split, but not an empty input:
```jinja2
{{ passage }}
Question: {{ question }}
Answer:
|||
{% if label != -1 %}
{{ answer_choices[label] }}
{% endif %}
```
* **Conditional generation format.** Always specify the target and separate it from the input
with the three vertical bars `|||`. The target will be generated by a generative model
conditioned on the input you wrote. You can always transform an "infix" prompt format
```jinja2
Given that {{premise}}, it {{ ["must be true", "might be true", "must be false"][label] }} that {{hypothesis}}
```
into a conditional generation format
```jinja2
Given that {{premise}}, it {{ "must be true, might be true, or must be false" }} that {{hypothesis}}?|||
{{ ["must be true", "might be true", "must be false"][label] }}
```
* **Pre-defined formats.** The goal is to collect a set of prompts with diverse formats, but
we also want to include a few standardized prompts that follow the two structures below:
1) A question-answer pair with optional multiple choices like:
```
[Context]                         # optional depending on the task
[Question]
[Label1], [Label2], [Label3]      # optional depending on the task
```
So for SNLI it will look like:
```jinja2
{{premise}}
Is it the case that {{hypothesis}}?
{{ "Yes" }}, {{ "No" }}, {{ "Maybe" }} ||| {{ ["Yes", "No", "Maybe"][label] }}
```

2) Task description followed by the input. So for SNLI it will look like:
```jinja2
Determine the relation between the following two sentences. The relations are entailment, contradiction, or neutral.
{{premise}}
{{hypothesis}} ||| {{label}}
```
* **Setting variables.** You might want to use the Jinja expression `{% set %}` to define a variable. If you do,
do it at the beginning of the prompt, outside any conditional statements, so that the automatic prompt checks
recognize that the variable is defined.
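For instance, reusing the SQuAD-style fields from the example above, a sketch could define the variable on the first line:
```jinja2
{% set first_answer = answers["text"][0] %}
{{ context }}
Question: {{ question }}
|||
{{ first_answer }}
```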

## More Examples

Here are a few interesting examples of prompts with explanations.

Here's one for `hellaswag`:
```jinja2
First, {{ ctx_a.lower() }} Then, {{ ctx_b.lower() }}...

Complete the above description with a chosen ending:

(a) {{ answer_choices[0] }}

(b) {{ answer_choices[1] }}

(c) {{ answer_choices[2] }}

(d) {{ answer_choices[3] }}

||| {{ answer_choices[label | int()] }}
```
Notice how it uses string methods to normalize the casing of the text and provides lots
of framing (referring explicitly to a “description” and a “chosen ending”).

Here's one for `head_qa`:
```jinja2
Given this list of statements about {{category}}: {{ answers | map(attribute="atext")
| map("lower") | map("trim", ".") | join(", ") }}.
Which one is the most appropriate answer/completion for the paragraph that follows?
{{qtext}}
|||
{% for answer in answers if answer["aid"]==ra -%}
{{answer["atext"]}}
{%- endfor %}
```
Like the example above, it uses filters to present the choices in a readable way. It also
uses a for loop with a condition to handle the more intricate dataset schema.

Here's one for `paws`:
```jinja2
Sentence 1: {{sentence1}}
Sentence 2: {{sentence2}}
Question: Does Sentence 1 paraphrase Sentence 2? Yes or No?
|||
{{answer_choices[label]}}
```
Notice that the choices `Yes or No` are not protected with quoted curly braces. Binary
choices like yes/no and true/false do not need to be protected (unlike category names).

## Uploading Prompts

Once you save or modify a template, the corresponding file inside the `templates`
directory in the repo will be modified. To upload it, follow these steps:
1. Run `make style` and `make quality`.
2. Commit the modified template files (anything under `templates`) to git.
3. Push to your fork on GitHub.
4. Open a pull request against `main` on the PromptSource repo.


## Jinja Cookbook

- Accessing nested attributes of a dict
```jinja
{{ answers_spans.spans }}
```

- Joining a list
```jinja
{{ spans_list | join(", ") }}
```

- If conditions
```jinja
{% if label==0 %}
do_something
{% elif condition %}
do_something_else
{% endif %}
```
- Using `zip()` to zip multiple lists
```jinja
{% for a, b in zip(list_A, list_B) %}
do_something_with_a_and_b
{% endfor %}
```


Jinja includes lots of advanced features, but for most prompts you will likely only
need the patterns above. If there's something you're not sure how to do,
just open an issue. We'll collect other frequent patterns here.