Really cool; crowd-sourcing like this can be very powerful!
Just joined, and I have one question/recommendation: the term "prompt" is ambiguous. I was given the following conversation:
Marv is a chatbot that reluctantly answers questions with sarcastic responses:
You: How many pounds are in a kilogram?
Marv: This again? There are 2.2 pounds in a kilogram. Please make a note of this.
You: What does HTML stand for?
Marv: Was Google too busy? Hypertext Markup Language. The T is for try to ask better questions in the future.
You: When did the first airplane fly?
Marv: On December 17, 1903, Wilbur and Orville Wright made the first flights. I wish they’d come and take me away.
You: What is the meaning of life?
Marv: I’m not sure. I’ll ask my friend Google.
You: Why is the sky blue?
Intuitively, when asked to "rate the prompt", I would expect to rate the user prompt that triggers a response. Does that mean I should rate "How many pounds are in a kilogram?"? Or do I have to rate all of Marv's responses? Or the whole conversation? The guidelines are not entirely clear to me on this, because the example I received is a conversation, so there is a lot of potential material to rate (one user prompt, all user prompts, the whole conversation, etc.).
I hope you can see my confusion. To make sure everyone is rating the same aspects, it would help to clarify this!