LoneStriker committed on
Commit • df3a89f • 1 Parent(s): 70aec39
Upload folder using huggingface_hub
Browse files
- README.md +32 -0
- output.safetensors +1 -1
README.md CHANGED
@@ -5,6 +5,34 @@ datasets:
 pipeline_tag: text-generation
 ---
 
+We offer a temporary HF space where everyone can try out the model: [**Snorkel-Mistral-PairRM-DPO Space**](https://huggingface.co/spaces/snorkelai/snorkelai_mistral_pairrm_dpo_text_inference)
+
+We also provide an inference endpoint for everyone to test the model.
+It may initially take a few minutes to activate, but it will eventually operate at the standard speed of HF's 7B-model text-generation endpoints.
+Inference speed depends on HF endpoint performance and is not related to Snorkel offerings.
+This endpoint is designed for initial trials, not for ongoing production use. Have fun!
+
+```python
+import requests
+
+API_URL = "https://t1q6ks6fusyg1qq7.us-east-1.aws.endpoints.huggingface.cloud"
+headers = {
+    "Accept": "application/json",
+    "Content-Type": "application/json"
+}
+
+def query(payload):
+    response = requests.post(API_URL, headers=headers, json=payload)
+    return response.json()
+
+output = query({
+    "inputs": "[INST] Recommend me some Hollywood movies [/INST]",
+    "parameters": {}
+})
+```
+
+
+
 ### Dataset:
 Training dataset: [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset)
 
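A usage note on the endpoint example in the hunk above: the README snippet sends a request but never reads the reply. The sketch below is illustrative only; the `mistral_prompt` helper is hypothetical, and the `[{"generated_text": ...}]` response shape and the `max_new_tokens`/`temperature` parameters follow the common HF text-generation-inference schema rather than anything this commit confirms.

```python
import requests

API_URL = "https://t1q6ks6fusyg1qq7.us-east-1.aws.endpoints.huggingface.cloud"
headers = {"Accept": "application/json", "Content-Type": "application/json"}

def mistral_prompt(user_message: str) -> str:
    # Hypothetical helper: applies the [INST] {prompt} [/INST] format
    # documented in the README (see the second hunk below).
    return f"[INST] {user_message} [/INST]"

response = requests.post(
    API_URL,
    headers=headers,
    json={
        "inputs": mistral_prompt("Recommend me some Hollywood movies"),
        # Assumed parameter names, following the common HF
        # text-generation-inference schema; not confirmed by this commit.
        "parameters": {"max_new_tokens": 256, "temperature": 0.7},
    },
)
result = response.json()
# Assumed response shape: [{"generated_text": ...}], as returned by
# standard HF text-generation endpoints.
print(result[0]["generated_text"])
```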
@@ -19,6 +47,10 @@ We utilize ONLY the prompts from [UltraFeedback](https://huggingface.co/datasets
 This overview provides a high-level summary of our approach.
 We plan to release more detailed results and findings in the coming weeks on the [Snorkel blog.](https://snorkel.ai/blog/)
 
+The prompt format follows the Mistral model:
+
+```[INST] {prompt} [/INST]```
+
 ### Training recipe:
 - The provided data is formatted to be compatible with Hugging Face's [Zephyr recipe](https://github.com/huggingface/alignment-handbook/tree/main/recipes/zephyr-7b-beta).
 We executed the n-th DPO iteration using the "train/test_iteration_{n}" splits.
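The training recipe above pulls per-iteration data. As a minimal sketch of loading one iteration with the `datasets` library, assuming split names of the form `train_iteration_{n}` and `test_iteration_{n}` (the README's "train/test_iteration_{n}" wording suggests this but does not spell it out):

```python
# Minimal sketch: load the data for one DPO iteration. The split names
# train_iteration_{n} / test_iteration_{n} are an assumption inferred
# from the README's "train/test_iteration_{n}" wording.
from datasets import load_dataset

n = 1  # DPO iteration number
train = load_dataset(
    "snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset",
    split=f"train_iteration_{n}",
)
test = load_dataset(
    "snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset",
    split=f"test_iteration_{n}",
)
print(train)
print(test)
```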
output.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:7918d9a4ececb4f3a0e9f2703b19161746833a8df0ecf9051c40034a9ffcd3f1
 size 3861346868
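The LFS pointer above records the artifact's sha256 and byte size, which is enough to verify a download. A minimal sketch, assuming a local copy at `output.safetensors` (a hypothetical path):

```python
# Minimal sketch: check a downloaded output.safetensors against the
# oid and size recorded in the LFS pointer above.
import hashlib

EXPECTED_OID = "7918d9a4ececb4f3a0e9f2703b19161746833a8df0ecf9051c40034a9ffcd3f1"
EXPECTED_SIZE = 3861346868

path = "output.safetensors"  # hypothetical local path
digest = hashlib.sha256()
size = 0
with open(path, "rb") as f:
    # Hash in 1 MiB chunks to avoid loading the ~3.9 GB file into memory.
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)
        size += len(chunk)

assert size == EXPECTED_SIZE, f"size mismatch: {size}"
assert digest.hexdigest() == EXPECTED_OID, "sha256 mismatch"
print("output.safetensors matches the LFS pointer")
```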