Teja-Gollapudi committed
Commit b093c27
1 Parent(s): fea26e5

Update README.md

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -1,8 +1,7 @@
 ---
 license: cc-by-3.0
 datasets:
-- VMware/open-instruct-v1-oasst-dolly-hhrlhf
-- conceptofmind/cot_submix_original
+- VMware/open-instruct
 language:
 - en
 library_name: transformers
@@ -15,9 +14,11 @@ Instruction-tuned version of SalesForce/Xgen-7b-8k-base. The model is open for <
 <b> NOTE </b> : The model was trained using the Alpaca prompt template <br>
 <b> NOTE </b> : tiktoken library is required for the tokenizer. Set trust_remote_code=True when launching the tokenizer.<br>
 
-We expanded Open-instruct with additional commercially viable zero-shot COT datasets from Flan v2 (~70k). <br>
+We expanded Open-instruct with additional commercially viable zero-shot COT datasets from Flan v2 to total of 140k instruct-prompt responses. <br>
 
 
+<b>Open-instruct <br>
+
 Open-instruct-v1
 - Mosaic/Dolly-HHRLHF + filtered OASST1 - cc by 3.0
 
@@ -38,8 +39,9 @@ The model supports up to <b>8192 tokens </b>
 
 ## License
 - <b>Commercially Viable </b>
-- The instruction datasets used for instruction tuning are open for commercial usage. (TODO LIST OUT THE DATASETS)
+- The instruction datasets used for instruction tuning are open for commercial usage.
 - Language Model, ([Salesforce/xgen-7b-8k-base](https://huggingface.co/Salesforce/xgen-7b-8k-base)) is under apache-2.0
+- Dataset ([VMware/open-instruct](https://huggingface.co/datasets/VMware/open-instruct)) is under cc-by-sa-3.0
 
 
 
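
The README notes in this diff say the model was trained with the Alpaca prompt template, but the diff does not show the template itself. The sketch below assumes the standard Stanford Alpaca "no input" wording; `ALPACA_TEMPLATE` and `build_alpaca_prompt` are hypothetical names introduced here for illustration, not part of the model repository.

```python
# Assumption: the model expects the standard Stanford Alpaca "no input" template.
# The README only says "Alpaca prompt template"; check the model card for the exact text.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_alpaca_prompt(instruction: str) -> str:
    """Wrap a user instruction in the assumed Alpaca template (hypothetical helper)."""
    # Strip surrounding whitespace so the section markers stay on their own lines.
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


if __name__ == "__main__":
    print(build_alpaca_prompt("Summarize the license terms of this model."))
```

The resulting string would then be passed to a tokenizer loaded with `AutoTokenizer.from_pretrained(<model id>, trust_remote_code=True)`, per the second NOTE in the README, with the `tiktoken` package installed. The model id is not shown in this diff, so it is left as a placeholder.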