OFA-Sys/ProLLaMA-7B

mingfengxue committed on Oct 26, 2023

Commit 76dfd8c (1 parent: 43c2ea7)

Update README.md

Files changed (1)

README.md +32 -0

README.md CHANGED

@@ -1,3 +1,35 @@
 ---
 license: apache-2.0
+datasets:
+- OFA-Sys/OccuQuest
 ---
+This is the ProLLaMA-7B model in [OccuQuest: Mitigating Occupational Bias for Inclusive Large Language Models](https://arxiv.org/abs/2310.16517).
+The dataset is on [OccuQuest](https://huggingface.co/datasets/OFA-Sys/OccuQuest).
+Abstract:
+The emergence of large language models (LLMs) has revolutionized natural language processing tasks.
+However, existing instruction-tuning datasets suffer from occupational bias: the majority of the data relates to only a few occupations, which hampers the ability of instruction-tuned LLMs to generate helpful responses to professional queries from practitioners in specific fields.
+To mitigate this issue and promote occupation-inclusive LLMs, we create an instruction-tuning dataset named OccuQuest, which contains 110,000+ prompt-completion pairs and 30,000+ dialogues covering over 1,000 occupations in 26 occupational categories.
+We systematically query ChatGPT, organizing the queries hierarchically by Occupation, Responsibility, Topic, and Question, to ensure comprehensive coverage of specialized occupational inquiries.
+Compared with three commonly used datasets (Dolly, ShareGPT, and WizardLM), OccuQuest exhibits a more balanced distribution across occupations.
+We also assemble three test sets for comprehensive evaluation: an occu-test set covering 25 occupational categories, an estate set focusing on real estate, and an occu-quora set containing real-world questions from Quora.
+We then fine-tune LLaMA on OccuQuest to obtain OccuLLaMA, which significantly outperforms state-of-the-art LLaMA variants (Vicuna, Tulu, and WizardLM) on professional questions in GPT-4 and human evaluations.
+Notably, on the occu-quora set, OccuLLaMA reaches a high win rate of 86.4% against WizardLM.
+Furthermore, we demonstrate the potential of combining OccuQuest with other instruction-tuning datasets to enhance the overall performance of LLMs.
+By fine-tuning LLaMA on a mixture of OccuQuest and Tulu datasets, we introduce ProLLaMA, which excels in addressing occupational questions and exhibits superior performance in comprehensive evaluations such as MMLU, GSM8K, BBH, and HumanEval.
+Among the different LLaMA variants, the 7B and 13B ProLLaMA models achieve the highest performance on MMLU and GSM8K, with the 7B ProLLaMA model demonstrating an improvement of more than 4 points over the other 7B variants on GSM8K.
+We openly release the dataset and models.
+Please cite if you use this model:
+```bibtex
+@misc{xue2023occuquest,
+      title={OccuQuest: Mitigating Occupational Bias for Inclusive Large Language Models},
+      author={Mingfeng Xue and Dayiheng Liu and Kexin Yang and Guanting Dong and Wenqiang Lei and Zheng Yuan and Chang Zhou and Jingren Zhou},
+      year={2023},
+      eprint={2310.16517},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```
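The abstract describes organizing the ChatGPT queries hierarchically by Occupation, Responsibility, Topic, and Question. One way to picture this is as a tree that is walked depth-first, emitting one prompt per leaf question. The following is a minimal Python sketch under stated assumptions: the class names, the `iter_queries` and `build_prompt` helpers, the prompt template, and the example data are all hypothetical and are not taken from the OccuQuest release or the authors' code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator

# Hypothetical data model for the Occupation -> Responsibility -> Topic -> Question
# hierarchy; field names are illustrative and are NOT the OccuQuest schema.


@dataclass
class Topic:
    name: str
    questions: list[str] = field(default_factory=list)


@dataclass
class Responsibility:
    name: str
    topics: list[Topic] = field(default_factory=list)


@dataclass
class Occupation:
    name: str
    category: str  # assumed field: the occupational category this occupation belongs to
    responsibilities: list[Responsibility] = field(default_factory=list)


def iter_queries(occupations: list[Occupation]) -> Iterator[tuple[str, str, str, str]]:
    """Walk the hierarchy depth-first, yielding one
    (occupation, responsibility, topic, question) tuple per leaf question."""
    for occ in occupations:
        for resp in occ.responsibilities:
            for topic in resp.topics:
                for question in topic.questions:
                    yield occ.name, resp.name, topic.name, question


def build_prompt(occupation: str, responsibility: str, topic: str, question: str) -> str:
    """Fill a hypothetical prompt template (not the authors' actual ChatGPT prompt)."""
    return (
        f"You are an experienced {occupation}. "
        f"One of your responsibilities is {responsibility}. "
        f"Regarding {topic}, please answer: {question}"
    )


# Illustrative example data, not drawn from the OccuQuest dataset.
tree = [
    Occupation(
        name="Real Estate Agent",
        category="Sales",  # hypothetical category label
        responsibilities=[
            Responsibility(
                name="negotiating purchase agreements",
                topics=[Topic("closing costs", ["Who usually pays the closing costs?"])],
            )
        ],
    )
]

# One prompt is produced for each leaf question in the tree.
prompts = [build_prompt(*q) for q in iter_queries(tree)]
for p in prompts:
    print(p)
```

Keeping the hierarchy explicit makes it straightforward to count how many questions each occupation contributes, which is the kind of per-occupation balance the abstract compares against Dolly, ShareGPT, and WizardLM.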