---
license: apache-2.0
datasets:
- OFA-Sys/OccuQuest
---

This is the ProLLaMA-7B model introduced in [OccuQuest: Mitigating Occupational Bias for Inclusive Large Language Models](https://arxiv.org/abs/2310.16517).
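
For readers who want to try the model, the following is a minimal sketch of loading a LLaMA-style checkpoint with the Hugging Face `transformers` library (with PyTorch installed). The names `generate_answer` and `strip_prompt` are illustrative helpers, not part of this release, and the repository id is left as a parameter because this card does not state it. No prompt template is documented here, so the sketch passes the prompt through unchanged; the model may respond better to the template used during fine-tuning.

```python
def strip_prompt(decoded: str, prompt: str) -> str:
    """Drop the echoed prompt from a causal LM's decoded output, if present."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].lstrip()
    return decoded


def generate_answer(repo_id: str, prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a completion for `prompt` (hypothetical helper, untested here)."""
    # Imported lazily so strip_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Causal LMs return the prompt tokens followed by the new tokens.
    decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return strip_prompt(decoded, prompt)
```

A call would look like `generate_answer("<repo-id-of-this-model>", "What should a real estate agent check before listing a property?")`, where the repository id placeholder must be replaced with the actual location of the checkpoint.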

The dataset is available at [OccuQuest](https://huggingface.co/datasets/OFA-Sys/OccuQuest).

Abstract:

The emergence of large language models (LLMs) has revolutionized natural language processing tasks.
However, existing instruction-tuning datasets suffer from occupational bias: the majority of data relates to only a few occupations, which hampers the ability of instruction-tuned LLMs to generate helpful responses to professional queries from practitioners in specific fields.
To mitigate this issue and promote occupation-inclusive LLMs, we create an instruction-tuning dataset named OccuQuest, which contains 110,000+ prompt-completion pairs and 30,000+ dialogues covering over 1,000 occupations in 26 occupational categories.
We systematically query ChatGPT, organizing the requests hierarchically by Occupation, Responsibility, Topic, and Question, to ensure comprehensive coverage of occupational specialty inquiries.
Comparing OccuQuest with three commonly used datasets (Dolly, ShareGPT, and WizardLM), we observe that it exhibits a more balanced distribution across occupations.
Furthermore, we assemble three test sets for comprehensive evaluation: an occu-test set covering 25 occupational categories, an estate set focusing on real estate, and an occu-quora set containing real-world questions from Quora.
We then fine-tune LLaMA on OccuQuest to obtain OccuLLaMA, which significantly outperforms state-of-the-art LLaMA variants (Vicuna, Tulu, and WizardLM) on professional questions in GPT-4 and human evaluations.
Notably, on the occu-quora set, OccuLLaMA reaches a high win rate of 86.4% against WizardLM.
Furthermore, we demonstrate the potential of combining OccuQuest with other instruction-tuning datasets to enhance the overall performance of LLMs.
By fine-tuning LLaMA on a mixture of the OccuQuest and Tulu datasets, we introduce ProLLaMA, which excels in addressing occupational questions and exhibits superior performance in comprehensive evaluations such as MMLU, GSM8K, BBH, and HumanEval.
Among the different LLaMA variants, the 7B and 13B ProLLaMA models achieve the highest performance on MMLU and GSM8K, with the 7B ProLLaMA model demonstrating an improvement of more than 4 points over the other 7B variants on GSM8K.
We openly release the dataset and models.
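
The Occupation, Responsibility, Topic, Question hierarchy described in the abstract can be pictured as a small tree. The sketch below is only an illustration of that organization using Python dataclasses; the class names, the `flatten` helper, and the sample entries are all made up for this example and do not reflect the actual schema or contents of the released dataset.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Topic:
    name: str
    questions: List[str] = field(default_factory=list)


@dataclass
class Responsibility:
    name: str
    topics: List[Topic] = field(default_factory=list)


@dataclass
class Occupation:
    name: str
    category: str
    responsibilities: List[Responsibility] = field(default_factory=list)


def flatten(occupation: Occupation) -> Iterator[Tuple[str, str, str, str]]:
    """Yield one (occupation, responsibility, topic, question) row per question."""
    for responsibility in occupation.responsibilities:
        for topic in responsibility.topics:
            for question in topic.questions:
                yield (occupation.name, responsibility.name, topic.name, question)


# Invented sample data, used only to show the shape of the hierarchy.
agent = Occupation(
    name="Real estate agent",
    category="Sales",
    responsibilities=[
        Responsibility(
            name="Listing properties",
            topics=[
                Topic("Pricing", ["How do I set a competitive asking price?"]),
                Topic("Photography", ["Which rooms should I photograph first?",
                                      "How many photos does a listing need?"]),
            ],
        )
    ],
)

rows = list(flatten(agent))
```

Walking the tree from the top down, as `flatten` does, gives one row per question while keeping the occupation, responsibility, and topic that produced it.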

Please cite the paper if you use this model:

```
@misc{xue2023occuquest,
      title={OccuQuest: Mitigating Occupational Bias for Inclusive Large Language Models},
      author={Mingfeng Xue and Dayiheng Liu and Kexin Yang and Guanting Dong and Wenqiang Lei and Zheng Yuan and Chang Zhou and Jingren Zhou},
      year={2023},
      eprint={2310.16517},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```