---
license: llama3
datasets:
- THUDM/AgentInstruct
- anon8231489123/ShareGPT_Vicuna_unfiltered
language:
- en
base_model:
- meta-llama/Llama-3-8B-Instruct
---

# Model Card for Retrospex LoRA

This model is a LoRA fine-tune of Llama-3-8B-Instruct for Retrospex, trained on the AgentInstruct and ShareGPT datasets.

## Model Details

### Model Description

- **Developed by:** Convai NJU
- **Shared by:** Convai NJU
- **Model type:** Llama model
- **Language(s) (NLP):** en
- **License:** llama3
- **Finetuned from model:** Llama-3-8B-Instruct

### Model Sources

- **Repository:** https://github.com/Yufei-Xiang/Retrospex.git

## Training Details

### Training Data

- AgentInstruct: https://huggingface.co/datasets/THUDM/AgentInstruct
- ShareGPT: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered

#### Training Hyperparameters

- **fp16:** True
- **lr:** 2e-5
- **batch size:** 8
- **LoRA r:** 16
- **LoRA alpha:** 64

A configuration sketch using these values appears at the end of this card.

## Citation

**BibTeX:**

    @inproceedings{yufei2024retrospex,
      title={Retrospex: Language Agent Meets Offline Reinforcement Learning Critic},
      author={Yufei Xiang and Yiqun Shen and Yeqin Zhang and Cam-Tu Nguyen},
      booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing ({EMNLP})},
      year={2024}
    }
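## How to Get Started with the Model

A minimal inference sketch, assuming the adapter weights in this repository load with `peft` on top of the base model. The adapter id `Convai-NJU/Retrospex-lora`, the base-model id, and the generation settings are placeholders and assumptions for illustration, not values taken from the Retrospex repository.

```python
# Sketch: load the LoRA adapter on top of Llama-3-8B-Instruct for inference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed HF id of the base model
adapter_id = "Convai-NJU/Retrospex-lora"         # placeholder: use this repo's actual id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA weights
model.eval()

# Llama-3-Instruct expects its chat template to be applied before generation.
messages = [{"role": "user", "content": "You are an agent. Plan your next action."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```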
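## Example Fine-tuning Configuration

A sketch of how the hyperparameters listed above map onto a `peft` + `transformers` setup. Only `r`, `lora_alpha`, the learning rate, the batch size, and fp16 come from this card; the target modules, dropout, epoch count, and output path are illustrative assumptions.

```python
# Sketch: LoRA configuration matching the hyperparameters on this card.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

lora_config = LoraConfig(
    r=16,                                 # LoRA r (from this card)
    lora_alpha=64,                        # LoRA alpha (from this card)
    target_modules=["q_proj", "v_proj"],  # assumption: attention projections
    lora_dropout=0.05,                    # assumption: not stated on this card
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity-check how few weights LoRA trains

training_args = TrainingArguments(
    output_dir="retrospex-lora",    # placeholder output path
    learning_rate=2e-5,             # lr (from this card)
    per_device_train_batch_size=8,  # batch size (from this card)
    fp16=True,                      # fp16 (from this card)
    num_train_epochs=3,             # assumption: not stated on this card
)
# Pass `model`, `training_args`, and a tokenized dataset to transformers.Trainer
# to run the fine-tune; dataset preprocessing is omitted here.
```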