---
license: llama3
datasets:
  - THUDM/AgentInstruct
  - anon8231489123/ShareGPT_Vicuna_unfiltered
language:
  - en
base_model:
  - meta-llama/Llama-3-8B-Instruct
---

# Model Card for RetrospexLLaMA3

This model is fine-tuned with LoRA for Retrospex on the AgentInstruct and ShareGPT datasets. The base model is Llama-3-8B-Instruct.
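A minimal loading sketch with `transformers` is given below. The hub id `AronXiang/RetrospexLLaMA3` is inferred from the repo name and may differ, and the prompt and generation settings are assumptions, not values from this card.

```python
# Hedged sketch: chat-style inference with transformers.
# The hub id is an assumption inferred from the repo name; verify before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AronXiang/RetrospexLLaMA3"  # assumption: actual hub id may differ
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "You are in a kitchen. Find a mug."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```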

## Model Details

### Model Description

- **Developed by:** Convai NJU
- **Shared by:** Convai NJU
- **Model type:** Llama-based causal language model
- **Language(s):** English
- **License:** llama3
- **Finetuned from model:** Llama-3-8B-Instruct


## Training Details

### Training Data

- [AgentInstruct](https://huggingface.co/datasets/THUDM/AgentInstruct)
- [ShareGPT (Vicuna unfiltered)](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered)
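Both datasets are hosted on the Hub, so they can presumably be pulled with the `datasets` library. This is a hedged sketch: the ShareGPT file name is an assumption, so check each dataset card for the actual layout.

```python
# Hedged sketch: fetching the two training datasets with the `datasets` library.
from datasets import load_dataset

agent_instruct = load_dataset("THUDM/AgentInstruct")

# The ShareGPT repo stores raw JSON files, so data_files may need to be set
# explicitly (the file name below is an assumption, not stated in this card).
sharegpt = load_dataset(
    "anon8231489123/ShareGPT_Vicuna_unfiltered",
    data_files="ShareGPT_V3_unfiltered_cleaned_split.json",
)
```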

### Training Hyperparameters

- **fp16:** True
- **learning rate:** 2e-5
- **batch size:** 8
- **LoRA r:** 16
- **LoRA alpha:** 64
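As a concrete reference, the hyperparameters above map onto a `peft` LoRA configuration roughly as follows. This is a minimal sketch, not the authors' training script: the target modules, task type, output path, and Trainer settings are assumptions this card does not specify.

```python
# Hedged sketch of a LoRA setup matching the listed hyperparameters, using
# Hugging Face peft + transformers. Only the values marked "(from the card)"
# come from this model card; everything else is an assumption.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3-8B-Instruct")

lora_config = LoraConfig(
    r=16,           # LoRA r (from the card)
    lora_alpha=64,  # LoRA alpha (from the card)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

args = TrainingArguments(
    output_dir="retrospex-llama3-lora",  # hypothetical path
    learning_rate=2e-5,                  # lr (from the card)
    per_device_train_batch_size=8,       # batch size (from the card)
    fp16=True,                           # fp16 (from the card)
)
```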

## Citation

BibTeX:

```bibtex
@inproceedings{yufei2024retrospex,
  title     = {Retrospex: Language Agent Meets Offline Reinforcement Learning Critic},
  author    = {Yufei Xiang and Yiqun Shen and Yeqin Zhang and Cam-Tu Nguyen},
  booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2024}
}
```