	---
	license: llama3
	datasets:
	- THUDM/AgentInstruct
	- anon8231489123/ShareGPT_Vicuna_unfiltered
	language:
	- en
	base_model:
	- meta-llama/Llama-3-8B-Instruct
	---
	# Model Card for RetrospexLLaMA3

	This model was fine-tuned with LoRA for Retrospex on the AgentInstruct and ShareGPT datasets. The base model is Llama-3-8B-Instruct.

	## Model Details

	### Model Description

	- Developed by: Convai NJU
	- Shared by: Convai NJU
	- Model type: Llama model
	- Language(s) (NLP): English (en)
	- License: llama3
	- Finetuned from model: Llama-3-8B-Instruct

	### Model Sources

	- Repository: https://github.com/Yufei-Xiang/Retrospex.git

	## Training Details

	### Training Data

	AgentInstruct: https://huggingface.co/datasets/THUDM/AgentInstruct

	ShareGPT: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered

	#### Training Hyperparameters

	- fp16: True
	- lr: 2e-5
	- batch size: 8
	- lora r: 16
	- lora alpha: 64
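
As a rough illustration of how the hyperparameters above fit together, the sketch below collects them in a small Python dataclass and derives the LoRA scaling factor (`lora_alpha / r`). The class name `RetrospexLoraHyperparams` and the `peft_kwargs` helper are hypothetical and not part of the Retrospex repository. The `task_type` value is an assumption, and settings the card does not list (target modules, dropout, number of epochs) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrospexLoraHyperparams:
    """Hyperparameters listed in this model card (hypothetical container)."""

    fp16: bool = True
    learning_rate: float = 2e-5
    batch_size: int = 8
    lora_r: int = 16
    lora_alpha: int = 64

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r.
        return self.lora_alpha / self.lora_r

    def peft_kwargs(self) -> dict:
        # Keyword arguments that could be passed to peft.LoraConfig.
        # task_type is an assumption; the card does not state it.
        return {
            "r": self.lora_r,
            "lora_alpha": self.lora_alpha,
            "task_type": "CAUSAL_LM",
        }


hp = RetrospexLoraHyperparams()
print(hp.lora_scaling)   # alpha / r
print(hp.peft_kwargs())
```

If the `peft` library is installed, `LoraConfig(**hp.peft_kwargs())` would build a matching adapter configuration. Whether the original training used exactly this setup is not stated in the card.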


	## Citation

	BibTeX:

	@inproceedings{yufei2024retrospex,
	title={Retrospex: Language Agent Meets Offline Reinforcement Learning Critic},
	author={Yufei Xiang and Yiqun Shen and Yeqin Zhang and Cam-Tu Nguyen},
	booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, {EMNLP}},
	year={2024}
	}