|
--- |
|
license: llama3 |
|
datasets: |
|
- THUDM/AgentInstruct |
|
- anon8231489123/ShareGPT_Vicuna_unfiltered |
|
language: |
|
- en |
|
base_model: |
|
- meta-llama/Llama-3-8B-Instruct |
|
--- |
|
# Model Card for Retrospex (Llama-3-8B-Instruct LoRA)
|
|
|
This model is a LoRA fine-tune of Llama-3-8B-Instruct for Retrospex, trained on the AgentInstruct and ShareGPT datasets.
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
- **Developed by:** Convai NJU |
|
- **Shared by:** Convai NJU
|
- **Model type:** Causal language model (Llama 3 architecture), fine-tuned with LoRA
|
- **Language(s) (NLP):** English (en)
|
- **License:** llama3 |
|
- **Finetuned from model:** Llama-3-8B-Instruct
|
|
|
### Model Sources |
|
|
|
- **Repository:** https://github.com/Yufei-Xiang/Retrospex.git |
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
AgentInstruct: https://huggingface.co/datasets/THUDM/AgentInstruct |
|
|
|
ShareGPT: https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered |
|
|
|
#### Training Hyperparameters |
|
|
|
- **fp16:** True

- **learning rate:** 2e-5

- **batch size:** 8

- **LoRA r:** 16

- **LoRA alpha:** 64
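Assuming the adapters were trained with the Hugging Face PEFT library (the card does not state the training framework), the hyperparameters above correspond to a configuration like the following sketch. Note that `target_modules` and `lora_dropout` are illustrative assumptions, not values reported in this card.

```python
from peft import LoraConfig

# Sketch of a LoRA configuration matching the hyperparameters above.
# target_modules and lora_dropout are assumptions, not values from this card.
lora_config = LoraConfig(
    r=16,                                 # LoRA rank ("LoRA r" above)
    lora_alpha=64,                        # scaling factor ("LoRA alpha" above)
    lora_dropout=0.05,                    # assumed; not reported in this card
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
```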
|
|
|
|
|
## Citation |
|
|
|
**BibTeX:** |
|
|
|
    @inproceedings{yufei2024retrospex,
      title     = {Retrospex: Language Agent Meets Offline Reinforcement Learning Critic},
      author    = {Yufei Xiang and Yiqun Shen and Yeqin Zhang and Cam-Tu Nguyen},
      booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
      year      = {2024}
    }
|
|