kaitchup/OPT-1.3B-RLHF-DSChatLoRA

	---
	license: cc-by-nc-sa-4.0
	datasets:
	- Dahoas/rm-static
	language:
	- en
	---

	# Model Card for OPT-1.3B-RLHF-DSChatLoRA

	This is a chat model fine-tuned with RLHF using DeepSpeed Chat and LoRA.
	It is based on OPT-1.3B.

	## Model Details

	### Model Description

	- Developed by: [The Kaitchup](https://kaitchup.substack.com/)
	- Model type: Causal language model
	- Language(s) (NLP): English
	- License: cc-by-nc-sa-4.0
	- Finetuned from model: [facebook/opt-1.3b](https://huggingface.co/facebook/opt-1.3b)

	### Model Sources

	The model has been trained with the procedure described in this article:

	[Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #3: Reinforcement Learning with Human Feedback](https://kaitchup.substack.com/p/train-instruct-llms-on-your-gpu-with-6a5)
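
The card does not include usage code, so the following is only a minimal sketch of how the model might be queried with the `transformers` library. It rests on several assumptions not confirmed by this card: that the repository holds full (merged) model weights loadable with `AutoModelForCausalLM`, that the tokenizer of the base model `facebook/opt-1.3b` is the right one to use, and that a `Human: ... Assistant:` prompt template matches the format seen during training. The helper names `build_prompt` and `generate_reply` are hypothetical.

```python
MODEL_ID = "kaitchup/OPT-1.3B-RLHF-DSChatLoRA"  # repository of this model card
BASE_TOKENIZER_ID = "facebook/opt-1.3b"  # assumption: the base model's tokenizer applies


def build_prompt(user_message: str) -> str:
    """Format a single-turn prompt.

    Assumption: the Human/Assistant template below is illustrative and is
    not documented in this card.
    """
    return f"Human: {user_message.strip()}\n\nAssistant:"


def generate_reply(user_message: str, max_new_tokens: int = 128) -> str:
    """Generate one reply with greedy decoding."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repository contains weights that load directly,
    # without a separate LoRA adapter step.
    tokenizer = AutoTokenizer.from_pretrained(BASE_TOKENIZER_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(user_message), return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Calling `generate_reply("What is reinforcement learning?")` downloads the weights on first use. If loading fails, the repository may contain only adapter weights, in which case this sketch does not apply as written.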