kaitchup/OPT-1.3B-RLHF-DSChatLoRA


Model Card for OPT-1.3B-RLHF-DSChatLoRA

This is a chat model fine-tuned with RLHF using DeepSpeed Chat and LoRA. It is based on OPT-1.3B.

Model Details

Model Description

Developed by: The Kaitchup
Model type: Causal language model
Language(s) (NLP): English
License: cc-by-nc-sa-4.0
Finetuned from model: facebook/opt-1.3b
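
A minimal sketch of how the model might be queried with the Transformers library. Several things here are assumptions rather than facts stated in this card: that the repository holds full model weights and a tokenizer loadable with `AutoModelForCausalLM` and `AutoTokenizer` (rather than a standalone LoRA adapter), and that the model expects a `Human: ... Assistant:` prompt of the kind commonly used with DeepSpeed Chat. The helper names `build_prompt` and `generate_reply` are hypothetical.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message in a Human/Assistant template.

    Assumption: this template matches the format used during fine-tuning;
    the model card does not document the prompt format.
    """
    return f"Human: {user_message.strip()} Assistant:"


def generate_reply(user_message: str, max_new_tokens: int = 128) -> str:
    """Generate a reply with greedy decoding (downloads the model on first call)."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "kaitchup/OPT-1.3B-RLHF-DSChatLoRA"
    # Assumption: the repository ships a tokenizer and merged model weights.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(build_prompt(user_message), return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Keep only the tokens generated after the prompt.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Calling `generate_reply("What is reinforcement learning?")` would download the weights and return the generated text. If the tokenizer turns out to be missing from the repository, loading it from `facebook/opt-1.3b` is a plausible fallback, since that is the base model.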

Model Sources

The model has been trained with the procedure described in this article:

Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #3: Reinforcement Learning with Human Feedback


