double7/vicuna-160m

	---
	license: apache-2.0
	datasets:
	- anon8231489123/ShareGPT_Vicuna_unfiltered
	language:
	- en
	pipeline_tag: text-generation
	---
	## Model description
	This is a Vicuna-like model with only 160M parameters, fine-tuned from [LLaMA-160m](https://huggingface.co/JackFram/llama-160m) on [ShareGPT](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered) data.

	The training setup follows the [Vicuna suite](https://github.com/lm-sys/FastChat).
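
Since the training setup follows FastChat, prompts are presumably expected in a Vicuna-style conversation format. The sketch below is a minimal example under that assumption: the system message and the `USER:` / `ASSISTANT:` layout are taken from the Vicuna v1.1 template and are not confirmed by this card, and the helper names `build_vicuna_prompt` and `generate_reply` are hypothetical.

```python
# Assumed Vicuna v1.1 system message; this card does not state which template was used.
VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_vicuna_prompt(user_message: str, system: str = VICUNA_SYSTEM) -> str:
    """Format a single-turn prompt in the (assumed) Vicuna v1.1 style."""
    return f"{system} USER: {user_message.strip()} ASSISTANT:"


def generate_reply(
    user_message: str,
    model_id: str = "double7/vicuna-160m",
    max_new_tokens: int = 64,
) -> str:
    """Greedy-decode a reply from the model for one user message."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(build_vicuna_prompt(user_message), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("What is speculative decoding?")` downloads the checkpoint on first use. With only 160M parameters, the model is better suited to drafting tokens for a larger model than to standalone chat.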

	The model was developed mainly as a base small speculative (draft) model for the [MCSD paper](https://arxiv.org/pdf/2401.06706.pdf). Compared with LLaMA-160m, it aligns better with the Vicuna models while losing little alignment with the LLaMA models.

	| Draft Model    | Target Model  | Alignment |
	| -------------- | ------------- | --------- |
	| LLaMA-68/160M  | LLaMA-13/33B  | 😃        |
	| LLaMA-68/160M  | Vicuna-13/33B | 😟        |
	| Vicuna-68/160M | LLaMA-13/33B  | 😃        |
	| Vicuna-68/160M | Vicuna-13/33B | 😃        |
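
The alignment column above is qualitative. One way to make it concrete, sketched here under assumptions, is the expected acceptance rate of standard single-candidate speculative sampling, where a token drafted from `q` is accepted with probability `min(1, p(x)/q(x))`; this is not the multi-candidate scheme of the MCSD paper. The helper names and the `target_id` argument are hypothetical, and `assisted_generate` relies on the `assistant_model` argument of `generate` in Hugging Face Transformers, assuming the draft and target share a tokenizer.

```python
from typing import Sequence


def expected_acceptance_rate(
    target_probs: Sequence[float], draft_probs: Sequence[float]
) -> float:
    """Probability that one drafted token is accepted: sum over x of min(p(x), q(x)).

    Equals 1 minus the total variation distance between the two next-token
    distributions, so a better-aligned draft model scores closer to 1.
    """
    if len(target_probs) != len(draft_probs):
        raise ValueError("distributions must cover the same vocabulary")
    return sum(min(p, q) for p, q in zip(target_probs, draft_probs))


def assisted_generate(
    prompt: str,
    target_id: str,  # hypothetical: any larger Vicuna or LLaMA checkpoint you have access to
    draft_id: str = "double7/vicuna-160m",
    max_new_tokens: int = 64,
) -> str:
    """Decode with the target model while the small model drafts tokens."""
    # Imported lazily so the metric above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(target_id)
    target = AutoModelForCausalLM.from_pretrained(target_id)
    draft = AutoModelForCausalLM.from_pretrained(draft_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Assisted generation: the draft proposes tokens and the target verifies them.
    output_ids = target.generate(
        **inputs, assistant_model=draft, max_new_tokens=max_new_tokens
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With identical distributions `expected_acceptance_rate` returns 1.0, and it falls toward 0 as the draft and target disagree, which is the sense in which a 😟 pairing wastes drafted tokens.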