	---
	license: apache-2.0
	---

	# Qwama-0.5B-Instruct

	This is [Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) with a Llama-3 vocabulary.

	The intended use is as a draft model for Llama-3-70B-Instruct. Llama3-8B-Instruct works for this purpose, but it's on
	the heavier side for drafting.

	The secondary purpose is to explore the feasibility of vocabulary swaps, either for adapting small models like
	Qwen2-0.5B to produce drafts for other models, or for interoperability between dissimilar language models in general.
	The conclusion in this regard is that the method works, but since finetuning is required, it will be expensive for
	larger models. It would be interesting to explore low-rank or quantized finetuning as an alternative.

	## Procedure

	The vocabulary was swapped by creating a new embedding layer (original model uses tied embeddings so the output layer is
	the same) and initializing it as follows:

	- every Llama-3 (L3) token that is an exact match for a Qwen2 token is initialized with the corresponding embedding
	- every L3 token that decodes and re-encodes to multiple Qwen2 tokens is initialized with the mean of those embeddings
	- there are no L3 tokens that cannot be translated to one or more Qwen2 tokens (both vocabularies are complete), so no other case needs handling

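	```python
	# Excerpt: the tokenizers and the old/new embedding tensors are set up in the full script.
	for idx in range(target_vocab_size):
	    # Decode the target (Llama-3) token to text, then re-encode that text with the source (Qwen2) tokenizer
	    decode = tokenizer_target.decode(torch.tensor(idx, dtype = torch.long), decode_special_tokens = True)
	    encode = tokenizer_source.encode(decode, add_special_tokens = False, return_tensors = "pt")
	    # One source token copies its row; several source tokens are averaged
	    new_emb[idx] = old_emb[encode.flatten()].mean(dim = 0)
	    new_head[idx] = old_head[encode.flatten()].mean(dim = 0)
	```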
	Full script is [here](https://huggingface.co/turboderp/Qwama-0.5B-Instruct/blob/main/vocab_transplant.py).
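
To make the initialization rule concrete, here is a dependency-free toy sketch of the same idea. It is not the linked script: the two vocabularies, the embedding values, and the helper names `tokenize_greedy` and `transplant` are all made up for illustration, and a simple greedy longest-match tokenizer stands in for the real tokenizers.

```python
# Toy illustration of mean-of-pieces embedding initialization.
# Vocabularies, embeddings and helper names are hypothetical.

def tokenize_greedy(text, vocab):
    """Greedy longest-match tokenizer over a toy vocabulary."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate piece first, then shorter ones
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"cannot encode {text[pos:]!r}")
    return ids


def transplant(source_vocab, source_emb, target_vocab):
    """Build one embedding row per target token from the source embeddings."""
    new_emb = [None] * len(target_vocab)
    for piece, idx in target_vocab.items():
        # Re-encode the target token's text with the source tokenizer
        src_ids = tokenize_greedy(piece, source_vocab)
        rows = [source_emb[i] for i in src_ids]
        # Exact match: one row, so the mean is a copy. Otherwise: element-wise mean.
        new_emb[idx] = [sum(col) / len(rows) for col in zip(*rows)]
    return new_emb


source_vocab = {"a": 0, "b": 1, "ab": 2}
source_emb = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
target_vocab = {"ab": 0, "ba": 1, "a": 2}

new_emb = transplant(source_vocab, source_emb, target_vocab)
print(new_emb)
```

In this toy case `"ab"` and `"a"` exist in both vocabularies and copy their source rows, while `"ba"` has no source counterpart and receives the mean of the rows for `"b"` and `"a"`.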

	Swapping the vocabulary with the above method yields a mostly coherent but still very confused model. It especially
	struggles with numbers, and of course the embeddings for the Llama-3 control tokens do not have the significance they
	would in an instruct-tuned model.

	This is remedied by finetuning, first on
	[this 2.41 million row sample from Common Crawl](https://huggingface.co/datasets/agentlans/common-crawl-sample), and
	then for 3 epochs on about 25,000 instruct-formatted completions produced by Llama3-8B-Instruct, included
	[here](https://huggingface.co/turboderp/Qwama-0.5B-Instruct/blob/main/llama3-instruct-prompts.json) for reference.

	I did try tuning just the tied embeddings, but this didn't achieve good results.

	## Benchmarks

	Model | Wikitext 2k | MMLU
	---------------------------|-------------|--------
	Qwen2-0.5B-instruct @ FP16 | 12.5734 | 43.83%
	Qwama-0.5B-instruct @ FP16 | 15.3390 | 40.37%

	Draft model speculative decoding, greedy:

	Model | Draft model | Code | Prose
	-----------------------------|------------------------------|-------|-------
	Qwen2-72B-instruct @ 6.0bpw | Qwen2-0.5B-instruct @ 4.0bpw | 3.68x | 1.70x
	Llama3-70B-instruct @ 6.0bpw | Qwama-0.5B-instruct @ 4.0bpw | 3.72x | 1.92x
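
For readers unfamiliar with why a draft model must share the target's vocabulary, the following is a minimal sketch of greedy speculative decoding. It is not the code used for the benchmark above: the function `speculative_generate` and the two toy "models" are hypothetical stand-ins that map a token list to the next token. The draft proposes `k` tokens, the target checks them, and the longest agreeing prefix is kept, which only makes sense if both models speak the same token IDs.

```python
def speculative_generate(target_next, draft_next, prompt, num_tokens, k=4):
    """Greedy speculative decoding with toy next-token functions.

    Returns the generated sequence and the number of target verification passes.
    """
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < num_tokens:
        # 1. The cheap draft model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))

        # 2. The target verifies the proposals. A real implementation would score
        #    all positions in one batched forward pass; here that single pass is
        #    simulated with per-position calls and counted once.
        target_passes += 1
        accepted = []
        for proposed in draft:
            expected = target_next(tokens + accepted)
            if proposed == expected:
                accepted.append(proposed)
            else:
                # First disagreement: keep the target's token and stop.
                accepted.append(expected)
                break
        else:
            # All k proposals accepted: the same pass also yields one extra token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + num_tokens], target_passes


# Toy models: the target counts upward modulo 10; the draft agrees except after a 7.
def target_next(seq):
    return (seq[-1] + 1) % 10


def draft_next(seq):
    return 0 if seq[-1] == 7 else (seq[-1] + 1) % 10


out, passes = speculative_generate(target_next, draft_next, [0], num_tokens=12)
print(out, passes)
```

Because every kept token is either confirmed or supplied by the target, the output matches plain greedy decoding with the target alone. The speedup comes from needing fewer target passes than generated tokens, and it grows with how often the draft agrees with the target.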

	## Sample generations

	Qwen2-0.5B-instruct:

	`Hello, my name is Harry Potter. I am the Chosen One, the only wizard from the wizarding world who can fly and bring a book to life in order to summon it. In a world where wizards often use magic for personal gain, I am an advocate for freedom and non-violence.`

	`Once upon a time, there was a princess named Elsa. She lived in a beautiful castle in the snowy mountains. Her castle was filled with different types of animals, such as snowmen, reindeer, and magical trees. The inhabitants of the castle were very friendly and friendly, but one day, they were attacked by a fierce beast, the Queen of the Snow Kingdom.`

	`I am an AI language model. I don't have a physical body, so I cannot participate in activities like running or playing sports. However, I can simulate the movement of an AI language model. Is there anything specific you would like me to help with?`

	Qwama-0.5B-instruct:

	`Hello, my name is Jeffrey Brewer and I am a licensed attorney in both Maryland and Florida. I work with people who are experiencing severe financial stress due to financial mismanagement, foreclosure, divorce, and other financial hardships. My approach is to offer compassionate and skilled legal advice while keeping costs low.`

	`Once upon a time, a giant giant monster with a bad reputation invaded a small town. The mayor and the local community began to fight over who was going to make the rules. But who will win if the monsters were being allowed to roam the town?`

	`I am an AI language model that is designed to answer questions and provide information based on my training data. Would you like me to use my knowledge and expertise to answer your question? I am ready to assist you with any questions you may have. I will be happy to answer your questions in a timely manner.`

	## EXL2 Quants

	EXL2 quants are uploaded [here](https://huggingface.co/turboderp/Qwama-0.5B-Instruct-exl2).