mlabonne / NeuralMix-2x7b

NeuralMix-2x7b / README.md, commit b9aae69 ("Create README.md"), 11 months ago

---
license: apache-2.0
tags:
- moe
- mergekit
---

# NeuralMix-2x7b

This model is a Mixture of Experts (MoE) made with [mergekit](https://github.com/cg123/mergekit) (mixtral branch). It uses the following base models:
* [OpenPipe/mistral-ft-optimized-1218](https://huggingface.co/OpenPipe/mistral-ft-optimized-1218)
* [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B)
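Both base models are Mistral-7B fine-tunes, so the "2x7b" name should not be read as 14B parameters. Assuming mergekit's Mixtral-style layout, where attention, embedding and normalization weights are shared and only each model's feed-forward (MLP) block becomes an expert, a rough parameter count can be sketched as follows. The dimensions are the published Mistral-7B configuration values, used here as an assumption rather than read from this model's own config, and `count_params` is a hypothetical helper.

```python
# Back-of-the-envelope parameter count. ASSUMPTIONS: Mistral-7B dimensions
# (hidden 4096, FFN 14336, 32 layers, 32k vocab, 8 KV heads of dim 128) and
# a Mixtral-style MoE in which only the MLP blocks are duplicated per expert.
HIDDEN = 4096
FFN = 14336
LAYERS = 32
VOCAB = 32000
KV_DIM = 8 * 128  # grouped-query attention: 8 key/value heads of size 128


def count_params(num_experts: int) -> int:
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q/o and k/v projections
    mlp = 3 * HIDDEN * FFN  # gate, up and down projections of one expert
    router = HIDDEN * num_experts if num_experts > 1 else 0  # linear gate per layer
    norms = 2 * HIDDEN  # two RMSNorm weight vectors per layer
    per_layer = attn + num_experts * mlp + router + norms
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings + untied LM head
    return LAYERS * per_layer + embeddings + HIDDEN  # + final norm


dense = count_params(1)
moe = count_params(2)
print(f"single 7B model: {dense / 1e9:.2f}B parameters")
print(f"2x7B MoE:        {moe / 1e9:.2f}B parameters")
```

Under these assumptions the second expert adds roughly 5.6B parameters, so the merged model lands near 12.9B rather than 14B. The real figure may differ slightly, for example if a base model added chat tokens to its vocabulary.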

## 💻 Usage

```python
# In a Jupyter/Colab notebook, install the dependencies first:
# !pip install -qU transformers bitsandbytes accelerate

import torch
import transformers
from transformers import AutoTokenizer, BitsAndBytesConfig

model = "mlabonne/NeuralMix-2x7b"

tokenizer = AutoTokenizer.from_pretrained(model)

# Load the weights in 4-bit with bitsandbytes to reduce GPU memory usage
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
    model_kwargs={
        "torch_dtype": torch.float16,
        "quantization_config": BitsAndBytesConfig(load_in_4bit=True),
    },
)

# Format the conversation with the model's chat template
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Sample up to 256 new tokens
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
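The `temperature`, `top_k` and `top_p` arguments control how each next token is drawn. The function below is an illustrative, pure-Python sketch of that filtering chain on a single vector of logits, applied in the order temperature, then top-k, then top-p. It approximates what the Transformers generation code does but is not its implementation; `sample_next_token` and the toy logits are hypothetical.

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.95, rng=random):
    """Hypothetical helper: temperature + top-k + top-p (nucleus) sampling."""
    # 1. Temperature: divide logits before the softmax (< 1 sharpens the distribution)
    scaled = [x / temperature for x in logits]
    # 2. Top-k: keep only the k highest-scoring token ids, best first
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability)
    m = max(scaled[i] for i in ranked)
    exps = [math.exp(scaled[i] - m) for i in ranked]
    total = sum(exps)
    probs = [e / total for e in exps]
    # 3. Top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for token_id, p in zip(ranked, probs):
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize the remaining probabilities and draw one token id
    mass = sum(p for _, p in kept)
    ids = [t for t, _ in kept]
    weights = [p / mass for _, p in kept]
    return rng.choices(ids, weights=weights, k=1)[0]


rng = random.Random(0)
toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # hypothetical logits over a 5-token vocabulary
print([sample_next_token(toy_logits, top_k=3, rng=rng) for _ in range(10)])
```

Lowering `temperature` or `top_p` pushes the sampler toward the single most likely token, while raising them lets less likely tokens through more often.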

Output:
```
A Mixture of Experts (ME) is a neural network architecture that allows for adaptive specialization of its hidden layers. It consists of an input layer, a mixture of expert layers with a set of hidden layers, and an output layer. The expert layers have different specializations and each one is responsible for predicting the output for a particular subset of the input data. The mixture of experts uses a gating network to dynamically select the expert layer that best fits the current input data. This adaptive approach can improve the performance and generalization capabilities of the neural network.

The Mixture of Experts model is particularly useful in situations where the data is complex, heterogeneous, or has varying structures. By enabling each expert to specialize in a particular type of input, the Mixture of Experts can learn to effectively handle diverse input data and provide more accurate predictions.

Overall, the Mixture of Experts can be seen as a type of neural network that combines the strengths of multiple models to create a more powerful and flexible predictive tool.
```
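The sample output describes a gating network that picks the best-fitting expert. In Mixtral-style layers, such as those mergekit builds, routing works per token: a linear gate scores every expert, the top-k experts are kept, and their outputs are mixed with softmax weights. Below is a minimal pure-Python sketch for a single token; the toy experts, gate weights and the `moe_layer` helper are hypothetical. With two experts and `top_k=2` (assumed here to match this model's routing setting), every token passes through both experts, just with different weights.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(hidden, gate_weights, experts, top_k=2):
    """Hypothetical helper: route one token to its top_k experts and mix their outputs."""
    # Router: one logit per expert (dot product of the hidden state with that expert's gate row)
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in gate_weights]
    # Keep the top_k highest-scoring experts and renormalize over them only
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    weights = softmax([logits[i] for i in chosen])
    # Output is the routing-weighted sum of the selected experts' outputs
    out = [0.0] * len(hidden)
    for w, i in zip(weights, chosen):
        out = [o + w * e for o, e in zip(out, experts[i](hidden))]
    return out, dict(zip(chosen, weights))


# Hypothetical toy setup: 2-dimensional hidden state, two "experts"
experts = [
    lambda h: [2.0 * x for x in h],  # expert 0 scales its input
    lambda h: [x + 1.0 for x in h],  # expert 1 shifts its input
]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]  # one gate row per expert
token = [1.0, 0.0]

print(moe_layer(token, gate_weights, experts, top_k=2))  # both experts, soft weights
print(moe_layer(token, gate_weights, experts, top_k=1))  # hard top-1 routing
```

Setting `top_k=1` reproduces the "select the single best expert" behavior described in the sample output, while `top_k=2` blends both experts.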