keitokei1994/Llama-3-8B-shisa-2x8B

language:
- ja
- en
---
---
license: llama3
language:
- ja
- en
---
### Model Description
This model is a Mixture of Experts (MoE) language model created with MergeKit.
It combines the original meta-llama/Meta-Llama-3-8B-Instruct with shisa-ai/shisa-v1-llama3-8b, a model fine-tuned on a Japanese dataset, with the aim of improving Japanese-language ability while preserving the capabilities of the original Meta-Llama-3-8B-Instruct.
Inspired by [Sdff-Ltba/LightChatAssistant-2x7B](https://huggingface.co/Sdff-Ltba/LightChatAssistant-2x7B) and [Aratako/LightChatAssistant-4x7B](https://huggingface.co/Aratako/LightChatAssistant-4x7B), I have started building MoE models on Llama 3. Many thanks to both authors.
I am also experimenting locally with the approach those two authors take: adding chat vectors extracted from fine-tuned models and then combining the results into an MoE model. I hope to upload those models when I have time.
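
The chat-vector approach mentioned above boils down to weight arithmetic: subtract the weights of a base model from those of a model fine-tuned from it, then add that difference (optionally scaled) to another model with the same architecture. The following is a minimal sketch of that general technique under stated assumptions, not the author's actual pipeline: plain Python lists stand in for tensors, and `extract_chat_vector`, `apply_chat_vector`, `scale`, and `skip` are hypothetical names chosen for illustration.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for a state dict: tensor name -> flattened weights.
Weights = Dict[str, List[float]]


def extract_chat_vector(tuned: Weights, base: Weights) -> Weights:
    """Chat vector = fine-tuned weights minus the base weights they started from."""
    return {
        name: [t - b for t, b in zip(tuned[name], base[name])]
        for name in tuned
        if name in base
    }


def apply_chat_vector(
    target: Weights,
    chat_vector: Weights,
    scale: float = 1.0,
    skip: Iterable[str] = (),
) -> Weights:
    """Add a (scaled) chat vector to a target model with the same architecture.

    Tensors listed in `skip` and tensors without a matching delta are
    copied unchanged.
    """
    skipped = set(skip)
    merged: Weights = {}
    for name, params in target.items():
        delta = chat_vector.get(name)
        if delta is None or name in skipped:
            merged[name] = list(params)
        else:
            merged[name] = [p + scale * d for p, d in zip(params, delta)]
    return merged


if __name__ == "__main__":
    # Toy weights, purely illustrative.
    base = {"mlp.w": [1.0, 2.0], "embed": [0.5]}
    tuned = {"mlp.w": [1.5, 1.0], "embed": [0.7]}
    target = {"mlp.w": [10.0, 10.0], "embed": [3.0]}

    cv = extract_chat_vector(tuned, base)
    print(apply_chat_vector(target, cv, skip=["embed"]))
    # {'mlp.w': [10.5, 9.0], 'embed': [3.0]}
```

The `skip` option exists because tensors whose shapes differ between models (for example, embeddings after a vocabulary change) cannot be combined element-wise; which tensors to skip and what scale to use are left open here.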
### Model Details
- **Model Name**: Llama-3-8B-shisa-2x8B
- **Model Architecture**: Mixture of Experts (MoE)
- **Base Models**: meta-llama/Meta-Llama-3-8B-Instruct, shisa-ai/shisa-v1-llama3-8b
- **Merge Tool**: MergeKit
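
For readers unfamiliar with the architecture: MergeKit's MoE mode (`mergekit-moe`) builds a Mixtral-style model in which the attention layers are shared and each layer's MLP is replaced by several expert MLPs taken from the source models, with a small router scoring the experts for every token. The sketch below illustrates that top-k routing step in plain Python. It is an illustrative assumption about the general mechanism, not code from MergeKit or from this model; `moe_forward`, `gate_rows`, and the toy experts are hypothetical, and the number of experts this model activates per token is not stated here.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    hidden: Vector,
    gate_rows: Sequence[Vector],
    experts: Sequence[Expert],
    top_k: int = 2,
) -> Vector:
    """Route one token's hidden state through its top-k experts.

    gate_rows[i] is the (hypothetical) router weight row for expert i, so the
    router logit for expert i is its dot product with the hidden state.
    """
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in gate_rows]
    # Keep the k highest-scoring experts.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the chosen logits equals "softmax over all, take top-k, renormalize".
    weights = softmax([logits[i] for i in chosen])
    output = [0.0] * len(hidden)
    for weight, idx in zip(weights, chosen):
        expert_out = experts[idx](hidden)
        output = [o + weight * y for o, y in zip(output, expert_out)]
    return output


if __name__ == "__main__":
    # Two toy "experts" standing in for the MLPs of two source models.
    experts = [lambda v: [2.0 * x for x in v], lambda v: [x + 1.0 for x in v]]
    gate_rows = [[1.0, 0.0], [0.0, 1.0]]
    print(moe_forward([1.0, 0.0], gate_rows, experts, top_k=1))  # [2.0, 0.0]
    print(moe_forward([1.0, 0.0], gate_rows, experts, top_k=2))
```

With two experts and `top_k=2`, every token passes through both experts and the router only decides the mixing weights; with `top_k=1`, each token uses a single expert.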
#### Required Specifications
With Q4_K_M quantization, the entire model can be loaded onto a 12 GB RTX 3060.
After creating the model on WSL2 and Google Colaboratory Pro, I tested it on Windows, WSL2 under Windows, and Google Colaboratory Pro, and confirmed that it works with llama.cpp and LM Studio. My test machine:
- CPU: Ryzen 5 3600
- GPU: GeForce RTX 3060 12GB
- RAM: DDR4-3200 96GB
- OS: Windows 10
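
To see why a Q4_K_M file fits on a 12 GB card, here is a back-of-the-envelope estimate. It assumes the published Llama-3-8B shapes (hidden size 4096, 32 layers, MLP size 14336, 32 attention heads with 8 key/value heads, vocabulary 128256, untied embeddings), assumes a Mixtral-style 2-expert layout in which only the MLPs are duplicated and each layer gains a small router, and assumes Q4_K_M averages roughly 4.9 bits per weight. Real GGUF files mix quantization types, so actual sizes will differ somewhat, and the KV cache and runtime buffers need VRAM on top of the weights.

```python
# Back-of-the-envelope size estimate for a 2-expert Llama-3-8B MoE.
# Assumptions: Llama-3-8B config values; the MoE duplicates only the MLP blocks.

VOCAB = 128_256
HIDDEN = 4_096
LAYERS = 32
MLP = 14_336
HEADS = 32
KV_HEADS = 8
HEAD_DIM = HIDDEN // HEADS  # 128


def llama3_moe_params(num_experts: int) -> int:
    """Total parameter count for a Mixtral-style MoE built from Llama-3-8B."""
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_HEADS * HEAD_DIM  # q, o + k, v
    mlp = 3 * HIDDEN * MLP  # gate, up, down projections per expert
    router = HIDDEN * num_experts if num_experts > 1 else 0  # one logit per expert
    norms = 2 * HIDDEN  # two RMSNorm weight vectors per layer
    per_layer = attn + num_experts * mlp + router + norms
    embeddings = 2 * VOCAB * HIDDEN  # input embedding + untied LM head
    return LAYERS * per_layer + embeddings + HIDDEN  # + final norm


def weight_gb(params: int, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    dense = llama3_moe_params(1)
    moe = llama3_moe_params(2)
    print(f"dense: {dense:,} params")  # dense: 8,030,261,248 params
    print(f"2x8B:  {moe:,} params")  # 2x8B:  13,667,667,968 params
    # ~4.9 bits/weight is an assumed Q4_K_M average, not an exact figure.
    print(f"Q4_K_M weights: ~{weight_gb(moe, 4.9):.1f} GB")  # Q4_K_M weights: ~8.4 GB
```

Under these assumptions the weights come to roughly 8.4 GB, leaving a few gigabytes of the card's 12 GB for the KV cache and compute buffers. The same arithmetic shows why a "2x8B" model has about 13.7B parameters rather than 16B: the attention layers and embeddings are shared, and only the MLP blocks are duplicated.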