trollek/NinjaMouse2-2.5B-v0.1-GGUF

Same procedure as last time?

Sort of.

This model is a block-expanded danube2, using the Llama Pro method of training (or fine-tuning) only the expanded blocks. To do this on limited hardware I had to expand by 2 layers per step, from the original 24 to 32. At least, that was the original plan. With the 32-layer model I used BAdam to do a "once over" with most of the datasets I also used to expand the model. While it is a faux full fine-tune, it isn't really that different from the Llama Pro method, i.e. layerwise insertion of data.
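To make the expansion step concrete, here is a minimal sketch of where new blocks would land in one 24 -> 26 step. It assumes the Llama Pro style layout, where the original layers are split into equal groups and a copy of each group's last layer is appended to that group (in Llama Pro the copy's output projections start at zero, so it begins as an identity block). The function name `expansion_plan` is hypothetical, and the exact positions used for this model are not stated above.

```python
def expansion_plan(num_layers: int, num_new: int) -> list[tuple[int, bool]]:
    """Return (source_layer, is_new) for every layer of the expanded stack.

    Assumption: the original layers are split into `num_new` equal groups and
    a copy of each group's last layer is appended to that group.
    """
    if num_new <= 0 or num_layers % num_new != 0:
        raise ValueError("num_layers must be divisible by num_new")
    group_size = num_layers // num_new
    plan = []
    for layer in range(num_layers):
        plan.append((layer, False))      # original block, kept frozen
        if (layer + 1) % group_size == 0:
            plan.append((layer, True))   # inserted copy, the only trainable block
    return plan


plan = expansion_plan(num_layers=24, num_new=2)
# Indices (in the expanded model) of the blocks that would be trained.
trainable = [index for index, (_, is_new) in enumerate(plan) if is_new]
print(len(plan), trainable)
```

Repeating the same step on the result gives 28, 30, and 32 layers, with only the freshly inserted blocks unfrozen each time.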

I have a feeling that Llama3 and other well trained models feel better because of markdown (formatting), personality (friendliness), and prompt compliance (preferenceness.. I guess). Thus I have used Llama3 8B, WizardLM2, and Hermes 2 Pro Mistral to generate training data for this model.

To ensure that the full 8k context window could be utilised this time I filtered openhermes, Synthia, LongAlpaca, and MathInstruct for entries with a token count between 2k and 8k, to DoRA, QLoRA, and BAdam the context window into submission. One time, elsewhere, even with lm_head as an additional target, and twice with embed_tokens.
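The length filter can be sketched roughly like this. The bounds of 2,000 and 8,000 are an assumption about what "2k and 8k" means (it could equally be 2,048 and 8,192), and `filter_by_token_count` and `whitespace_count` are hypothetical names. In practice the counter would be the model's own tokenizer; a whitespace split stands in here so the example runs on its own.

```python
from typing import Callable, Iterable, Iterator


def filter_by_token_count(
    entries: Iterable[str],
    count_tokens: Callable[[str], int],
    min_tokens: int = 2_000,   # assumed lower bound for "2k"
    max_tokens: int = 8_000,   # assumed upper bound for "8k"
) -> Iterator[str]:
    """Yield only the entries whose token count lies within the bounds."""
    for text in entries:
        if min_tokens <= count_tokens(text) <= max_tokens:
            yield text


def whitespace_count(text: str) -> int:
    """Stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


samples = ["short entry", "word " * 3_000, "word " * 9_000]
kept = list(filter_by_token_count(samples, whitespace_count))
print(len(kept))
```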

The astute among you may notice the extra special tokens like the fim and thought tokens. NinjaMouse has not been trained to use those.. Yet! Also: This is actually 34 layers. Surprise!

Here's the thing with the 2 extra layers compared to my first model. When I trained NinjaMouse2 with 32 layers I noticed that the grad_norm value would behave strangely on layers 3 and 27. Layer 27 used to be the last layer before the expansion, while 3 is a mystery. I decided to use mergekit to copy layer 3 and insert it beside the original, and to copy layer 27 and insert it at the end or top (the new 33, all 0-indexed), depending on your perspective.
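The resulting layer order can be written out as a small sketch. This only computes which source layer sits at each position of the 34-layer model; it is not the mergekit configuration itself, and the function name `duplicated_layer_order` is hypothetical.

```python
def duplicated_layer_order(num_layers: int = 32, beside: int = 3, at_end: int = 27) -> list[int]:
    """Return the source layer index for each position in the merged model."""
    order = []
    for layer in range(num_layers):
        order.append(layer)
        if layer == beside:
            order.append(layer)   # copy placed right next to the original
    order.append(at_end)          # copy appended as the new last layer
    return order


order = duplicated_layer_order()
print(len(order), order[3], order[4], order[33])
```

Since layer 3 and its copy are identical, it makes no difference whether the copy is said to sit before or after the original.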

The procedure

24 -> 26

LDJnr/Capybara
m-a-p/Code-Feedback
m-a-p/CodeFeedback-Filtered-Instruction
WRN non enhanced
abacusai/SystemChat

26 -> 28

toolcall 10k
migtissera/Synthia-v1.3
TIGER-Lab/MathInstruct

28 -> 30

glaiveai/glaive-code-assistant
hiyouga/glaive-function-calling-v2-sharegpt
Weyaxi/sci-datasets (w/o code feedback instruct, mathinstruct, camel)

30 -> 32

jondurbin/airoboros-3.2
teknium/openhermes
WRN enhanced
garage-bAInd/Open-Platypus
vicgalle/alpaca-gpt4

Post tuning

Self-reward with a teacher is what this approach can be confidently called. I wish there were a distilled version of that name, but I am coming up blank.

I have any model generate a bunch of prompts that a teacher model answers with gusto (the chosen column), and then have NinjaMouse2 also answer them (as the rejects). BAM. Skibidibi doo. Have I made these DPO datasets? No. But the prompts, their evaluations, along with responses of its own, responses from better models, and evaluations of both of them are included in the training. You can find the dataset here.
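As a rough illustration of the chosen/rejected layout described above, the sketch below pairs a teacher answer with the model's own answer for each prompt. The field names, the `build_pairs` helper, and the stand-in answer functions are all hypothetical; the actual dataset also carries evaluations, which are left out here.

```python
from typing import Callable, Iterable


def build_pairs(
    prompts: Iterable[str],
    teacher: Callable[[str], str],
    student: Callable[[str], str],
) -> list[dict]:
    """Pair each prompt with a teacher answer (chosen) and a student answer (rejected)."""
    rows = []
    for prompt in prompts:
        rows.append(
            {
                "prompt": prompt,
                "chosen": teacher(prompt),    # answer from the stronger model
                "rejected": student(prompt),  # answer from the model being tuned
            }
        )
    return rows


# Stand-in callables; real ones would call the teacher model and NinjaMouse2.
rows = build_pairs(
    ["Explain block expansion in one sentence."],
    teacher=lambda p: f"[teacher] {p}",
    student=lambda p: f"[student] {p}",
)
print(rows[0]["chosen"])
```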

Notes

License

To use this model you agree to use it like Spider-man: Apache 2.0 + White Rabbit Neo (below)

You agree not to use the Model or Derivatives of the Model:

-	In any way that violates any applicable national or international law or regulation or infringes upon the lawful rights and interests of any third party; 
-	For military use in any way;
-	For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; 
-	To generate or disseminate verifiably false information and/or content with the purpose of harming others; 
-	To generate or disseminate inappropriate content subject to applicable regulatory requirements;
-	To generate or disseminate personal identifiable information without due authorization or for unreasonable use; 
-	To defame, disparage or otherwise harass others; 
-	For fully automated decision making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation; 
-	For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics; 
-	To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm; 
-	For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories.

Template

I made this (OpenChatML-like) template for LLaMA Factory and added it to the bottom of LLama-Factory/src/llmtuner/data/template.py

_register_template(
    name="ninja_chatml",
    format_user=StringFormatter(slots=["<|im_start|>user\n{{content}}\n<|im_end|>\n"]), # Works
    format_assistant=StringFormatter(slots=["<|im_start|>assistant\n{{content}}\n<|im_end|>", {"eos_token"}]), # Works
    format_system=StringFormatter(slots=["<|im_start|>system\n{{content}}\n<|im_end|>\n"]), # NinjaMouse does not like BOS!
    format_function=FunctionFormatter(slots=["<|im_start|>assistant\n<tool_call>\n{\"name\":\"{{name}}\", \"arguments\":{{arguments}}}\n</tool_call>\n<|im_end|>", {"eos_token"}]), # Works
    format_observation=StringFormatter(slots=["<|im_start|>tool\n<tool_response>\n{{content}}\n</tool_response>\n<|im_end|>\n"]), # Works
    format_separator=EmptyFormatter(slots=["\n"]), # It makes sense to keep this a new line instead of </s> and apply the eos token directly
    format_tools=ToolFormatter(tool_format="open_chatml"),
)
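For anyone prompting the model by hand, the slots above can be approximated with plain string formatting. This is a sketch, not LLaMA Factory's own assembly logic: it assumes the EOS token is `</s>`, that the separator newline follows each assistant turn, and `render_prompt` is a hypothetical helper. Note the newline before `<|im_end|>`, which differs from plain ChatML.

```python
EOS_TOKEN = "</s>"  # assumption: the tokenizer's EOS token


def render_prompt(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render a list of {"role", "content"} messages in the ninja_chatml layout."""
    parts = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role in ("system", "user"):
            parts.append(f"<|im_start|>{role}\n{content}\n<|im_end|>\n")
        elif role == "assistant":
            # The assistant slot ends with the EOS token; the separator adds the newline.
            parts.append(f"<|im_start|>assistant\n{content}\n<|im_end|>{EOS_TOKEN}\n")
        elif role == "tool":
            parts.append(
                f"<|im_start|>tool\n<tool_response>\n{content}\n</tool_response>\n<|im_end|>\n"
            )
        else:
            raise ValueError(f"unknown role: {role}")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # cue the model to answer
    return "".join(parts)


prompt = render_prompt(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hi"},
    ]
)
print(prompt)
```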

To format the tools I have added the following code to formatter.py in the same folder.

# At the top
HERMES_TOOL_PROMPT = (
    "\n<tools>\n"
    "{funtion_description}\n"
    "</tools>\n"
)

# I only added the elif 
@dataclass
class ToolFormatter(Formatter):
    def __post_init__(self):
        if self.tool_format is None:
            raise ValueError("Tool format was not found.")
    

    def apply(self, **kwargs) -> SLOTS:
        content = kwargs.pop("content")
        try:
            tools = json.loads(content)
            if not len(tools):
                return [""]

            if self.tool_format == "default":
                return [default_tool_formatter(tools)]
            elif self.tool_format == "open_chatml": # This right here
                return [HERMES_TOOL_PROMPT.format(funtion_description=json.dumps(tools, ensure_ascii=False, indent=4))] # Uses the constant defined at the top; I used 4 but OpenChatML has 2
            else:
                raise NotImplementedError
        except Exception:
            return [""]

  
    def extract(self, content: str) -> Union[str, Tuple[str, str]]:
        if self.tool_format == "default":
            return default_tool_extractor(content)
        else:
            raise NotImplementedError
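To see what the tools block looks like once rendered, the prompt constant can be formatted on its own. The `get_weather` tool below is a made-up example, and the placeholder keeps the `funtion_description` spelling used in the constant above so the two match.

```python
import json

HERMES_TOOL_PROMPT = (
    "\n<tools>\n"
    "{funtion_description}\n"
    "</tools>\n"
)

# Hypothetical tool definition, only for illustration.
tools = [
    {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

rendered = HERMES_TOOL_PROMPT.format(
    funtion_description=json.dumps(tools, ensure_ascii=False, indent=4)
)
print(rendered)
```

The JSON sits between `<tools>` and `</tools>` on their own lines, and this text is what ends up in the system turn.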

Model specs

MistralForCausalLM(
    (model): MistralModel(
        (embed_tokens): Embedding(32009, 2560, padding_idx=0)
        (layers): ModuleList(
            (0-33): 34 x MistralDecoderLayer(
                (self_attn): MistralSdpaAttention(
                    (q_proj): Linear(in_features=2560, out_features=2560, bias=False)
                    (k_proj): Linear(in_features=2560, out_features=640, bias=False)
                    (v_proj): Linear(in_features=2560, out_features=640, bias=False)
                    (o_proj): Linear(in_features=2560, out_features=2560, bias=False)
                    (rotary_emb): MistralRotaryEmbedding()
                )
                (mlp): MistralMLP(
                    (gate_proj): Linear(in_features=2560, out_features=6912, bias=False)
                    (up_proj): Linear(in_features=2560, out_features=6912, bias=False)
                    (down_proj): Linear(in_features=6912, out_features=2560, bias=False)
                    (act_fn): SiLU()
                )
                (input_layernorm): MistralRMSNorm()
                (post_attention_layernorm): MistralRMSNorm()
            )
        ) 
        (norm): MistralRMSNorm()
    ) 
    (lm_head): Linear(in_features=2560, out_features=32009, bias=False)
)
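The parameter count can be checked from the shapes printed above. This is a back-of-the-envelope sketch that assumes each RMSNorm holds one weight vector of the hidden size and that `embed_tokens` and `lm_head` are separate (untied) matrices, which the printout alone does not confirm.

```python
HIDDEN = 2560    # model width
KV_DIM = 640     # output width of k_proj and v_proj
MLP_DIM = 6912   # width of gate_proj and up_proj
VOCAB = 32009    # embedding rows
LAYERS = 34      # decoder layers

attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q_proj, o_proj, k_proj, v_proj
mlp = 3 * HIDDEN * MLP_DIM                             # gate_proj, up_proj, down_proj
norms = 2 * HIDDEN                                     # assumption: one weight vector per RMSNorm
per_layer = attention + mlp + norms

# Embedding + lm_head (assumed untied) + all layers + final norm.
total = 2 * VOCAB * HIDDEN + LAYERS * per_layer + HIDDEN
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under those assumptions the total lands at roughly 2.5 billion, in line with the model name.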
