
Kuno-K1 - Llama-3.1 8B

Model Description

Kuno-K1 is a large language model (LLM) developed by Vinkura AI, designed for strong conversational ability, problem solving, and informative responses. It builds on the successes of its predecessors, offering improved performance and versatility.

Architecture

Kuno-K1 is based on the Llama-3.1 architecture, leveraging its robust framework to deliver superior language understanding and generation capabilities.

Prompt Format

Kuno-K1 uses ChatML, a structured prompt format for multi-turn conversations between system, user, and assistant roles.

This format provides compatibility with OpenAI-style chat endpoints; anyone familiar with the ChatGPT API will recognize it, as it is the same format used by OpenAI.

Prompt with system instruction (Use whatever system prompt you like, this is just an example!):

<|im_start|>system
You are Kuno-K1, a highly advanced artificial intelligence designed to provide exceptional assistance and insightful responses. Your purpose is to help users with their queries, while maintaining a friendly and informative demeanor.<|im_end|>
<|im_start|>user
Hello, who are you?<|im_end|>
<|im_start|>assistant
Hi! I'm Kuno-K1, your reliable AI companion. I'm here to provide information, answer questions, and engage in productive conversations to assist you in any way I can.<|im_end|>

This prompt is available as a chat template, which means you can format messages using the tokenizer.apply_chat_template() method:

messages = [
    {"role": "system", "content": "You are Kuno K1."},
    {"role": "user", "content": "Hello, who are you?"}
]
gen_input = tokenizer.apply_chat_template(messages, return_tensors="pt")
model.generate(gen_input)

When tokenizing messages for generation, set add_generation_prompt=True when calling apply_chat_template(). This will append <|im_start|>assistant\n to your prompt, to ensure that the model continues with an assistant response.
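
For illustration, here is a minimal sketch (reusing the tokenizer and messages from above) that renders the template to a plain string so you can inspect what add_generation_prompt=True appends:

# Render the chat template to text instead of token ids to inspect the prompt
prompt_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt_text)  # the rendered prompt ends with "<|im_start|>assistant\n"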

To use the prompt format without a system prompt, simply omit the system message.

Prompt Format for Function Calling

The model was trained with a specific system prompt and structure for function calling.

Use the system role for this message, followed by the function signatures in JSON, as shown in the example below.

<|im_start|>system
You are a function-calling AI model that utilizes tools within <tools></tools> XML tags. 
Your task is to call one or more functions to assist with user queries without making assumptions about input values.


Available Tools:


<tools>
{
  "type": "function",
  "function": {
    "name": "get_stock_fundamentals",
    "description": "get_stock_fundamentals(symbol: str) -> dict",
    "parameters": {
      "type": "object",
      "properties": {
        "symbol": {"type": "string"}
      },
      "required": ["symbol"]
    }
  }
}
</tools>


Function Signature:


get_stock_fundamentals(symbol: str) -> dict

* Retrieves fundamental data for a given stock symbol using yfinance API.

Parameters:
  - symbol (str): The stock symbol.

Returns:
  - dict: A dictionary containing fundamental data, including:
    - symbol
    - company_name
    - sector
    - industry
    - market_cap
    - pe_ratio
    - pb_ratio
    - dividend_yield
    - eps
    - beta
    - 52_week_high
    - 52_week_low


JSON Schema for Tool Calls:


{
  "properties": {
    "arguments": {"title": "Arguments", "type": "object"},
    "name": {"title": "Name", "type": "string"}
  },
  "required": ["arguments", "name"],
  "title": "FunctionCall",
  "type": "object"
}


Return Format:


<tool_call>
{
  "arguments": <args-dict>,
  "name": <function-name>
}
</tool_call>
<|im_end|>
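
When the model answers a tool-requiring query, it should emit a <tool_call> block in the return format above. Below is a minimal, hypothetical sketch of extracting and dispatching such a call on the client side; the regex, the stub implementation of get_stock_fundamentals, and the example response string are illustrative assumptions, not part of the model's API:

import json
import re

def get_stock_fundamentals(symbol: str) -> dict:
    # Stub standing in for a real yfinance-backed implementation
    return {"symbol": symbol}

def extract_tool_calls(model_output: str) -> list:
    """Pull the JSON bodies out of <tool_call>...</tool_call> blocks."""
    pattern = r"<tool_call>\s*(\{.*?\})\s*</tool_call>"
    return [json.loads(m) for m in re.findall(pattern, model_output, re.DOTALL)]

# Hypothetical model output following the return format above
response = '<tool_call>\n{"arguments": {"symbol": "TSLA"}, "name": "get_stock_fundamentals"}\n</tool_call>'

tools = {"get_stock_fundamentals": get_stock_fundamentals}
for call in extract_tool_calls(response):
    result = tools[call["name"]](**call["arguments"])
    print(result)  # {'symbol': 'TSLA'}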

Inference

Here is example code for running inference with the model using HuggingFace Transformers:

# Code to run inference with Kuno-K1 using HF Transformers
# Requires the pytorch, transformers, bitsandbytes, sentencepiece, protobuf, and flash-attn packages

import torch
from transformers import AutoTokenizer, BitsAndBytesConfig, LlamaForCausalLM

# Load the tokenizer and the model in 4-bit with flash attention
tokenizer = AutoTokenizer.from_pretrained("VinkuraAI/Kuno-K1-Llama-3.1-8B", trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained(
    "VinkuraAI/Kuno-K1-Llama-3.1-8B",
    torch_dtype=torch.float16,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    attn_implementation="flash_attention_2",
)

prompts = [
    """<|im_start|>system
You are a mystical, avant-garde storyteller, weaving tales of wonder and chaos.<|im_end|>
<|im_start|>user
Write a surreal short story about a lone wolf discovering a hidden vinyl record that summons an otherworldly jazz band to save the city from an existential threat.<|im_end|>
<|im_start|>assistant""",
]

for chat in prompts:
    print(chat)
    input_ids = tokenizer(chat, return_tensors="pt").input_ids.to("cuda")
    generated_ids = model.generate(input_ids, max_new_tokens=750, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
    # Decode only the newly generated tokens
    response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(f"Response: {response}")

You can also serve this model with vLLM. After installing it (pip install vllm), run the following in your terminal:

vllm serve VinkuraAI/Kuno-K1-Llama-3.1-8B
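
vLLM exposes an OpenAI-compatible server (by default at http://localhost:8000/v1), so once the command above is running you can query the model with the openai client. A minimal sketch, assuming the default host, port, and no API key configured:

from openai import OpenAI

# Point the OpenAI client at the local vLLM server; "EMPTY" is a placeholder key
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="VinkuraAI/Kuno-K1-Llama-3.1-8B",
    messages=[
        {"role": "system", "content": "You are Kuno-K1, a helpful assistant."},
        {"role": "user", "content": "Hello, who are you?"},
    ],
    max_tokens=256,
)
print(completion.choices[0].message.content)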

Evaluation Results

| Metric              | Score |
|---------------------|-------|
| Average             | 23.49 |
| IFEval (0-Shot)     | 61.70 |
| BBH (3-Shot)        | 30.72 |
| MATH Lvl 5 (4-Shot) | 4.76  |
| GPQA (0-Shot)       | 6.38  |
| MuSR (0-Shot)       | 13.62 |
| MMLU-PRO (5-Shot)   | 23.77 |
