AtakanTekparmak/MiniCPM3-4B-GGUF

MiniCPM Repo | MiniCPM Paper | MiniCPM-V Repo | Join us in Discord and WeChat

Disclaimer

This is a f16 GGUF version, original model card is copy & pasted as is below. Followed this README for quantization and inference.

NOTE: currently inference only works with the OpenBMB fork of llama.cpp.

Introduction

MiniCPM3-4B is the 3rd generation of MiniCPM series. The overall performance of MiniCPM3-4B surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, being comparable with many recent 7B~9B models.

Compared to MiniCPM1.0/MiniCPM2.0, MiniCPM3-4B has a more powerful and versatile skill set to enable more general usage. MiniCPM3-4B supports function call, along with code interpreter. Please refer to Advanced Features for usage guidelines.

MiniCPM3-4B has a 32k context window. Equipped with LLMxMapReduce, MiniCPM3-4B can handle infinite context theoretically, without requiring huge amount of memory.

Usage

Inference with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

path = "openbmb/MiniCPM3-4B"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True)

messages = [
    {"role": "user", "content": "推荐5个北京的景点。"},
]
model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(device)

model_outputs = model.generate(
    model_inputs,
    max_new_tokens=1024,
    top_p=0.7,
    temperature=0.7
)

output_token_ids = [
    model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))
]

responses = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
print(responses)

Inference with vLLM

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "openbmb/MiniCPM3-4B"
prompt = [{"role": "user", "content": "推荐5个北京的景点。"}]

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)

llm = LLM(
    model=model_name,
    trust_remote_code=True,
    tensor_parallel_size=1
)
sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)

outputs = llm.generate(prompts=input_text, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)

Evaluation Results

Benchmark	Qwen2-7B-Instruct	GLM-4-9B-Chat	Gemma2-9B-it	Llama3.1-8B-Instruct	GPT-3.5-Turbo-0125	Phi-3.5-mini-Instruct(3.8B)	MiniCPM3-4B
English
MMLU	70.5	72.4	72.6	69.4	69.2	68.4	67.2
BBH	64.9	76.3	65.2	67.8	70.3	68.6	70.2
MT-Bench	8.41	8.35	7.88	8.28	8.17	8.60	8.41
IFEVAL (Prompt Strict-Acc.)	51.0	64.5	71.9	71.5	58.8	49.4	68.4
Chinese
CMMLU	80.9	71.5	59.5	55.8	54.5	46.9	73.3
CEVAL	77.2	75.6	56.7	55.2	52.8	46.1	73.6
AlignBench v1.1	7.10	6.61	7.10	5.68	5.82	5.73	6.74
FollowBench-zh (SSR)	63.0	56.4	57.0	50.6	64.6	58.1	66.8
Math
MATH	49.6	50.6	46.0	51.9	41.8	46.4	46.6
GSM8K	82.3	79.6	79.7	84.5	76.4	82.7	81.1
MathBench	63.4	59.4	45.8	54.3	48.9	54.9	65.6
Code
HumanEval+	70.1	67.1	61.6	62.8	66.5	68.9	68.3
MBPP+	57.1	62.2	64.3	55.3	71.4	55.8	63.2
LiveCodeBench v3	22.2	20.2	19.2	20.4	24.0	19.6	22.6
Function Call
BFCL v2	71.6	70.1	19.2	73.3	75.4	48.4	76.0
Overall
Average	65.3	65.0	57.9	60.8	61.0	57.2	66.3

Statement

As a language model, MiniCPM3-4B generates content by learning from a vast amount of text.
However, it does not possess the ability to comprehend or express personal opinions or value judgments.
Any content generated by MiniCPM3-4B does not represent the viewpoints or positions of the model developers.
Therefore, when using content generated by MiniCPM3-4B, users should take full responsibility for evaluating and verifying it on their own.

LICENSE

This repository is released under the Apache-2.0 License.
The usage of MiniCPM3-4B model weights must strictly follow MiniCPM Model License.md.
The models and weights of MiniCPM3-4B are completely free for academic research. after filling out a "questionnaire" for registration, are also available for free commercial use.

Citation

@article{hu2024minicpm,
  title={MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies},
  author={Hu, Shengding and Tu, Yuge and Han, Xu and He, Chaoqun and Cui, Ganqu and Long, Xiang and Zheng, Zhi and Fang, Yewei and Huang, Yuxiang and Zhao, Weilin and others},
  journal={arXiv preprint arXiv:2404.06395},
  year={2024}
}