---
license: apache-2.0
model-index:
  - name: BrainTransformers-3B-Chat
    results:
      - task:
          type: text-generation
        dataset:
          name: mmlu
          type: mmlu
        metrics:
          - name: MMLU
            type: MMLU
            value: 63.2
      - task:
          type: text-generation
        dataset:
          name: bbh
          type: bbh
        metrics:
          - name: BBH
            type: BBH
            value: 54.1
      - task:
          type: text-generation
        dataset:
          name: arc-challenge
          type: arc-challenge
        metrics:
          - name: ARC-C
            type: ARC-C
            value: 54.3
      - task:
          type: text-generation
        dataset:
          name: hellaswag
          type: hellaswag
        metrics:
          - name: HellaSwag
            type: HellaSwag
            value: 72.8
      - task:
          type: text-generation
        dataset:
          name: gsm8k
          type: gsm8k
        metrics:
          - name: GSM8K
            type: GSM8K
            value: 76.3
      - task:
          type: code-generation
        dataset:
          name: humaneval
          type: humaneval
        metrics:
          - name: HumanEval
            type: HumanEval
            value: 40.5
    source:
      name: LumenScopeAI
      url: https://github.com/LumenScopeAI/BrainTransformers-SNN-LLM
---

BrainTransformers: SNN-LLM

Based on BrainTransformers, BrainGPTForCausalLM is a Large Language Model (LLM) implemented using Spiking Neural Networks (SNNs). We are excited to announce that our technical report, BrainTransformers: SNN-LLM, is now available on arXiv.

We plan to further optimize the model at the operator level and adapt it for hardware compatibility, enabling BrainGPTForCausalLM to be deployed on more energy-efficient SNN hardware devices.

The current open-source version retains some floating-point calculations to ensure computational efficiency; we will continue to optimize this, and more detailed explanations are provided in the comments within the source code.
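
To give a rough sense of what this kind of hybrid computation looks like, here is a minimal, generic sketch of a leaky integrate-and-fire neuron that converts a floating-point activation into binary spikes over a few time steps and approximates it via rate coding. This is only an illustration under our own assumptions (the class name LIFNeuron and the threshold/decay parameters are hypothetical), not the operator implementation used in this repository.

import torch

class LIFNeuron(torch.nn.Module):
    """Toy leaky integrate-and-fire unit: floating-point membrane state, binary spike outputs."""

    def __init__(self, threshold=1.0, decay=0.5, time_steps=4):
        super().__init__()
        self.threshold = threshold    # membrane potential required to emit a spike
        self.decay = decay            # leak applied to the membrane at each step
        self.time_steps = time_steps  # number of discrete simulation steps

    def forward(self, current):
        # `current` is an ordinary floating-point tensor (e.g., a pre-activation).
        membrane = torch.zeros_like(current)
        spikes = []
        for _ in range(self.time_steps):
            membrane = self.decay * membrane + current    # leaky integration
            spike = (membrane >= self.threshold).float()  # binary spike where the threshold is crossed
            membrane = membrane - spike * self.threshold  # soft reset after firing
            spikes.append(spike)
        # Averaging the spike train approximates the input activation (rate coding).
        return torch.stack(spikes).mean(dim=0)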

Stay tuned for updates as we continue to refine and expand our research findings.

You can try it online at www.lumenscopeai.com.

Model Availability

Repository

The GitHub repository is available at: https://github.com/LumenScopeAI/BrainTransformers-SNN-LLM

Model Performance

Below are the performance metrics of our 3B model on various benchmarks:

General Tasks

| Dataset | Performance |
|---|---|
| MMLU | 63.2 |
| MMLU-Pro | 33.3 |
| MMLU-redux | 61.3 |
| BBH | 54.1 |
| ARC-C | 54.3 |
| TruthfulQA | 47.1 |
| Winogrande | 68.8 |
| HellaSwag | 72.8 |

Math and Science Tasks

| Dataset | Performance |
|---|---|
| GPQA | 25.3 |
| TheoremQA | 26.4 |
| MATH | 41.0 |
| MMLU-stem | 60.2 |
| GSM8K | 76.3 |

Coding and Multilingual Tasks

| Dataset | Performance |
|---|---|
| HumanEval | 40.5 |
| HumanEval+ | 34.6 |
| MBPP | 55.0 |
| MBPP+ | 47.5 |
| MultiPL-E | 39.6 |
| Multi-Exam | 52.6 |
| Multi-Understanding | 73.9 |
| Multi-Mathematics | 47.1 |
| Multi-Translation | 28.2 |

Usage

Generate Text

import torch
from transformers import AutoTokenizer, BrainGPTForCausalLM

model_path = "/path/to/your/model"
model = BrainGPTForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def generate_text(messages, max_new_tokens=50):
    # Build a chat-formatted prompt from the message list and tokenize it.
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text], return_tensors="pt").to(device)

    with torch.no_grad():
        generated_ids = model.generate(**model_inputs, max_new_tokens=max_new_tokens)

    # Strip the prompt tokens and decode only the newly generated continuation.
    generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Example usage
messages = [
    {"role": "system", "content": "You are a knowledgeable assistant."},
    {"role": "user", "content": "Explain the Pythagorean theorem."}
]
response = generate_text(messages)
print(response)
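
If you want less deterministic output, model.generate also accepts the standard Hugging Face sampling arguments. The helper below is a small variation on generate_text; the particular values (temperature 0.7, top_p 0.9, 256 new tokens) are only illustrative defaults and assume the model's generation_config does not override them.

def generate_text_sampled(messages, max_new_tokens=256):
    # Same preprocessing as generate_text above, but with sampling enabled.
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text], return_tensors="pt").to(device)

    with torch.no_grad():
        generated_ids = model.generate(
            **model_inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,   # stochastic decoding instead of greedy
            temperature=0.7,  # illustrative values; tune for your use case
            top_p=0.9,
        )

    # Strip the prompt tokens and decode only the new continuation.
    generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generate_text_sampled(messages))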

Acknowledgments

The model was trained using ANN-Base-Qwen2 in three training stages, including SNN-specific neuron synaptic plasticity training. Please note that SNN models do not support ANN fine-tuning techniques; we are currently developing specialized fine-tuning tools for SNN models. Our open-source model achieves state-of-the-art results, and we welcome your stars.

This repository includes a complete transformers package that can directly replace the transformers package in your development environment, enabling compatibility with our SNN-Base-LLM without affecting existing usage.
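
As a quick sanity check after swapping in the bundled package (a minimal sketch, assuming you have installed it in place of, or ahead of, the stock transformers distribution):

import transformers

# Show which copy of transformers is actually being imported; it should point
# at the SNN-enabled package shipped with this repository rather than a stock install.
print(transformers.__version__)
print(transformers.__file__)

# BrainGPTForCausalLM is only exposed by the bundled package, so a successful
# import is a quick confirmation that the replacement worked.
from transformers import BrainGPTForCausalLM
print(BrainGPTForCausalLM.__name__)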