Model Card for Aether-Qwen2-0.5B-SFT-v0.0.2

This model is a fine-tuned version of Qwen2 0.5B, trained with Supervised Fine-Tuning (SFT) on the AetherCode-v1 dataset for code-related tasks. It builds on the base Qwen2 model with specialized training aimed at improving its performance in software development contexts.

Model Details

Model Description

Aether-Qwen2-0.5B-SFT-v0.0.2 is a transformer model, usable through the Hugging Face 🤗 transformers library, intended to support automated coding tasks. It has been fine-tuned with Supervised Fine-Tuning (SFT) to better understand and generate code, targeting applications in software development, code review, and automated programming assistance.

  • Developed by: Michael Svendsen
  • Finetuned from model: Qwen2 0.5B

Uses

Direct Use

This model is ready for direct use in environments where coding assistance is needed, providing capabilities such as code completion, error detection, and suggestions for code optimization.

Downstream Use

Further fine-tuning on specific coding languages or frameworks can extend its utility to more specialized software development tasks.

Out-of-Scope Use

The model should not be used for general natural language processing tasks outside the scope of programming and code analysis.

Bias, Risks, and Limitations

Users should be cautious about relying solely on the model for critical software development tasks without human oversight, due to potential biases in training data or limitations in understanding complex code contexts.

Recommendations

Ongoing validation and testing on diverse coding datasets are recommended to ensure the model remains effective and unbiased.
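
As one lightweight illustration of keeping a check between the model and the codebase, the sketch below tests whether a generated snippet at least parses as Python before it is passed on for review. The helper name `parses_as_python` is hypothetical and not part of this model or of transformers. A successful parse says nothing about correctness, so this complements human review and real tests rather than replacing them.

```python
import ast


def parses_as_python(code: str) -> bool:
    """Return True if `code` is syntactically valid Python (hypothetical helper)."""
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):
        # SyntaxError: malformed code; ValueError: e.g. null bytes in the source.
        return False
    return True


# Hand-written stand-ins for model output, used only to exercise the helper.
good_snippet = "def add(a, b, c):\n    return a + b + c"
bad_snippet = "def add(a, b, c) return a + b + c"

print(parses_as_python(good_snippet))
print(parses_as_python(bad_snippet))
```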

How to Get Started with the Model

Use the code below to get started with the model.

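from transformers import AutoModelForCausalLM, AutoTokenizer

# AutoModelForCausalLM includes the language-modeling head needed for text generation
model_id = "thesven/Aether-Qwen2-0.5B-SFT-v0.0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)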

or with a pipeline:

from transformers import pipeline

messages = [
    {"role": "system", "content": "You are a helpful software development assistant"},
    {"role": "user", "content": "can you write a python function that adds 3 numbers together?"},
]
pipe = pipeline("text-generation", model="thesven/Aether-Qwen2-0.5B-SFT-v0.0.2")
print(pipe(messages))
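
The pipeline call above prints the whole result structure. The sketch below shows one way to pull out just the reply text, assuming a recent transformers release in which chat-style input returns a list containing a `generated_text` entry that holds the conversation with the new assistant message appended. The helper name `last_assistant_reply` is hypothetical, and `example_output` is a hand-written stand-in, not real model output.

```python
def last_assistant_reply(pipeline_output):
    """Return the text of the newest reply from a text-generation pipeline result."""
    generated = pipeline_output[0]["generated_text"]
    # Plain-string prompts yield a string; chat prompts yield a list of messages.
    if isinstance(generated, str):
        return generated
    return generated[-1]["content"]


# Hand-written stand-in shaped like the assumed return value of pipe(messages).
example_output = [
    {
        "generated_text": [
            {"role": "system", "content": "You are a helpful software development assistant"},
            {"role": "user", "content": "can you write a python function that adds 3 numbers together?"},
            {"role": "assistant", "content": "def add(a, b, c):\n    return a + b + c"},
        ]
    }
]

print(last_assistant_reply(example_output))
```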

Prompt Template:

<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{user}<|im_end|>
<|im_start|>assistant
{assistant}
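
To make the template concrete, here is a minimal sketch that renders a list of chat messages into that layout by hand. The function name `format_chatml` and its `add_generation_prompt` parameter are illustrative assumptions. In practice the tokenizer's `apply_chat_template` method would normally be used for this, and its exact output for this model may differ from this hand-rolled version.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render {"role", "content"} dicts in the prompt template shown above."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful software development assistant"},
    {"role": "user", "content": "can you write a python function that adds 3 numbers together?"},
]

prompt = format_chatml(messages)
print(prompt)
```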

Training Details

Training Data

The model was trained using the 5star split from the AetherCode-v1 dataset, designed for enhancing coding-related AI capabilities.

Training Procedure

Training regime: The model was trained for 3 epochs on an RTX 4500 using Supervised Fine-Tuning (SFT).

Preprocessing

Standard preprocessing techniques were applied to prepare the code data for training.

Model size: 494M params (Safetensors, tensor type F32)