VyLinh-Lite: Vietnamese 3B Reasoning Language Model

Model Details

  • Language(s): Vietnamese
  • Base Model: Qwen2.5-3B
  • Model Size: 3 billion parameters

Intended Use

  • Primary intended uses: Vietnamese language understanding, reasoning, and generation
  • Primary intended users: Researchers, developers, and practitioners working with Vietnamese language AI
  • Out-of-scope use cases: Production deployments without additional safety measures

Training Details

Training Data

The model was trained in several stages of distillation and adaptation:

  1. Initial knowledge distillation from Llama 3.1 405B
  2. Architecture adaptation using mergekit-tokensurgeon
  3. Secondary distillation to Qwen architecture
  4. Parallel distillation from Qwen2-72B
  5. Final fusion and fine-tuning on the EvolKit dataset

Training Procedure

Distillation Process

  1. Logit Distillation

    • Source: Llama 3.1 405B
    • Method: Offline distillation
    • Storage: Top-K logits preservation

  2. Cross-Architecture Adaptation

    • Tool: mergekit-tokensurgeon
    • Process: Vocabulary alignment with Llama 3.1 405B

  3. Architecture Transformation

    • Target: 3B parameter configuration
    • Method: Progressive knowledge transfer
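
The card does not give the loss or the value of K used for offline logit distillation. As a minimal sketch of the general idea only, the pure-Python example below stores the top-K teacher logits for one token position and later computes a KL divergence against the student over those same token ids. The function names, the renormalization over the top-K support, and the `temperature` parameter are illustrative assumptions, not the authors' implementation.

```python
import math


def top_k_logits(logits, k):
    """Keep the k largest logits as (token_id, logit) pairs, largest first."""
    ranked = sorted(enumerate(logits), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def log_softmax(values, temperature=1.0):
    """Numerically stable log-softmax over a short list of logits."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_total for s in scaled]


def top_k_distill_loss(stored, student_logits, temperature=1.0):
    """KL(teacher || student), both renormalized over the stored token ids.

    Assumption: restricting both distributions to the teacher's top-k ids
    is one common simplification; other treatments of the tail exist.
    """
    token_ids = [token_id for token_id, _ in stored]
    teacher_logp = log_softmax([logit for _, logit in stored], temperature)
    student_logp = log_softmax([student_logits[i] for i in token_ids], temperature)
    return sum(math.exp(t) * (t - s) for t, s in zip(teacher_logp, student_logp))


# Offline phase: run the teacher once and store only the top-k entries.
teacher_logits = [2.0, 0.5, -1.0, 3.0, 0.0]
stored = top_k_logits(teacher_logits, k=2)

# Training phase: compare the student against the stored entries.
student_logits = [0.0, 0.0, 0.0, 0.0, 0.0]
loss = top_k_distill_loss(stored, student_logits)
```

Storing only K entries per position keeps the offline dataset far smaller than the full vocabulary-sized logit vector, at the cost of discarding the teacher's tail probabilities.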

Fine-tuning

  • Final Stage: Fine-tuning on the EvolKit dataset
  • Optimization: Focus on coherence and reasoning capabilities
  • Vocabulary: Qwen-native vocabulary restoration
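
The card does not describe how vocabulary alignment or restoration was carried out, and the sketch below is not mergekit-tokensurgeon's algorithm. It is a toy illustration, under stated assumptions, of what moving an embedding table from one vocabulary to another can involve: tokens shared by both vocabularies keep their donor embedding, and other tokens fall back to the mean of their sub-token embeddings. `transplant_embeddings` and `split_token` are hypothetical names introduced for this example.

```python
def transplant_embeddings(donor_vocab, donor_emb, target_vocab, split_token):
    """Build an embedding table for target_vocab from donor embeddings.

    donor_vocab / target_vocab: dict mapping token string -> row index,
        with target indices assumed to be 0..len(target_vocab)-1.
    donor_emb: list of embedding vectors (lists of floats), one per donor row.
    split_token: hypothetical callable that splits a token string into
        pieces that may exist in the donor vocabulary.
    """
    dim = len(donor_emb[0])
    table = [[0.0] * dim for _ in target_vocab]
    for token, row in target_vocab.items():
        if token in donor_vocab:
            # Shared token: copy the donor embedding unchanged.
            table[row] = list(donor_emb[donor_vocab[token]])
            continue
        pieces = [p for p in split_token(token) if p in donor_vocab]
        if pieces:
            # Unshared token: average the embeddings of its known pieces.
            table[row] = [
                sum(donor_emb[donor_vocab[p]][d] for p in pieces) / len(pieces)
                for d in range(dim)
            ]
        # Tokens with no known pieces keep the zero vector in this sketch.
    return table


donor_vocab = {"xin": 0, "chào": 1}
donor_emb = [[1.0, 0.0], [0.0, 1.0]]
target_vocab = {"xin": 0, "xinchào": 1, "zzz": 2}


def split_token(token):
    # Hypothetical splitter used only for this example.
    return ["xin", "chào"] if token == "xinchào" else [token]


table = transplant_embeddings(donor_vocab, donor_emb, target_vocab, split_token)
```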

Performance and Limitations

Benchmarks

Benchmark results will be added as they become available.

Limitations

  • Model size constraints may impact certain complex reasoning tasks
  • Performance may vary on domain-specific Vietnamese content
  • Context window (32K tokens) is limited compared to larger models

Ethical Considerations

  • Data Bias: May reflect biases present in training data
  • Environmental Impact: Reduced compared to larger models due to efficient distillation
  • Societal Impact: Potential influence on Vietnamese language technology landscape

Technical Specifications

  • Parameter Count: 3 billion
  • Context Window: 32K tokens
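
As a rough guide to what the parameter count implies for memory, the sketch below estimates the size of the weights alone at a few common precisions. It assumes the nominal figure of 3 billion parameters (the exact count may differ) and ignores activations, the KV cache, and framework overhead; `weight_memory_gib` is a helper name introduced for this example.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory for model weights only, in GiB."""
    return num_params * bytes_per_param / 2**30


NOMINAL_PARAMS = 3e9  # nominal count from the card; an approximation

for name, width in [("float32", 4), ("bfloat16", 2), ("int8", 1)]:
    print(f"{name}: {weight_memory_gib(NOMINAL_PARAMS, width):.2f} GiB")
```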