VyLinh-Lite: Vietnamese 3B Reasoning Language Model
Model Details
- Language(s): Vietnamese
- Base Model: Qwen2.5-3B
- Model Size: 3 billion parameters
Intended Use
- Primary intended uses: Vietnamese language understanding, reasoning, and generation
- Primary intended users: Researchers, developers, and practitioners working with Vietnamese language AI
- Out-of-scope use cases: Production deployments without additional safety measures
Training Details
Training Data
The model was trained through a multi-stage pipeline of distillation and adaptation:
- Initial knowledge distillation from Llama 3.1 405B
- Architecture adaptation using mergekit-tokensurgeon
- Secondary distillation to Qwen architecture
- Parallel distillation from Qwen2-72B
- Final fusion and fine-tuning on the EvolKit dataset
Training Procedure
Distillation Process
Logit Distillation
- Source: Llama 3.1 405B
- Method: Offline distillation against precomputed teacher outputs
- Storage: Only the teacher's top-K logits per token are preserved, keeping the stored distillation signal tractable (see the sketch below)
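To make the offline setup concrete, here is a minimal sketch of a top-K distillation loss in PyTorch, assuming the teacher's top-K logits and their vocabulary indices were dumped to disk in a prior pass. The tensor names and temperature value are illustrative, not confirmed details of this training run.

```python
# Hedged sketch of top-K offline logit distillation. Assumes the teacher's
# top-K logits (topk_values) and their vocabulary ids (topk_indices) were
# precomputed and stored per token; names are illustrative.
import torch
import torch.nn.functional as F

def topk_distill_loss(student_logits, topk_values, topk_indices, temperature=2.0):
    """KL divergence between teacher and student over the stored top-K slots.

    student_logits: (batch, seq, vocab) raw student outputs
    topk_values:    (batch, seq, K) stored teacher logits
    topk_indices:   (batch, seq, K) vocabulary ids of those logits
    """
    # Gather the student's logits at the teacher's top-K vocabulary positions.
    student_topk = torch.gather(student_logits, dim=-1, index=topk_indices)

    teacher_probs = F.softmax(topk_values / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_topk / temperature, dim=-1)

    # Standard distillation scaling by T^2 keeps gradient magnitudes comparable.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2
```

Renormalizing the softmax over only the K stored slots is itself an approximation; it works in practice because the teacher's probability mass is concentrated in its top entries.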
Cross-Architecture Adaptation
- Tool: mergekit-tokensurgeon
- Process: Vocabulary alignment with Llama 3.1 405B, so that teacher and student logits refer to the same token ids (a conceptual sketch follows)
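The following is a conceptual sketch of what vocabulary grafting involves, not mergekit-tokensurgeon's actual implementation: tokens shared between the two vocabularies keep their donor embedding, and tokens absent from the donor vocabulary are approximated from their sub-token pieces. All function and variable names here are illustrative.

```python
# Conceptual sketch of vocabulary grafting, in the spirit of what
# mergekit-tokensurgeon automates; this is NOT the tool's actual code.
import torch

def graft_embeddings(target_vocab, donor_vocab, donor_emb, donor_tokenize):
    """Build an embedding matrix for target_vocab from donor embeddings.

    target_vocab:  dict token -> id for the vocabulary being adopted
    donor_vocab:   dict token -> id for the model whose embeddings we have
    donor_emb:     (len(donor_vocab), dim) donor embedding matrix
    donor_tokenize: callable mapping a string to a list of donor token ids
    """
    dim = donor_emb.shape[1]
    new_emb = torch.empty(len(target_vocab), dim)
    for token, idx in target_vocab.items():
        if token in donor_vocab:
            # Exact match: copy the donor embedding unchanged.
            new_emb[idx] = donor_emb[donor_vocab[token]]
        else:
            # Approximate a missing token as the mean of its donor sub-pieces.
            piece_ids = donor_tokenize(token)
            new_emb[idx] = donor_emb[piece_ids].mean(dim=0)
    return new_emb
```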
Architecture Transformation
- Target: 3B parameter configuration
- Method: Progressive knowledge transfer into the smaller student configuration (instantiated as sketched below)
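As a minimal illustration of the target configuration, the public Qwen2.5-3B config can be instantiated as an empty student before distilled knowledge is transferred in. This is a generic sketch, not the project's training script.

```python
# Instantiate the target 3B configuration as a randomly initialized student.
# "Qwen/Qwen2.5-3B" is the public base model id on the Hugging Face Hub.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-3B")
student = AutoModelForCausalLM.from_config(config)  # untrained student shell
print(sum(p.numel() for p in student.parameters()) / 1e9, "B parameters")
```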
Fine-tuning
- Final Stage: Supervised fine-tuning on the EvolKit dataset (a hedged sketch follows)
- Optimization: Focused on coherence and reasoning capability
- Vocabulary: Restoration of the Qwen-native vocabulary
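A hedged sketch of this final stage is below. The dataset id, the "messages" field name, and all hyperparameters are assumptions made for illustration; the actual training configuration was not published.

```python
# Sketch of the final supervised fine-tuning stage. Dataset id, field names,
# and hyperparameters are assumptions, not confirmed details of this model.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-3B"  # base architecture; the real run starts from distilled weights
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16).to("cuda")

dataset = load_dataset("arcee-ai/EvolKit-20k", split="train")  # assumed dataset id

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for example in dataset:
    # Render the conversation with the model's chat template, then take a
    # standard next-token prediction step (labels are shifted internally).
    text = tokenizer.apply_chat_template(example["messages"], tokenize=False)
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096).to("cuda")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```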
Performance and Limitations
Benchmarks
Benchmark results are pending and will be added as evaluations complete.
Limitations
- The 3B parameter budget may limit performance on complex, multi-step reasoning tasks
- Performance may vary on domain-specific Vietnamese content
- Limited context window compared to larger models
Ethical Considerations
- Data Bias: May reflect biases present in training data
- Environmental Impact: Reduced compared to larger models due to efficient distillation
- Societal Impact: Potential influence on Vietnamese language technology landscape
Technical Specifications
- Parameter Count: 3 billion
- Context Window: 32K tokens
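For completeness, a minimal usage sketch with the transformers library is below. The repository id is a placeholder, not a confirmed Hub id; substitute the actual published checkpoint name.

```python
# Minimal inference sketch. "your-org/VyLinh-Lite-3B" is a placeholder id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "your-org/VyLinh-Lite-3B"  # placeholder; use the released checkpoint
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

# Vietnamese prompt: "Briefly explain the Pythagorean theorem."
messages = [{"role": "user", "content": "Giải thích định lý Pythagore một cách ngắn gọn."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```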