-
Qwen2.5-Coder Technical Report
Paper • 2409.12186 • Published • 135 -
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
Paper • 2409.12122 • Published • 2 -
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper • 2405.04434 • Published • 13 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 69
Collections
Discover the best community collections!
Collections including paper arxiv:2403.08295
-
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 138 -
Elucidating the Design Space of Diffusion-Based Generative Models
Paper • 2206.00364 • Published • 13 -
GLU Variants Improve Transformer
Paper • 2002.05202 • Published • 1 -
StarCoder 2 and The Stack v2: The Next Generation
Paper • 2402.19173 • Published • 136
-
Attention Is All You Need
Paper • 1706.03762 • Published • 44 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 14 -
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 14 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 11
-
Gemma: Open Models Based on Gemini Research and Technology
Paper • 2403.08295 • Published • 47 -
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Paper • 2403.13372 • Published • 62 -
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Paper • 2403.07816 • Published • 39 -
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
Paper • 2311.07463 • Published • 13
-
An Interdisciplinary Comparison of Sequence Modeling Methods for Next-Element Prediction
Paper • 1811.00062 • Published • 2 -
mT5: A massively multilingual pre-trained text-to-text transformer
Paper • 2010.11934 • Published • 4 -
Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance
Paper • 2310.10021 • Published • 2 -
Gemma: Open Models Based on Gemini Research and Technology
Paper • 2403.08295 • Published • 47
-
Scaling Instruction-Finetuned Language Models
Paper • 2210.11416 • Published • 7 -
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 138 -
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Paper • 2403.05530 • Published • 60 -
Yi: Open Foundation Models by 01.AI
Paper • 2403.04652 • Published • 62
-
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 77 -
Gemma: Open Models Based on Gemini Research and Technology
Paper • 2403.08295 • Published • 47 -
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Paper • 2403.08763 • Published • 49 -
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 41
-
Yi: Open Foundation Models by 01.AI
Paper • 2403.04652 • Published • 62 -
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 41 -
Qwen Technical Report
Paper • 2309.16609 • Published • 34 -
Gemma: Open Models Based on Gemini Research and Technology
Paper • 2403.08295 • Published • 47