Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers Paper • 2409.20537 • Published 5 days ago • 11
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published 16 days ago • 128
CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization Paper • 2408.15914 • Published Aug 28 • 21
Minitron Collection A family of compressed models obtained via pruning and knowledge distillation • 9 items • Updated 3 days ago • 54
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference Paper • 2407.14057 • Published Jul 19 • 43
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs Paper • 2407.04051 • Published Jul 4 • 35
Understanding Alignment in Multimodal LLMs: A Comprehensive Study Paper • 2407.02477 • Published Jul 2 • 21
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published Jul 1 • 76
Graph-enhanced RAG Collection using knowledge graphs in RAG for grounding LLM results • 22 items • Updated 6 days ago • 7
Roleplay, Creative Writing, Uncensored, NSFW Collection Newest models at bottom of the page. Please review the org model card for special instruction(s) / Template(s) / Tips for use. • 256 items • Updated about 15 hours ago • 44
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29 • 118
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts Paper • 2405.19893 • Published May 30 • 29
Many-Shot In-Context Learning in Multimodal Foundation Models Paper • 2405.09798 • Published May 16 • 26
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30 • 73
A Careful Examination of Large Language Model Performance on Grade School Arithmetic Paper • 2405.00332 • Published May 1 • 30
view article Article Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU! By lyogavin • Apr 21 • 41
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 251
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 182
Learning to Learn Faster from Human Feedback with Language Model Predictive Control Paper • 2402.11450 • Published Feb 18 • 20
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design Paper • 2401.14112 • Published Jan 25 • 17
Generative Agents: Interactive Simulacra of Human Behavior Paper • 2304.03442 • Published Apr 7, 2023 • 11
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models Paper • 2401.01335 • Published Jan 2 • 64
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 257
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention Paper • 2312.07987 • Published Dec 13, 2023 • 40
CLEX: Continuous Length Extrapolation for Large Language Models Paper • 2310.16450 • Published Oct 25, 2023 • 9
RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models Paper • 2308.07922 • Published Aug 15, 2023 • 17
LongNet: Scaling Transformers to 1,000,000,000 Tokens Paper • 2307.02486 • Published Jul 5, 2023 • 80
Extending Context Window of Large Language Models via Positional Interpolation Paper • 2306.15595 • Published Jun 27, 2023 • 53
Orca: Progressive Learning from Complex Explanation Traces of GPT-4 Paper • 2306.02707 • Published Jun 5, 2023 • 46