WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning Paper • 2411.02337 • Published 17 days ago • 36
Why Does the Effective Context Length of LLMs Fall Short? Paper • 2410.18745 • Published 28 days ago • 16
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration Paper • 2410.02367 • Published Oct 3 • 47
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 models • 15 items • Updated 28 days ago • 483
Granite Code Models: A Family of Open Foundation Models for Code Intelligence Paper • 2405.04324 • Published May 7 • 22
SOTA Code LLMs Collection Top LLMs for code. Instruct variants. Fits on a single A100. • 5 items • Updated Sep 2 • 1
Arctic-embed Collection A collection of text embedding models optimized for retrieval accuracy and efficiency • 6 items • Updated Jul 18 • 14
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published Aug 29 • 47
CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published Aug 29 • 56
Controllable Text Generation for Large Language Models: A Survey Paper • 2408.12599 • Published Aug 22 • 62
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published Jun 22 • 45
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools Paper • 2406.12793 • Published Jun 18 • 31
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs Paper • 2402.15627 • Published Feb 23 • 34
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 603
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws Paper • 2401.00448 • Published Dec 31, 2023 • 28
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 138