js

rldy

AI & ML interests

None yet

rldy's activity

upvoted a paper 9 days ago

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Paper • 2411.04905 • Published 12 days ago • 105

upvoted a paper 18 days ago

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22 • 126

upvoted a paper 19 days ago

Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse

Paper • 2410.21333 • Published 23 days ago • 9

upvoted a paper 21 days ago

Counting Ability of Large Language Models and Impact of Tokenization

Paper • 2410.19730 • Published 25 days ago • 10

upvoted a paper 24 days ago

NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples

Paper • 2410.14669 • Published Oct 18 • 35

upvoted 3 papers 25 days ago

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Paper • 2410.17243 • Published 28 days ago • 88

Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

Paper • 2410.18693 • Published 26 days ago • 40

Why Does the Effective Context Length of LLMs Fall Short?

Paper • 2410.18745 • Published 26 days ago • 16

upvoted 2 papers about 1 month ago

Large Language Model Evaluation via Matrix Nuclear-Norm

Paper • 2410.10672 • Published Oct 14 • 18

Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices

Paper • 2410.11795 • Published Oct 15 • 16

upvoted a collection about 1 month ago

LLM Reasoning Papers

Papers to improve reasoning capabilities of LLMs • 15 items • Updated 17 days ago • 75

upvoted 3 papers about 1 month ago

Thinking LLMs: General Instruction Following with Thought Generation

Paper • 2410.10630 • Published Oct 14 • 16

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

Paper • 2410.07985 • Published Oct 10 • 26

Think While You Generate: Discrete Diffusion with Planned Denoising

Paper • 2410.06264 • Published Oct 8 • 9

upvoted 6 papers about 2 months ago

Large Language Models as Markov Chains

Paper • 2410.02724 • Published Oct 3 • 31

From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

Paper • 2410.01215 • Published Oct 2 • 30

Visual Context Window Extension: A New Perspective for Long Video Understanding

Paper • 2409.20018 • Published Sep 30 • 8

One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Paper • 2409.19603 • Published Sep 29 • 17

Hyper-Connections

Paper • 2409.19606 • Published Sep 29 • 20

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published Sep 27 • 91