Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 models. • 11 items
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items
DataGemma Release Collection A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items
Power-LM Collection Dense and MoE LLMs trained with the power learning rate scheduler. • 3 items
VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges Paper • 2409.01071
CogVLM2 Collection This collection hosts the repos of THUDM's CogVLM2 releases. • 8 items
CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published Aug 29
Qwen2-VL Collection Vision-language model series based on Qwen2. • 15 items
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Paper • 2408.15881 • Published Aug 28
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Paper • 2408.06072 • Published Aug 12
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search Paper • 2408.08152 • Published Aug 15
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data Paper • 2405.14333 • Published May 23
Minitron Collection A family of compressed models obtained via pruning and knowledge distillation. • 9 items
VideoLLaMA 2 Collection Optimized VideoLLaMA with improved spatial-temporal modeling and better audio understanding capabilities. • 11 items
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published Jun 11
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models in 5 sizes: 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 39 items
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 Paper • 2408.05147 • Published Aug 9
Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models Article • Published Mar 20
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma. • 16 items
LLaVA-OneVision Collection A model family designed to handle arbitrary types of visual input. • 15 items
Qwen2-Audio Collection Audio-language model series based on Qwen2. • 4 items
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation Paper • 2408.02629 • Published Aug 5
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items
Llama 3.1 Collection This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3, and Prompt Guard models. • 11 items
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published May 24
Aya Datasets Collection The Aya Collection is a massive multilingual dataset spanning over 100 languages and consisting of 513 million instances of prompts and completions. • 5 items
C4AI Aya 23 Collection Aya 23 is an open-weights research release of an instruction fine-tuned model with highly advanced multilingual capabilities. • 4 items
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22
Phi-3 Collection The Phi-3 family of small language and multimodal models. Language models are available in short- and long-context variants. • 27 items
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model Paper • 2405.09215 • Published May 15
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6
SpaceByte: Towards Deleting Tokenization from Large Language Modeling Paper • 2404.14408 • Published Apr 22
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27
Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model Paper • 2309.16058 • Published Sep 27, 2023