Molmo Collection: Artifacts for open multimodal language models. • 5 items • Updated 7 days ago • 271
Meta Llama 3 Collection: This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases. • 5 items • Updated Sep 25 • 683
Qwen1.5 Collection: Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated Sep 18 • 206
Canonical models Collection: This collection lists all the historical (pre-"Hub") canonical model checkpoints, i.e., repos that were not under an org or user namespace. • 68 items • Updated Feb 13 • 13
Switch-Transformers release Collection: This release included various MoE (Mixture of Experts) models based on the T5 architecture. The base models use from 8 to 256 experts. • 9 items • Updated Jul 31 • 15
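As a minimal sketch (not taken from the collection page itself), the snippet below shows how one of these MoE checkpoints could be loaded with the transformers library; the model id google/switch-base-8 is assumed to be the 8-expert base variant from this release.

```python
# Minimal sketch: load a Switch-Transformers checkpoint via the transformers library.
# The model id "google/switch-base-8" (8-expert base variant) is an assumption here.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-8")
model = AutoModelForSeq2SeqLM.from_pretrained("google/switch-base-8")

# Switch-Transformers checkpoints follow T5-style span corruption, so we prompt
# with sentinel tokens rather than a task prefix.
text = "A <extra_id_0> walks into a bar and orders a <extra_id_1> with a pinch of salt."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```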
LLM in a flash: Efficient Large Language Model Inference with Limited Memory • Paper • arXiv:2312.11514 • Published Dec 12, 2023 • 258
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning • Paper • arXiv:2301.13688 • Published Jan 31, 2023 • 8