Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 11 items • Updated 6 days ago • 306
Prithvi WxC: Foundation Model for Weather and Climate Paper • 2409.13598 • Published 11 days ago • 32
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated 13 days ago • 193
MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications Paper • 2409.07314 • Published 20 days ago • 50
LLaVA-OneVision Collection a model good at arbitrary types of visual input • 15 items • Updated about 7 hours ago • 19
Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 By manu • Jul 5 • 107
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming Paper • 2408.16725 • Published Aug 29 • 50
Minitron Collection A family of compressed models obtained via pruning and knowledge distillation • 8 items • Updated about 12 hours ago • 54
MambaVision Collection MambaVision: A Hybrid Mamba-Transformer Vision Backbone. Includes tiny, tiny2, small, base, large and large2 variants. • 8 items • Updated about 12 hours ago • 12
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated Aug 18 • 174
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning Paper • 2406.19741 • Published Jun 28 • 59
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 84
Article Going multimodal: How Prezi is leveraging the Hub and the Expert Support Program to accelerate their ML roadmap Jun 19 • 11
Article XLSCOUT Unveils ParaEmbed 2.0: a Powerful Embedding Model Tailored for Patents and IP with Expert Support from Hugging Face Jun 25 • 10
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 16 items • Updated Jul 31 • 136
LogoMotion: Visually Grounded Code Generation for Content-Aware Animation Paper • 2405.07065 • Published May 11 • 16
Granite Code Models Collection A series of code models trained by IBM and released under the Apache 2.0 license. We release both the base pretrained and instruct models. • 23 items • Updated Aug 30 • 161
How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior Paper • 2404.10198 • Published Apr 16 • 7
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 251
Article Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Apr 22 • 78
2D Gaussian Splatting for Geometrically Accurate Radiance Fields Paper • 2403.17888 • Published Mar 26 • 26
RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS Paper • 2403.13806 • Published Mar 20 • 18
ChatMusician: Understanding and Generating Music Intrinsically with LLM Paper • 2402.16153 • Published Feb 25 • 55
Lumos : Empowering Multimodal LLMs with Scene Text Recognition Paper • 2402.08017 • Published Feb 12 • 24
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss Paper • 2402.05008 • Published Feb 7 • 19
Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis Paper • 2312.16812 • Published Dec 28, 2023 • 9
Foundation Models for Generalist Geospatial Artificial Intelligence Paper • 2310.18660 • Published Oct 28, 2023 • 8
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation Paper • 2309.16653 • Published Sep 28, 2023 • 45
VideoPoet: A Large Language Model for Zero-Shot Video Generation Paper • 2312.14125 • Published Dec 21, 2023 • 44
MTVG : Multi-text Video Generation with Text-to-Video Models Paper • 2312.04086 • Published Dec 7, 2023 • 1
VideoBooth: Diffusion-based Video Generation with Image Prompts Paper • 2312.00777 • Published Dec 1, 2023 • 20
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning Paper • 2311.10709 • Published Nov 17, 2023 • 24
Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs Paper • 2311.14656 • Published Nov 24, 2023 • 2
Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians Paper • 2312.03029 • Published Dec 5, 2023 • 23
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion Paper • 2311.07885 • Published Nov 14, 2023 • 39