Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published 6 days ago • 19
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent Paper • 2411.02265 • Published 23 days ago • 24
Post: 🔥🔥🔥 Introducing Oryx-1.5! A series of unified MLLMs with much stronger performance on all the image, video, and 3D benchmarks 😍
🛠️ GitHub: https://github.com/Oryx-mllm/Oryx
🚀 Model: THUdyh/oryx-15-6718c60763845525c2bba71d
🎨 Demo: THUdyh/Oryx
👋 Try the top-tier MLLM yourself!
👀 Stay tuned for more explorations on MLLMs!
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25 • 103
Unleashing Text-to-Image Diffusion Models for Visual Perception Paper • 2303.02153 • Published Mar 3, 2023
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution Paper • 2409.12961 • Published Sep 19 • 24
DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation Paper • 2409.03755 • Published Sep 5 • 3
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model Paper • 2408.00754 • Published Aug 1 • 21
Efficient Inference of Vision Instruction-Following Models with Elastic Cache Paper • 2407.18121 • Published Jul 25 • 16