InLoc: Indoor Visual Localization with Dense Matching and View Synthesis Paper • 1803.10368 • Published Mar 28, 2018 • 1
Unifying Vision, Text, and Layout for Universal Document Processing Paper • 2212.02623 • Published Dec 5, 2022 • 10 • 1
Neural SLAM: Learning to Explore with External Memory Paper • 1706.09520 • Published Jun 29, 2017 • 1
SIMPL: A Simple and Efficient Multi-agent Motion Prediction Baseline for Autonomous Driving Paper • 2402.02519 • Published Feb 4, 2024 • 1
Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation Paper • 2307.09906 • Published Jul 19, 2023 • 1
Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI Paper • 2109.08238 • Published Sep 16, 2021 • 1
Optimal Transport Aggregation for Visual Place Recognition Paper • 2311.15937 • Published Nov 27, 2023 • 1
RoMa: Revisiting Robust Losses for Dense Feature Matching Paper • 2305.15404 • Published May 24, 2023 • 1
Localizing Objects with Self-Supervised Transformers and no Labels Paper • 2109.14279 • Published Sep 29, 2021 • 2
Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL Paper • 2006.13799 • Published Jun 24, 2020 • 1
Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models Paper • 2311.00871 • Published Nov 1, 2023 • 2 • 1
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space Paper • 1706.02413 • Published Jun 7, 2017 • 1
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework Paper • 2308.08155 • Published Aug 16, 2023 • 3 • 2
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild Paper • 2008.10010 • Published Aug 23, 2020 • 1
Visual Geo-localization with Self-supervised Representation Learning Paper • 2308.00090 • Published Jul 31, 2023 • 3
PaLI: A Jointly-Scaled Multilingual Language-Image Model Paper • 2209.06794 • Published Sep 14, 2022 • 2 • 1
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation Paper • 1612.00593 • Published Dec 2, 2016 • 1
Open X-Embodiment: Robotic Learning Datasets and RT-X Models Paper • 2310.08864 • Published Oct 13, 2023 • 2 • 1
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use Paper • 2308.06595 • Published Aug 12, 2023 • 5 • 1
Global Features are All You Need for Image Retrieval and Reranking Paper • 2308.06954 • Published Aug 14, 2023 • 1 • 1
CodePlan: Repository-level Coding using LLMs and Planning Paper • 2309.12499 • Published Sep 21, 2023 • 73 • 14
Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots Paper • 2310.13724 • Published Oct 19, 2023 • 8 • 3
Towards Foundation Models for Knowledge Graph Reasoning Paper • 2310.04562 • Published Oct 6, 2023 • 3 • 1
IDD-3D: Indian Driving Dataset for 3D Unstructured Road Scenes Paper • 2210.12878 • Published Oct 23, 2022 • 1
PlaceNav: Topological Navigation through Place Recognition Paper • 2309.17260 • Published Sep 29, 2023 • 1
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding Paper • 2310.15308 • Published Oct 23, 2023 • 22 • 4
CodeT5+: Open Code Large Language Models for Code Understanding and Generation Paper • 2305.07922 • Published May 13, 2023 • 4 • 2
CodeFusion: A Pre-trained Diffusion Model for Code Generation Paper • 2310.17680 • Published Oct 26, 2023 • 69 • 10
Navigation with Large Language Models: Semantic Guesswork as a Heuristic for Planning Paper • 2310.10103 • Published Oct 16, 2023 • 1
Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age Paper • 1606.05830 • Published Jun 19, 2016 • 1
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control Paper • 2307.15818 • Published Jul 28, 2023 • 27 • 2
Scaling Up Models and Data with t5x and seqio Paper • 2203.17189 • Published Mar 31, 2022 • 1 • 1
SCENIC: A JAX Library for Computer Vision Research and Beyond Paper • 2110.11403 • Published Oct 18, 2021 • 1
Project Aria: A New Tool for Egocentric Multi-Modal AI Research Paper • 2308.13561 • Published Aug 24, 2023 • 1
Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding Paper • 2309.15065 • Published Sep 26, 2023 • 1
SeamlessM4T-Massively Multilingual & Multimodal Machine Translation Paper • 2308.11596 • Published Aug 22, 2023 • 1 • 1
Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos Paper • 2207.11094 • Published Jul 22, 2022 • 1 • 1
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale Paper • 2208.07339 • Published Aug 15, 2022 • 4 • 1
PointPillars: Fast Encoders for Object Detection from Point Clouds Paper • 1812.05784 • Published Dec 14, 2018 • 1
Graph of Thoughts: Solving Elaborate Problems with Large Language Models Paper • 2308.09687 • Published Aug 18, 2023 • 6 • 1
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera Paper • 1910.02527 • Published Oct 6, 2019 • 1