---
license: cc-by-4.0
datasets:
  - RussRobin/SpatialQA
language:
  - en
tags:
  - Embodied AI
  - MLLM
  - VLM
  - Spatial Understanding
  - Phi-2
pipeline_tag: visual-question-answering
---

SpatialBot is a VLM with spatial understanding and reasoning abilities. It precisely understands depth maps and uses them to carry out high-level tasks.

In this HF repo, we provide the LoRA checkpoints of SpatialBot-3B, which is based on Phi-2 and SigLIP. It performs well on general VLM tasks as well as on spatial understanding benchmarks such as SpatialBench.

You will also need to download the pretrained checkpoint, since the LoRA weights in this repo are applied on top of it.
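Why the pretrained checkpoint is still required: a LoRA checkpoint typically stores only low-rank update matrices rather than full weights, so each update has to be added to a frozen base weight. The minimal sketch below assumes the standard LoRA parameterization, W' = W + (alpha / r) · B A; it is not SpatialBot's own loading code, and the helper names `matmul` and `merge_lora` as well as the toy matrices are hypothetical.

```python
# Minimal sketch (assumption): standard LoRA merge W' = W + (alpha / r) * B @ A,
# with W of shape (d_out, d_in), B of shape (d_out, r), A of shape (r, d_in).
# Hypothetical helpers for illustration; not SpatialBot's actual code.
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product x @ y."""
    inner = len(y)
    assert all(len(row) == inner for row in x), "shape mismatch"
    cols = len(y[0])
    return [
        [sum(row[k] * y[k][j] for k in range(inner)) for j in range(cols)]
        for row in x
    ]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A: the base weight with the LoRA delta folded in."""
    r = len(a)  # LoRA rank = number of rows of A
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update, same shape as W
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]  # toy frozen base weight (2x2)
    a = [[1.0, 2.0]]              # rank-1 A (1x2)
    b = [[0.5], [1.0]]            # rank-1 B (2x1)
    print(merge_lora(w, a, b, alpha=2.0))  # prints [[2.0, 2.0], [2.0, 5.0]]
```

Without the base weight `w`, the update `B @ A` alone is not a usable layer, which is why the LoRA checkpoints must be paired with the pretrained one. A merged checkpoint is, under this assumption, simply the result of applying such a merge to every adapted layer ahead of time.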

Paper:

https://arxiv.org/abs/2406.13642

GitHub repo:

https://github.com/BAAI-DCAI/SpatialBot

SpatialBench, the benchmark:

https://huggingface.co/datasets/RussRobin/SpatialBench

Merged SpatialBot-3B:

https://huggingface.co/RussRobin/SpatialBot-3B