stereoplegic's Collections
UI Layout Generation with LLMs Guided by UI Grammar
Paper
•
2310.15455
•
Published
•
2
You Only Look at Screens: Multimodal Chain-of-Action Agents
Paper
•
2309.11436
•
Published
•
1
Never-ending Learning of User Interfaces
Paper
•
2308.08726
•
Published
•
1
LMDX: Language Model-based Document Information Extraction and Localization
Paper
•
2309.10952
•
Published
•
65
LASER: LLM Agent with State-Space Exploration for Web Navigation
Paper
•
2309.08172
•
Published
•
11
LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models
Paper
•
2309.09506
•
Published
•
14
DSG: An End-to-End Document Structure Generator
Paper
•
2310.09118
•
Published
•
2
On Web-based Visual Corpus Construction for Visual Document Understanding
Paper
•
2211.03256
•
Published
•
1
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Paper
•
2309.01131
•
Published
•
1
DocFormerv2: Local Features for Document Understanding
Paper
•
2306.01733
•
Published
•
1
OCR-free Document Understanding Transformer
Paper
•
2111.15664
•
Published
•
2
DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents
Paper
•
2304.12484
•
Published
•
1
Understanding HTML with Large Language Models
Paper
•
2210.03945
•
Published
•
1
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
Paper
•
2306.06094
•
Published
•
1
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper
•
2401.00908
•
Published
•
180
DocGraphLM: Documental Graph Language Model for Information Extraction
Paper
•
2401.02823
•
Published
•
34
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Paper
•
2311.07575
•
Published
•
13
LayoutPrompter: Awaken the Design Ability of Large Language Models
Paper
•
2311.06495
•
Published
•
10
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
Paper
•
2401.13311
•
Published
•
10
Empowering LLM to use Smartphone for Intelligent Task Automation
Paper
•
2308.15272
•
Published
•
1
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
Paper
•
2404.12753
•
Published
•
41