hitchhiker3010's Collections
to_read
DocGraphLM: Documental Graph Language Model for Information Extraction (arXiv:2401.02823)
Understanding LLMs: A Comprehensive Overview from Training to Inference (arXiv:2401.02038)
DocLLM: A layout-aware generative language model for multimodal document understanding (arXiv:2401.00908)
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration (arXiv:2309.01131)
LMDX: Language Model-based Document Information Extraction and Localization (arXiv:2309.10952)
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training (arXiv:2403.09611)
Improved Baselines with Visual Instruction Tuning (arXiv:2310.03744)
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models (arXiv:2403.13447)
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report (arXiv:2405.00732)
Block Transformer: Global-to-Local Language Modeling for Fast Inference (arXiv:2406.02657)
Unifying Vision, Text, and Layout for Universal Document Processing (arXiv:2212.02623)
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning (arXiv:2406.15334)
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps (arXiv:2407.07071)
Transformer Layers as Painters (arXiv:2407.09298)
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation (arXiv:2402.03216)
Visual Text Generation in the Wild (arXiv:2407.14138)
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding (arXiv:2407.12594)
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (arXiv:2408.06292)
Building and better understanding vision-language models: insights and future directions (arXiv:2408.12637)
Writing in the Margins: Better Inference Pattern for Long Context Retrieval (arXiv:2408.14906)
Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning (arXiv:2307.03692)
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution (arXiv:2307.06304)
Contrastive Localized Language-Image Pre-Training (arXiv:2410.02746)
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations (arXiv:2410.02762)
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration (arXiv:2410.02367)
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation (arXiv:2410.01731)
Contextual Document Embeddings (arXiv:2410.02525)
Compact Language Models via Pruning and Knowledge Distillation (arXiv:2407.14679)
LLM Pruning and Distillation in Practice: The Minitron Approach (arXiv:2408.11796)
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering (arXiv:2401.08500)
Automatic Prompt Optimization with "Gradient Descent" and Beam Search (arXiv:2305.03495)
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent (arXiv:2312.10003)