ChenDRAG committed on
Commit 7e2eb48
1 Parent(s): 25722db

Update README.md

Files changed (1)
README.md +4 -2
README.md CHANGED
@@ -4,6 +4,10 @@ base_model:
 - FoundationVision/LlamaGen
 ---
 
+Paper: arxiv.org/abs/2410.09347
+
+Github: https://github.com/thu-ml/CCA/tree/main
+
 # Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment
 
 (TL;DR) We propose CCA, a finetuning technique that lets AR visual models generate high-quality images without CFG, cutting sampling costs in half. CCA and CFG share the same theoretical foundations and thus behave similarly, though CCA is inspired by LLM alignment rather than guided sampling.
@@ -14,6 +18,4 @@ Features of CCA:
 * **Fast to train.** CCA requires finetuning pretrained models for only 1 epoch to achieve ideal performance (~1% of the pretraining computation).
 * **Consistency with LLM Alignment.** CCA is theoretically grounded in existing [LLM alignment methods](https://arxiv.org/abs/2402.05369) and bridges the gap between visual-targeted guidance and language-targeted alignment, offering a unified framework for mixed-modal modeling.
 
-Github: https://github.com/thu-ml/CCA/tree/main
 
-Paper: arxiv.org/abs/2410.09347
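For context on the "cutting sampling costs in half" claim in the diff above: classifier-free guidance (CFG) runs two forward passes per generated token, one conditional and one unconditional, and mixes their logits, while a CCA-finetuned model samples from a single conditional pass. A minimal sketch of the two sampling paths in Python, assuming a generic `model(tokens, cond)` callable that returns next-token logits; the interface, guidance scale, and logit-space mixing rule are illustrative assumptions, not code from the CCA repository:

```python
import torch

def cfg_logits(model, tokens, cond, uncond, scale=1.5):
    """Classifier-free guidance: two forward passes per generated token."""
    logits_cond = model(tokens, cond)      # conditional pass
    logits_uncond = model(tokens, uncond)  # unconditional (null-condition) pass
    # Extrapolate the conditional logits away from the unconditional ones.
    return logits_uncond + scale * (logits_cond - logits_uncond)

def guidance_free_logits(model, tokens, cond):
    """After CCA finetuning: one conditional pass per token suffices."""
    return model(tokens, cond)

def sample_next_token(logits, temperature=1.0):
    """Sample a token id from the logits at the last position."""
    probs = torch.softmax(logits[:, -1] / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```

Dropping the unconditional branch removes one of the two forward passes per token, which is where the roughly 2x sampling speedup comes from.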
 
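On the "Consistency with LLM Alignment" bullet: the linked reference is the NCA (Noise Contrastive Alignment) paper, which trains a model by contrasting positive against negative samples through a sigmoid loss on the log-likelihood ratio to a frozen reference model. Below is a schematic of that NCA-style objective applied to condition alignment, treating matched image-condition pairs as positives and images paired with shuffled conditions as negatives; this sketches the loss family only, not the exact objective of the CCA paper, and `beta`, `lam`, and the precomputed log-probability inputs are assumptions:

```python
import torch
import torch.nn.functional as F

def nca_style_condition_loss(logp_pos, logp_ref_pos,
                             logp_neg, logp_ref_neg,
                             beta=0.1, lam=1.0):
    """NCA-style contrastive objective over condition pairs.

    logp_pos / logp_ref_pos: log p(x | c) under the trainable and frozen
        reference models for matched (image, condition) pairs.
    logp_neg / logp_ref_neg: log p(x | c') for the same images paired
        with shuffled (mismatched) conditions.
    """
    ratio_pos = beta * (logp_pos - logp_ref_pos)
    ratio_neg = beta * (logp_neg - logp_ref_neg)
    # Push the model above the reference on matched pairs and below it
    # on mismatched pairs.
    return -F.logsigmoid(ratio_pos).mean() - lam * F.logsigmoid(-ratio_neg).mean()
```

The sigmoid-over-likelihood-ratio form is the same device DPO/NCA use for preference finetuning of language models; here the "preference" signal is simply whether the condition matches the image.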