|
---
license: apache-2.0
datasets:
- multi-train/coco_captions_1107
- visual_genome
language:
- en
pipeline_tag: text-to-image
tags:
- scene_graph
- transformers
- laplacian
- autoregressive
- vqvae
---
|
|
|
# trf-sg2im |
|
|
|
Model card for the paper __[Transformer-Based Image Generation from Scene Graphs](https://arxiv.org/abs/2303.04634)__.

Original GitHub implementation [here](https://github.com/perceivelab/trf-sg2im).
|
|
|
![teaser](docs/teaser.gif) |
|
|
|
## Model |
|
|
|
This model implements a two-stage scene-graph-to-image approach. In the first stage, a transformer-based architecture with Laplacian positional encoding takes a scene graph as input and predicts a scene layout.

In the second stage, the estimated layout conditions an autoregressive GPT-like transformer that composes the image in a discrete latent space; a VQVAE then decodes the latent codes into the final image.
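
To illustrate the idea behind Laplacian positional encoding, the sketch below computes per-node encodings from a scene graph's adjacency matrix using the eigenvectors of the symmetric normalized graph Laplacian. This is a generic formulation assuming an undirected graph; the helper name and the exact variant used by the paper (normalization, sign handling) are illustrative assumptions, not the repository's implementation.

```python
import numpy as np

def laplacian_positional_encoding(adj: np.ndarray, k: int) -> np.ndarray:
    """Illustrative sketch: k-dimensional Laplacian positional encodings
    for graph nodes, from the eigenvectors of the symmetric normalized
    Laplacian L = I - D^{-1/2} A D^{-1/2} (hypothetical helper, not the
    repository's API)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros(n)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column 0 belongs to the
    # (near-)zero eigenvalue, so the encodings are the next k eigenvectors.
    _, eigvecs = np.linalg.eigh(lap)
    pe = eigvecs[:, 1 : k + 1]
    # Pad with zeros when the graph has fewer than k + 1 nodes.
    if pe.shape[1] < k:
        pe = np.pad(pe, ((0, 0), (0, k - pe.shape[1])))
    # Eigenvector signs are arbitrary; training pipelines commonly apply
    # random sign flips so the model is invariant to this ambiguity.
    return pe

# Toy scene graph: 4 objects, edges for the relationships between them.
adj = np.array([[0., 1., 1., 0.],
                [1., 0., 0., 1.],
                [1., 0., 0., 0.],
                [0., 1., 0., 0.]])
print(laplacian_positional_encoding(adj, k=2).shape)  # (4, 2)
```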
|
|
|
![architecture](docs/architecture.png) |
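
To make the second stage concrete, here is a minimal, self-contained PyTorch sketch of layout-conditioned autoregressive sampling over a grid of discrete latent codes. Everything in it (`ToyLatentPrior`, `sample_codes`, the toy dimensions) is an illustrative stand-in, not the repository's actual architecture or API.

```python
import torch
import torch.nn as nn

class ToyLatentPrior(nn.Module):
    """Toy stand-in for the stage-2 transformer: predicts the next discrete
    VQVAE code index, with a layout embedding added to every position."""
    def __init__(self, vocab_size=512, d_model=64, seq_len=64):
        super().__init__()
        self.tok = nn.Embedding(vocab_size + 1, d_model)   # +1 for <bos>
        self.pos = nn.Embedding(seq_len + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)
        self.vocab_size, self.seq_len = vocab_size, seq_len

    def forward(self, codes, layout_emb):
        bos = torch.full((codes.size(0), 1), self.vocab_size, dtype=torch.long)
        x = torch.cat([bos, codes], dim=1)                 # prepend <bos>
        h = self.tok(x) + self.pos(torch.arange(x.size(1))) + layout_emb[:, None]
        causal = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        return self.head(self.blocks(h, mask=causal))

@torch.no_grad()
def sample_codes(prior, layout_emb):
    """Autoregressively sample a full sequence of discrete latent codes."""
    codes = torch.zeros(layout_emb.size(0), 0, dtype=torch.long)
    for _ in range(prior.seq_len):
        logits = prior(codes, layout_emb)[:, -1]           # next-code logits
        nxt = torch.multinomial(logits.softmax(-1), 1)
        codes = torch.cat([codes, nxt], dim=1)
    return codes  # in the real pipeline, the VQVAE decodes these to pixels

prior = ToyLatentPrior().eval()
layout_emb = torch.randn(1, 64)                # stand-in for an encoded layout
print(sample_codes(prior, layout_emb).shape)   # torch.Size([1, 64])
```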
|
|
|
## Usage |
|
For usage instructions, please refer to the original [GitHub repo](https://github.com/perceivelab/trf-sg2im). |
|
|
|
|