ahn1376 committed on
Commit
fefda04
1 Parent(s): cce0b24

Update README.md

Files changed (1)
  1. README.md +38 -3
README.md CHANGED
@@ -1,3 +1,38 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ ---
+ This model is trained to generate bike images. The training dataset is BIKED++, available on [GitHub](https://github.com/Lyleregenwetter/BIKED_multimodal/tree/main?utm_source=catalyzex.com).
+
+ This is a conditional diffusion model trained both to generate bike images and to infill partially masked bike images from that dataset.
+
+ The baseline architecture was set up with the diffusers `UNet2DModel` class:
+
+ ```python
+ UNet2DModel(
+     sample_size=128,  # target image resolution (128x128 images)
+     in_channels=6,  # 3 channels for the RGB masked image (for infill; feed an all-white image for unconditional generation) and 3 for the RGB noise
+     out_channels=3,  # number of output channels (RGB)
+     layers_per_block=2,
+     block_out_channels=(128, 256, 512, 768),
+     down_block_types=(
+         "DownBlock2D",
+         "AttnDownBlock2D",
+         "AttnDownBlock2D",
+         "AttnDownBlock2D"
+     ),
+     up_block_types=(
+         "AttnUpBlock2D",
+         "AttnUpBlock2D",
+         "AttnUpBlock2D",
+         "UpBlock2D"
+     ),
+ )
+ ```
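
The `in_channels=6` setting implies that the conditioning image and the noisy sample are stacked along the channel axis before each forward pass. A minimal NumPy sketch of that stacking follows. The helper name `build_model_input`, the channel order (masked image first, then noise), and the `[-1, 1]` value range in which white equals `1.0` are all assumptions for illustration, not details confirmed by the README.

```python
import numpy as np


def build_model_input(noisy, masked=None):
    """Stack the conditioning image and the noisy sample into 6 channels.

    noisy:  float array of shape (3, H, W), the current noisy sample.
    masked: float array of shape (3, H, W), the partially masked image,
            or None for unconditional generation.
    """
    if masked is None:
        # Unconditional case: feed an all-white conditioning image.
        # Assumption: images are scaled to [-1, 1], so white is 1.0.
        masked = np.ones_like(noisy)
    if masked.shape != noisy.shape:
        raise ValueError("masked image and noisy sample must share a shape")
    # Assumed channel order: masked image first, then noise.
    return np.concatenate([masked, noisy], axis=0)


rng = np.random.default_rng(0)
noise = rng.standard_normal((3, 128, 128)).astype(np.float32)
x = build_model_input(noise)
print(x.shape)  # (6, 128, 128)
```

The training code in the repository is the authority on the actual channel order and value range; check it before reusing this layout.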
+
+ The code for training and inference is included as well. A postprocessing function crops the white space above and below the bike images; the dataset was preprocessed to add this padding so that the images become square.
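
As an illustration of the cropping step, the sketch below removes fully white rows from the top and bottom of an image. It is not the repository's postprocessing function: the name `crop_white_rows`, the `(H, W, 3)` `uint8` layout, and the near-white `threshold` are assumptions.

```python
import numpy as np


def crop_white_rows(image, threshold=250):
    """Drop fully white rows from the top and bottom of an (H, W, 3) uint8 image."""
    # A row holds content if any channel of any pixel is darker than threshold.
    has_content = (image < threshold).any(axis=(1, 2))
    rows = np.flatnonzero(has_content)
    if rows.size == 0:
        return image  # entirely white: nothing to crop against
    return image[rows[0]: rows[-1] + 1]


# Square white canvas with a dark band standing in for the bike.
padded = np.full((128, 128, 3), 255, dtype=np.uint8)
padded[40:90] = 0
cropped = crop_white_rows(padded)
print(cropped.shape)  # (50, 128, 3)
```

A threshold slightly below 255 tolerates generated pixels that are almost, but not exactly, white.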
+
+ With only 10 denoising steps, you can get unconditional samples that mimic the dataset well.
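
Sampling in few steps usually means visiting only a subset of the timesteps used in training. The sketch below picks 10 evenly spaced timesteps, highest noise level first. The figure of 1000 training timesteps and the even spacing are assumptions; the README does not state which scheduler or training schedule was used.

```python
def inference_timesteps(num_train_timesteps=1000, num_inference_steps=10):
    """Return evenly spaced timesteps, highest noise level first."""
    if not 0 < num_inference_steps <= num_train_timesteps:
        raise ValueError("num_inference_steps must be in (0, num_train_timesteps]")
    stride = num_train_timesteps // num_inference_steps
    return [i * stride for i in range(num_inference_steps)][::-1]


steps = inference_timesteps()
print(steps)  # [900, 800, 700, 600, 500, 400, 300, 200, 100, 0]
```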
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64d516ba80d47a6b76fc1015/VDqGqLee0XvqtiYYkzpia.png)
+
+ A pipeline with guidance is provided as well, to which you can pass a custom guidance function.
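
One common way to plug a custom function into sampling is to nudge the sample with a user-supplied term after every denoising step. The toy loop below sketches that pattern only. It does not reproduce the repository's pipeline: `guided_sampling`, the `guidance_fn(x, t)` signature, the `scale` parameter, and the stand-in `toy_denoise_step` are all hypothetical.

```python
import numpy as np


def guided_sampling(denoise_step, guidance_fn, x, num_steps=10, scale=0.1):
    """Alternate a denoising update with a user-supplied guidance nudge."""
    for t in reversed(range(num_steps)):
        x = denoise_step(x, t)             # stand-in for model + scheduler update
        x = x + scale * guidance_fn(x, t)  # hypothetical guidance hook
    return x


def toy_denoise_step(x, t):
    # Toy stand-in: shrink the sample toward zero.
    return 0.5 * x


def brighten(x, t):
    # Example guidance: direction from the sample toward an all-white image.
    return 1.0 - x


rng = np.random.default_rng(0)
start = rng.standard_normal((3, 8, 8))
guided = guided_sampling(toy_denoise_step, brighten, start)
unguided = guided_sampling(toy_denoise_step, lambda x, t: np.zeros_like(x), start)
print(guided.mean() > unguided.mean())  # True
```

In a real pipeline the guidance term would typically come from the gradient of a loss on the current sample; the repository's guided pipeline defines the signature it actually expects.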