serdaryildiz committed · commit 785ceb0 · parent dfd33e5 · "bug fixed!"

README.md CHANGED
@@ -1,105 +1,7 @@
<p align="center">
    <br />
    <br />
    <a href='https://journals.tubitak.gov.tr/elektrik'><img src='https://img.shields.io/badge/Paper-TUBITAK-red'></a>
    <a href='https://huggingface.co/spaces/'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
    <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg"></a>
</p>
## Abstract

## Installation

This project was developed with `torch 2.0.0` (CUDA 11.8) and `Python 3.10`.
```shell
git clone https://github.com/serdaryildiz/TRCaptionNet.git
cd TRCaptionNet
python3.10 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
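Since the project pins `torch 2.0.0`, CUDA 11.8, and `Python 3.10`, it can help to confirm the fresh environment matches. The check below is only an optional sketch, not part of the repository, and it degrades gracefully when `torch` is not yet installed:

```python
import sys


def env_summary():
    """Collect the interpreter version plus torch/CUDA versions when available."""
    info = {"python": "%d.%d" % sys.version_info[:2]}
    try:
        import torch  # present only after `pip install -r requirements.txt`
        info["torch"] = torch.__version__   # expect 2.0.0
        info["cuda"] = torch.version.cuda   # expect 11.8
    except ImportError:
        info["torch"] = info["cuda"] = None
    return info


print(env_summary())
```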
## Dataset

For the COCO dataset, please visit the [TurkishCaptionSet-COCO](https://github.com/serdaryildiz/TurkishCaptionSet-COCO) repository.

For the Flickr30k dataset: [Flicker30k-Turkish](https://drive.google.com/)
## Checkpoint

### COCO-Test

| Model | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | ROUGE-L | CIDEr |
|-------|--------|--------|--------|--------|--------|---------|-------|
| **CLIP ViT-B/16 + (no pretrain)** | 0.5069 | 0.3438 | 0.2190 | 0.1416 | 0.2221 | 0.4127 | 0.4934 |
| **CLIP ViT-B/32 + (no pretrain)** | 0.4795 | 0.3220 | 0.2056 | 0.1328 | 0.2157 | 0.4065 | 0.4512 |
| **CLIP ViT-L/14 + (no pretrain)** | 0.5262 | 0.3643 | 0.2367 | 0.1534 | 0.2290 | 0.4296 | 0.5209 |
| **CLIP ViT-L/14@336px + (no pretrain)** | 0.5325 | 0.3693 | 0.2376 | 0.1528 | 0.2338 | 0.4387 | 0.5288 |
| **ViT-B/16 + BERTurk** | 0.5572 | 0.3945 | 0.2670 | 0.1814 | 0.2459 | 0.4499 | 0.6146 |
| **CLIP ViT-B/16 + (BERTurk)** | 0.5412 | 0.3802 | 0.2555 | 0.1715 | 0.2387 | 0.4419 | 0.5848 |
| [**CLIP ViT-L/14 + (BERTurk)**](https://drive.google.com/u/0/uc?id=14Ll1PIQhsMSypHT34Rt9voz_zaAf4Xh9&export=download&confirm=t&uuid=9b4bf589-d438-4b4f-a37c-fc34b0a63a5d&at=AB6BwCAY8xK0EZiPGv2YT7isL8pG:1697575816291) | 0.5761 | 0.4124 | 0.2803 | 0.1905 | 0.2523 | 0.4609 | 0.6437 |
| **CLIP ViT-L/14@336px + (BERTurk)** | 0.4639 | 0.3198 | 0.2077 | 0.1346 | 0.2276 | 0.4190 | 0.4971 |
### Flickr-Test

| Model | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | ROUGE-L | CIDEr |
|-------|--------|--------|--------|--------|--------|---------|-------|
| **CLIP ViT-B/16 + (no pretrain)** | 0.4754 | 0.2980 | 0.1801 | 0.1046 | 0.1902 | 0.3732 | 0.2907 |
| **CLIP ViT-B/32 + (no pretrain)** | 0.4581 | 0.2866 | 0.1742 | 0.1014 | 0.1855 | 0.3754 | 0.2659 |
| **CLIP ViT-L/14 + (no pretrain)** | 0.5186 | 0.3407 | 0.2184 | 0.1346 | 0.2045 | 0.4058 | 0.3507 |
| **CLIP ViT-L/14@336px + (no pretrain)** | 0.5259 | 0.3525 | 0.2249 | 0.1334 | 0.2157 | 0.4237 | 0.3808 |
| **ViT-B/16 + BERTurk** | 0.5400 | 0.3742 | 0.2533 | 0.1677 | 0.2232 | 0.4324 | 0.4636 |
| **CLIP ViT-B/16 + (BERTurk)** | 0.5182 | 0.3523 | 0.2348 | 0.1532 | 0.2105 | 0.4079 | 0.4010 |
| [**CLIP ViT-L/14 + (BERTurk)**](https://drive.google.com/u/0/uc?id=14Ll1PIQhsMSypHT34Rt9voz_zaAf4Xh9&export=download&confirm=t&uuid=9b4bf589-d438-4b4f-a37c-fc34b0a63a5d&at=AB6BwCAY8xK0EZiPGv2YT7isL8pG:1697575816291) | 0.5713 | 0.4056 | 0.2789 | 0.1843 | 0.2330 | 0.4491 | 0.5154 |
| **CLIP ViT-L/14@336px + (BERTurk)** | 0.4548 | 0.3039 | 0.1937 | 0.1179 | 0.2056 | 0.3966 | 0.3550 |
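The BLEU-n columns above are modified n-gram precisions combined with a brevity penalty. The evaluation toolkit behind these numbers is not shown in this README, but the core clipping rule can be sketched in plain Python:

```python
from collections import Counter


def clipped_ngram_precision(candidate, reference, n=1):
    """Modified n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in the reference (BLEU's clipping rule)."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate.split()), ngrams(reference.split())
    total = sum(cand.values())
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total if total else 0.0


# 4 of the 5 candidate unigrams also occur in the reference -> 0.8
print(clipped_ngram_precision("bir kedi masada oturuyor .",
                              "bir kedi masanin ustunde oturuyor ."))
```

Full BLEU additionally takes a geometric mean over n = 1..4 and multiplies by a brevity penalty, so this sketch only illustrates the per-n ingredient.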
## Demo

To run the demo on a directory of images:

```shell
python demo.py --model-ckpt ./checkpoints/TRCaptionNet_L14_berturk.pth --input-dir ./images/ --device cuda:0
```
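`demo.py` itself is not reproduced in this README; the sketch below only illustrates the kind of loop the command implies. The `caption_fn` callable stands in for the actual TRCaptionNet model and is an assumption, not the repository's API:

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(input_dir):
    """Collect the image files under --input-dir, sorted for stable output."""
    return sorted(p for p in Path(input_dir).iterdir()
                  if p.suffix.lower() in IMAGE_EXTS)


def run_demo(input_dir, caption_fn):
    """Map every image file name to the caption the model produces for it."""
    return {p.name: caption_fn(p) for p in list_images(input_dir)}
```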
## TODO

- ??

## Citation
If you find our work helpful, please cite the following paper:

```
@ARTICLE{,
  author={Serdar Yıldız and Abbas Memiş and Songül Varlı},
  journal={},
  title={},
  year={},
  volume={},
  number={},
  pages={},
  doi={}
}
```
### Thanks to awesome works

- [BLIP](https://github.com/salesforce/BLIP)
- [ClipCap](https://github.com/rmokady/CLIP_prefix_caption)
title: TRCaptionNet
emoji: 🖼
colorFrom: red
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: false
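The new README body is now just the Spaces configuration header above: flat `key: value` pairs that Hugging Face reads as YAML metadata. As a hedged illustration (not code from the repository), such flat config lines can be parsed with the standard library alone:

```python
def parse_space_config(text):
    """Parse flat 'key: value' lines like the Spaces config into a dict."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            config[key.strip()] = value.strip()
    return config


cfg = parse_space_config("sdk: gradio\napp_file: app.py\npinned: false")
print(cfg["sdk"])  # -> gradio
```

A real Spaces README uses full YAML (with `---` delimiters), so a production reader should prefer a YAML library; this sketch only covers the flat scalar case shown here.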