RomanShnurov committed
Commit 22baf80
1 Parent(s): 291f35b
Files changed (11)
  1. README.md +135 -0
  2. __init__.py +0 -0
  3. config.json +12 -0
  4. images/1.jpg +0 -0
  5. images/2.jpg +0 -0
  6. images/3.jpg +0 -0
  7. images/4.webp +0 -0
  8. midjourney200M.pt +3 -0
  9. midjourney200M.py +19 -0
  10. pipeline.py +51 -0
  11. requirements.txt +3 -0
README.md ADDED
@@ -0,0 +1,135 @@
+ ---
+ library_name: generic
+ license: cc-by-sa-3.0
+ pipeline_tag: image-classification
+ tags:
+ - ai_or_not
+ - sumsub
+ - image_classification
+ - sumsubaiornot
+ - aiornot
+ - deepfake
+ - synthetic
+ - generated
+ - pytorch
+ metrics:
+ - accuracy
+ ---
+
+ # For Fake's Sake: a set of models for detecting generated and synthetic images
+
+ Many people on the internet have recently been tricked by fake images of Pope Francis wearing a coat or of Donald Trump's arrest.
+ To help combat this issue, we provide detectors for such images generated by popular tools like Midjourney and Stable Diffusion.
+
+ | ![Image1](images/3.jpg) | ![Image2](images/2.jpg) | ![Image3](images/4.webp) |
+ |-------------------------|-------------------------|--------------------------|
+
+ ## Model Details
+
+ ### Model Description
+
+ - **Developed by:** [Sumsub AI team](https://sumsub.com/)
+ - **Model type:** Image classification
+ - **License:** CC-BY-SA-3.0
+ - **Types:** *midjourney_200m* (size: 200M parameters; description: designed to detect images created with various versions of Midjourney)
+ - **Finetuned from model:** *convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384*
+
+ ## Demo
+
+ The demo page can be found [here](https://huggingface.co/spaces/Sumsub/Sumsub-ffs-demo).
+
+ ## How to Get Started with the Model & Model Sources
+
+ Use the code below to get started with the model:
+
+ ```bash
+ git lfs install
+ git clone https://huggingface.co/Sumsub/Sumsub-ffs-synthetic-1.0_mj_200 sumsub_synthetic_mj_200
+ ```
+
+ ```python
+ from sumsub_synthetic_mj_200.pipeline import PreTrainedPipeline
+ from PIL import Image
+
+ pipe = PreTrainedPipeline("sumsub_synthetic_mj_200/")
+
+ img = Image.open("sumsub_synthetic_mj_200/images/2.jpg")
+
+ result = pipe(img)
+ print(result)
+ ```
+
+ You may need to install these prerequisites first:
+
+ ```bash
+ pip install -r requirements.txt
+ pip install "git+https://github.com/rwightman/pytorch-image-models"
+ pip install "git+https://github.com/huggingface/huggingface_hub"
+ ```
+
+ ## Training Details
+
+ ### Training Data
+
+ The models were trained on the following datasets:
+
+ **Midjourney datasets:**
+
+ - *Real photos*: [MS COCO](https://cocodataset.org/#home).
+ - *AI photos*: a curated dataset of images from Pinterest boards dedicated to Generative AI ([Midjourney](https://pin.it/13UkjgM), [Midjourney AI Art](https://pin.it/6pNXlz3), [Midjourney - Community Showcase](https://pin.it/7gi4jmT), [Midjourney](https://pin.it/4FW0LXQ), [MIDJOURNEY](https://pin.it/5mSsiPg), [Midjourney](https://pin.it/2Qx92QW)).
+
+ ### Training Procedure
+
+ To improve the performance metrics, we used data augmentations such as rotation, crop, Mixup and CutMix. Each model was trained for up to 30 epochs with early stopping and a batch size of 32.
+
+ ## Evaluation
+
+ For evaluation we used the following datasets:
+
+ **Midjourney datasets:**
+
+ - [Kaggle Midjourney 2022-250k](https://www.kaggle.com/datasets/ldmtwo/midjourney-250k-csv): set of 250k images generated by Midjourney.
+ - [Kaggle Midjourney v5.1](https://www.kaggle.com/datasets/iraklip/modjourney-v51-cleaned-data): set of 400k images generated by Midjourney version 5.1.
+
+ **Realistic images:**
+
+ - [MS COCO](https://cocodataset.org/#home): set of 120k real world images.
+
+ ## Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ | Model           | Dataset                                                                                       | Accuracy |
+ |-----------------|-----------------------------------------------------------------------------------------------|----------|
+ | midjourney_200M | [Kaggle Midjourney 2022-250k](https://www.kaggle.com/datasets/ldmtwo/midjourney-250k-csv)     | 0.876    |
+ | midjourney_200M | [Kaggle Midjourney v5.1](https://www.kaggle.com/datasets/iraklip/modjourney-v51-cleaned-data) | 0.893    |
+ | midjourney_200M | [MS COCO](https://cocodataset.org/#home)                                                      | 0.939    |
+
+ ## Limitations
+
+ - Achieving 100% accuracy is not possible. The model output should therefore be treated only as an indication that an image may have been artificially generated, not as proof that it was.
+ - Our models may misclassify real-world photos that are extremely vibrant and of exceptionally high quality. In such cases, the rich colors and fine details can lead the model to focus on visual aspects that are not indicative of the true class.
+
+ ![Image1](images/1.jpg)
+
+ ## Citation
+
+ If you find this useful, please cite as:
+
+ ```text
+ @misc{sumsubaiornot,
+   publisher = {Sumsub},
+   url = {https://huggingface.co/Sumsub/Sumsub-ffs-synthetic-1.0_mj_200},
+   year = {2023},
+   author = {Savelyev, Alexander and Toropov, Alexey and Goldman-Kalaydin, Pavel and Samarin, Alexey},
+   title = {For Fake's Sake: a set of models for detecting deepfakes, generated images and synthetic images}
+ }
+ ```
+
+ ## References
+
+ - Stöckl, Andreas. (2022). Evaluating a Synthetic Image Dataset Generated with Stable Diffusion. 10.48550/arXiv.2211.01777.
+ - Lin, Tsung-Yi & Maire, Michael & Belongie, Serge & Hays, James & Perona, Pietro & Ramanan, Deva & Dollár, Piotr & Zitnick, C. (2014). Microsoft COCO: Common Objects in Context.
+ - Howard, Andrew & Zhu, Menglong & Chen, Bo & Kalenichenko, Dmitry & Wang, Weijun & Weyand, Tobias & Andreetto, Marco & Adam, Hartwig. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.
+ - Liu, Zhuang & Mao, Hanzi & Wu, Chao-Yuan & Feichtenhofer, Christoph & Darrell, Trevor & Xie, Saining. (2022). A ConvNet for the 2020s.
+ - Wang, Zijie & Montoya, Evan & Munechika, David & Yang, Haoyang & Hoover, Benjamin & Chau, Polo. (2022). DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models. 10.48550/arXiv.2210.14896.
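The Limitations section of the README asks readers to treat the output as an indication, not proof. The following is a minimal sketch of that idea, assuming the pipeline returns a list of `{"label", "score"}` dicts as `pipeline.py` does. The helper names (`ai_score`, `flag_if_likely_ai`, `accuracy`) and the 0.9 threshold are hypothetical and not part of the repository. The `accuracy` helper assumes each evaluation set contains a single expected class, which is an interpretation of the Metrics table rather than something the README states.

```python
from typing import Any, Dict, List

AI_LABEL = "by AI"  # label string as it appears in config.json


def ai_score(result: List[Dict[str, Any]]) -> float:
    """Return the score reported under the 'by AI' label."""
    for item in result:
        if item["label"] == AI_LABEL:
            return float(item["score"])
    raise ValueError("result has no 'by AI' entry")


def flag_if_likely_ai(result: List[Dict[str, Any]], threshold: float = 0.9) -> bool:
    """Flag an image for review only when the AI score is high.

    The threshold is an illustrative assumption; pick one from your own
    validation data.
    """
    return ai_score(result) >= threshold


def accuracy(results: List[List[Dict[str, Any]]], expected_label: str) -> float:
    """Fraction of results whose top-scoring label equals expected_label."""
    if not results:
        raise ValueError("results must not be empty")
    hits = sum(
        1
        for result in results
        if max(result, key=lambda d: d["score"])["label"] == expected_label
    )
    return hits / len(results)
```

A flagged image is a candidate for manual review; an unflagged image is not thereby confirmed as real.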
__init__.py ADDED
File without changes
config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "id2label": {
+     "0": "by AI",
+     "1": "by human"
+   },
+   "label2id": {
+     "by AI": "0",
+     "by human": "1"
+   },
+   "pretrained": false,
+   "timm_model": "convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384"
+ }
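JSON object keys are always strings, which is why `id2label` is keyed by `"0"` and `"1"`; in this file the `label2id` values are strings as well. A minimal sketch of loading the config into integer-indexed mappings follows. The function name `load_label_maps` is hypothetical, and the config text is inlined here only to keep the example self-contained.

```python
import json
from typing import Dict, Tuple

# Inlined copy of config.json from this commit, for a self-contained example.
CONFIG_TEXT = """
{
  "id2label": {"0": "by AI", "1": "by human"},
  "label2id": {"by AI": "0", "by human": "1"},
  "pretrained": false,
  "timm_model": "convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384"
}
"""


def load_label_maps(text: str) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Parse the config and convert string ids to integers."""
    config = json.loads(text)
    id2label = {int(key): label for key, label in config["id2label"].items()}
    label2id = {label: int(value) for label, value in config["label2id"].items()}
    return id2label, label2id


id2label, label2id = load_label_maps(CONFIG_TEXT)
```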
images/1.jpg ADDED
images/2.jpg ADDED
images/3.jpg ADDED
images/4.webp ADDED
midjourney200M.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7a689a102e0f54967d2e670a0652bd1d440bc7874ae7cdce77e2a1ccc7ad3f0f
+ size 794891785
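The three lines above are a Git LFS pointer: the repository stores only the SHA-256 digest and byte size of the real checkpoint. A minimal sketch of checking a downloaded file against such a pointer follows; the function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of Git LFS or this repository.

```python
import hashlib
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a pointer file into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at path has the size and SHA-256 in the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algorithm, _, expected_digest = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")

    digest = hashlib.sha256()
    size = 0
    # Read in chunks so a large checkpoint is never held in memory at once.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)

    return size == int(fields["size"]) and digest.hexdigest() == expected_digest
```

If the check fails after cloning, the usual cause is that `git lfs install` was not run and the `.pt` file on disk is still the small pointer itself.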
midjourney200M.py ADDED
@@ -0,0 +1,19 @@
+ import timm
+ import torch
+ from torch import nn
+
+
+ class Model200M(torch.nn.Module):
+     """ConvNeXt-Large backbone (built via timm) with a small two-class MLP head."""
+
+     def __init__(self):
+         super().__init__()
+         # num_classes=0 removes timm's classification layer, so the backbone
+         # returns pooled feature vectors instead of ImageNet logits.
+         self.model = timm.create_model('convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384',
+                                        pretrained=False,
+                                        num_classes=0)
+
+         # Head: 1536-dim features -> 128 hidden units -> 2 logits.
+         self.clf = nn.Sequential(
+             nn.Linear(1536, 128),
+             nn.ReLU(inplace=True),
+             nn.Linear(128, 2))
+
+     def forward(self, image):
+         image_features = self.model(image)
+         return self.clf(image_features)
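To give a sense of scale, the classification head defined above is tiny compared with the roughly 200M parameters the README attributes to the whole model. A minimal sketch of the arithmetic, using the layer sizes from `midjourney200M.py`; the helper name `linear_params` is hypothetical.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Parameter count of a fully connected layer: weights plus biases."""
    return in_features * out_features + out_features


# Layer sizes taken from the nn.Sequential head in midjourney200M.py.
hidden_layer = linear_params(1536, 128)  # 1536 * 128 weights + 128 biases
output_layer = linear_params(128, 2)     # 128 * 2 weights + 2 biases
head_total = hidden_layer + output_layer  # ReLU has no parameters
```

Nearly all of the checkpoint size therefore comes from the ConvNeXt backbone rather than the head.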
pipeline.py ADDED
@@ -0,0 +1,51 @@
+ from typing import Dict, List, Any
+ from PIL import Image
+
+ import os
+ import json
+ import torch
+ import torchvision
+ from torch.nn import functional as F
+
+ try:
+     # Imported as part of a package, as in the README example.
+     from .midjourney200M import Model200M
+ except ImportError:
+     # Run with the repository directory itself on sys.path.
+     from midjourney200M import Model200M
+
+
+ class PreTrainedPipeline():
+     def __init__(self, path=""):
+         # Load the checkpoint on CPU and switch the model to inference mode.
+         self.model = Model200M()
+         ckpt = torch.load(os.path.join(path, "midjourney200M.pt"), map_location=torch.device('cpu'))
+         self.model.load_state_dict(ckpt)
+         self.model.eval()
+
+         with open(os.path.join(path, "config.json")) as config_file:
+             config = json.load(config_file)
+         self.id2label = config["id2label"]
+
+         self.tfm = torchvision.transforms.Compose([
+             torchvision.transforms.Resize((640, 640)),
+             torchvision.transforms.ToTensor(),
+             torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],
+                                              std=[0.229, 0.224, 0.225]),
+         ])
+
+     def __call__(self, inputs: "Image.Image") -> List[Dict[str, Any]]:
+         """
+         Args:
+             inputs (:obj:`PIL.Image`):
+                 The raw image as a PIL image. No transformation has been
+                 applied to it; all necessary transformations are made here.
+         Return:
+             A :obj:`list` of dicts like {"label": "XXX", "score": 0.82},
+             preferably in decreasing `score` order.
+         """
+         # Normalize expects 3 channels, so convert grayscale/RGBA inputs to RGB.
+         img = self.tfm(inputs.convert("RGB"))
+         return self.predict_from_model(img)
+
+     def predict_from_model(self, img):
+         with torch.no_grad():  # inference only, no gradients needed
+             y = self.model(img[None, ...])  # add a batch dimension
+             probs = F.softmax(y, dim=1)[0].cpu().tolist()
+         # Pairing kept from the original code: the index-1 probability is
+         # reported under id2label["0"] and the index-0 one under id2label["1"].
+         labels = [
+             {"label": str(self.id2label["0"]), "score": probs[1]},
+             {"label": str(self.id2label["1"]), "score": probs[0]},
+         ]
+         return labels
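The docstring in `pipeline.py` prefers results in decreasing `score` order, which the pipeline itself does not enforce. A minimal sketch of turning two logits into a sorted result list in plain Python follows. It assumes logit index `i` maps to `id2label[str(i)]`, which is an assumption for illustration only: the pipeline above pairs the index-1 probability with `id2label["0"]`. The function names `softmax` and `to_sorted_labels` are hypothetical.

```python
import math
from typing import Any, Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def to_sorted_labels(logits: List[float], id2label: Dict[str, str]) -> List[Dict[str, Any]]:
    """Map each probability to a label and sort by decreasing score.

    Assumes index i corresponds to id2label[str(i)] (illustrative assumption).
    """
    probs = softmax(logits)
    items = [{"label": id2label[str(i)], "score": p} for i, p in enumerate(probs)]
    return sorted(items, key=lambda item: item["score"], reverse=True)
```

Sorting keeps the most likely class first, so callers can read `result[0]` without scanning the list.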
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ timm==0.9.5
+ torch==2.0.1
+ torchvision==0.15.2
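Because the versions are pinned exactly, it can help to confirm what is actually installed before loading the checkpoint. A minimal sketch using the standard library's `importlib.metadata` follows; the helper names `parse_pins` and `find_mismatches` are hypothetical, and ignoring a local build suffix such as `+cu118` is an illustrative choice rather than a rule taken from the repository.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple


def parse_pins(text: str) -> Dict[str, str]:
    """Read 'name==version' lines, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, separator, version = line.partition("==")
        if separator:
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(
    pins: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (wanted, installed)} for every pin that is not met."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore a local build suffix such as "+cu118" when comparing.
        comparable = installed.split("+")[0] if installed else None
        if comparable != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Calling `find_mismatches(parse_pins(open("requirements.txt").read()))` returns an empty dict when every pinned package is installed at the requested version.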