---
language:
- en
license: apache-2.0
library_name: timm
tags:
- mobile
- vision
- image-classification
datasets:
- imagenet-1k
metrics:
- accuracy
---

# EfficientFormer-L3

## Table of Contents
- [EfficientFormer-L3](#efficientformer-l3)
- [Table of Contents](#table-of-contents)
- [Model Details](#model-details)
- [How to Get Started with the Model](#how-to-get-started-with-the-model)
- [Uses](#uses)
  - [Direct Use](#direct-use)
- [Limitations and Biases](#limitations-and-biases)
- [Training](#training)
  - [Training Data](#training-data)
  - [Training Procedure](#training-procedure)
- [Evaluation Results](#evaluation-results)
- [Citation Information](#citation-information)

<model_details>

## Model Details

EfficientFormer-L3, developed by [Snap Research](https://github.com/snap-research), is one of three EfficientFormer models (L1, L3, and L7). The EfficientFormer models were released as part of an effort to demonstrate that properly designed transformers can reach extremely low latency on mobile devices while maintaining high accuracy.

This checkpoint of EfficientFormer-L3 was trained for 1000 epochs.

- Developed by: Yanyu Li, Geng Yuan, Yang Wen, Eric Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren
- Language(s): English
- License: This model is licensed under the Apache 2.0 license
- Resources for more information:
  - [Research Paper](https://arxiv.org/abs/2206.01191)
  - [GitHub Repo](https://github.com/snap-research/EfficientFormer/)

</model_details>

<how_to_start>

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import requests
import torch
from PIL import Image

from transformers import EfficientFormerImageProcessor, EfficientFormerForImageClassificationWithTeacher

# Load a COCO image of two cats to test the model
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load preprocessor and pretrained model
model_name = "huggingface/efficientformer-l3-300"
processor = EfficientFormerImageProcessor.from_pretrained(model_name)
model = EfficientFormerForImageClassificationWithTeacher.from_pretrained(model_name)

# Preprocess input image
inputs = processor(images=image, return_tensors="pt")

# Inference (no gradients needed)
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to probabilities and print the top ImageNet-1k class label
logits = outputs.logits
scores = torch.nn.functional.softmax(logits, dim=1)
top_pred_class = torch.argmax(scores, dim=1).item()
print(f"Predicted class: {model.config.id2label[top_pred_class]}")
```
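
The snippet above reports only the single most likely class. If you also want the runner-up candidates, the usual post-processing is a softmax followed by a top-k selection. With PyTorch this is `torch.topk(scores, k=5, dim=1)`. The plain-Python sketch below shows the same computation on a short, made-up list of logits, so you can check the logic without downloading the model. The `softmax` and `top_k` helpers and the example values are illustrative only.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: List[float], k: int = 5) -> List[Tuple[int, float]]:
    """Return (class_index, probability) pairs for the k most likely classes."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Hypothetical logits for a 6-class toy problem (not real model output).
example_logits = [1.2, 3.5, -0.7, 3.1, 0.0, 2.2]
for idx, p in top_k(example_logits, k=3):
    print(f"class {idx}: {p:.3f}")
```

Because softmax is monotonic, the ranking is the same whether you sort logits or probabilities. The softmax only matters if you want calibrated-looking scores. With the real model, you could pass `outputs.logits[0].tolist()` to `top_k` and map each index through `model.config.id2label`.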

</how_to_start>

<uses>

## Uses

#### Direct Use

This model can be used for image classification. The EfficientFormer architecture can also serve as a backbone for dense prediction tasks such as semantic segmentation. On mobile devices (the model was tested on an iPhone 12), the CoreML checkpoints run with low latency.

</uses>

<Limitations_and_Biases>

## Limitations and Biases

Though most designs in EfficientFormer are general-purpose (e.g., the dimension-consistent design and the 4D blocks with CONV-BN fusion), the actual speed of EfficientFormer may vary on other platforms. For instance, the activation operator may need to be modified if GELU is poorly supported on a specific hardware and compiler combination while HardSwish is efficiently implemented. The proposed latency-driven slimming is simple and fast. However, better results may be achieved with an enumeration-based brute-force search if search cost is not a concern.
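
The CONV-BN fusion mentioned above folds an inference-time BatchNorm layer into the preceding convolution, so the deployed graph runs one operator instead of two. The sketch below shows the standard folding arithmetic in plain Python, assuming a 1x1 convolution written as a small matrix-vector product and made-up parameter values:

- Each output channel's weights are multiplied by `scale = gamma / sqrt(var + eps)`.
- The bias becomes `(b - mean) * scale + beta`.

It illustrates the general technique and is not EfficientFormer's actual implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # weights[out_channel][in_channel] of a 1x1 conv


def conv1x1(x: List[float], w: Matrix, b: List[float]) -> List[float]:
    """Apply a 1x1 convolution to a single pixel with len(x) input channels."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bo for row, bo in zip(w, b)]


def batchnorm(y: List[float], gamma: List[float], beta: List[float],
              mean: List[float], var: List[float], eps: float = 1e-5) -> List[float]:
    """Inference-mode BatchNorm using stored running statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def fuse_conv_bn(w: Matrix, b: List[float], gamma: List[float], beta: List[float],
                 mean: List[float], var: List[float],
                 eps: float = 1e-5) -> Tuple[Matrix, List[float]]:
    """Fold BatchNorm into the conv weights and bias, one output channel at a time."""
    fused_w, fused_b = [], []
    for row, bo, g, bt, m, v in zip(w, b, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        fused_w.append([wi * scale for wi in row])
        fused_b.append((bo - m) * scale + bt)
    return fused_w, fused_b


# Made-up parameters: 2 output channels, 3 input channels.
w = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.5]]
b = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.6]
x = [1.0, 2.0, -3.0]

reference = batchnorm(conv1x1(x, w, b), gamma, beta, mean, var)
fw, fb = fuse_conv_bn(w, b, gamma, beta, mean, var)
fused = conv1x1(x, fw, fb)
print(reference, fused)  # the two outputs agree up to floating-point rounding
```

The same per-channel scaling carries over to a k x k convolution, because every weight that feeds a given output channel is multiplied by that channel's scale.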
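To make the GELU/HardSwish trade-off concrete, compare the two definitions:

- GELU is usually defined through the Gaussian error function (erf), which may lack an efficient kernel on a given mobile target.
- HardSwish needs only an add, a clamp, and a multiply.

The plain-Python sketch below evaluates both. They have a similar shape but are not numerically identical, so swapping one for the other is an architectural change rather than a transparent substitution.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def hardswish(x: float) -> float:
    """HardSwish: x * ReLU6(x + 3) / 6, built from an add, a clamp and a multiply."""
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  hardswish={hardswish(x):+.4f}")
```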

Since the model was trained on ImageNet-1K, the [biases embedded in that dataset](https://huggingface.co/datasets/imagenet-1k#considerations-for-using-the-data) will be reflected in the EfficientFormer models.

</Limitations_and_Biases>

<Training>

## Training

#### Training Data

This model was trained on ImageNet-1K.

See the [data card](https://huggingface.co/datasets/imagenet-1k) for additional information.

#### Training Procedure

* Parameters: 31.4 M
* Training epochs: 1000

Trained on a cluster with NVIDIA A100 and V100 GPUs.
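
As a rough guide for on-device deployment, the 31.4 M parameters listed above translate into raw weight storage as computed below. The sketch makes several simplifying assumptions:

- Every parameter is stored densely at the given precision.
- File-format overhead, buffers such as BatchNorm running statistics, and activation memory are ignored.
- The precisions are illustrative and say nothing about which formats the released checkpoints use.

```python
PARAMS = 31.4e6  # parameter count from the Training Procedure section

# Bytes per parameter for a few common storage precisions (illustrative choice).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_storage_mb(num_params: float, bytes_per_param: int) -> float:
    """Raw weight size in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: {weight_storage_mb(PARAMS, nbytes):.1f} MB")
```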

</Training>

<Eval_Results>

## Evaluation Results

* Top-1 accuracy: 82.4% on ImageNet-1K
* Latency: 3.0 ms (measured on iPhone 12)
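
Top-1 accuracy is the fraction of validation images for which the single highest-scoring class matches the ground-truth label. The plain-Python sketch below computes it from per-image logits. The toy logits and labels are made up: they illustrate the metric only and do not reproduce how the 82.4% figure was obtained.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose highest logit is at the index of the true label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    correct = 0
    for row, label in zip(logits, labels):
        predicted = max(range(len(row)), key=lambda i: row[i])  # argmax
        correct += int(predicted == label)
    return correct / len(labels)


# Toy batch: 4 samples, 3 classes (hypothetical values).
toy_logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 1.5], [0.0, 1.0, 0.5], [1.0, 2.0, 0.0]]
toy_labels = [0, 2, 2, 1]
print(f"Top-1 accuracy: {top1_accuracy(toy_logits, toy_labels):.1%}")
```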

</Eval_Results>

<Cite>

## Citation Information

```bibtex
@article{li2022efficientformer,
  title={EfficientFormer: Vision Transformers at MobileNet Speed},
  author={Li, Yanyu and Yuan, Geng and Wen, Yang and Hu, Eric and Evangelidis, Georgios and Tulyakov, Sergey and Wang, Yanzhi and Ren, Jian},
  journal={arXiv preprint arXiv:2206.01191},
  year={2022}
}
```
</Cite>