{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## EfficientSAM Adapter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A Quick Overview\n",
    "\n",
    "<img width=\"880\" height=\"380\" src=\"../figs/EfficientSAM/EfficientSAM.png\">\n",
    "\n",
    "The huge computation cost of [SAM](https://github.com/facebookresearch/segment-anything) model has limited its applications to wider real-world applications. To address this limitation, [EfficientSAMs](https://github.com/yformer/EfficientSAM) provide lightweight SAM models that exhibits decent performance with largely reduced complexity. This is based on leveraging SAM-leveraged masked image pertraining (SAMI), which learns to reconstruct features from SAM image encoder for effective visual representation learning.\n",
    "\n",
    "Our goal is to integrate medical specific domain knowledge into the lightweight EfficientSAM model through adaptation technique. Therefore, we only utilize the pre-trained EfficientSAM weights without reperforming the SAMI process. Like our original [Medical SAM Adapter](https://arxiv.org/abs/2304.12620), we achieve efficient migration from the original SAM to medical images by adding simple Adapters in the network."
   ]
  },
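  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the adapter idea concrete, below is a minimal, illustrative sketch of a bottleneck Adapter block of the kind used by such methods: a down-projection, a non-linearity, an up-projection, and a residual connection, inserted into the otherwise frozen transformer blocks of the image encoder. The class and argument names here are hypothetical and not the exact ones used in this repository.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "class Adapter(nn.Module):\n",
    "    # Bottleneck adapter: down-project, non-linearity, up-project, residual.\n",
    "    def __init__(self, dim: int, bottleneck_ratio: float = 0.25):\n",
    "        super().__init__()\n",
    "        hidden = int(dim * bottleneck_ratio)\n",
    "        self.down = nn.Linear(dim, hidden)  # down-projection\n",
    "        self.act = nn.GELU()\n",
    "        self.up = nn.Linear(hidden, dim)    # up-projection back to the token dim\n",
    "\n",
    "    def forward(self, x: torch.Tensor) -> torch.Tensor:\n",
    "        # Only the adapter parameters are trained; the backbone stays frozen.\n",
    "        return x + self.up(self.act(self.down(x)))\n",
    "\n",
    "# Example: ViT-Small tokens (batch of 2, 64x64 patches, embedding dim 384).\n",
    "tokens = torch.randn(2, 64 * 64, 384)\n",
    "print(Adapter(dim=384)(tokens).shape)  # torch.Size([2, 4096, 384])\n",
    "```"
   ]
  },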
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training\n",
    "We have unified the interfaces of EfficientSAM and SAM, and training EfficientSAM Adapter can be done using the same command as the SAM adapter:\n",
    "``python train.py -net efficient_sam -data_path data/isic -sam_ckpt checkpoint/efficient_sam/efficient_sam_vits.pt -image_size 1024 -vis 100 -mod sam_adpt``\n",
    "\n",
    "The pretrained weight of EfficientSAM can be downloaded here:\n",
    "| EfficientSAM-S | EfficientSAM-Ti |\n",
    "|------------------------------|------------------------------|\n",
    "| [Download](https://github.com/yformer/EfficientSAM/blob/main/weights/efficient_sam_vits.pt.zip) |[Download](https://github.com/yformer/EfficientSAM/blob/main/weights/efficient_sam_vitt.pt)|"
   ]
  },
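  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the EfficientSAM-S weight is distributed as a zip archive, while the training command expects a plain `.pt` file. A small, stdlib-only sketch for unpacking it into the checkpoint directory used by the command above (assuming the archive has already been downloaded into that directory):\n",
    "\n",
    "```python\n",
    "import zipfile\n",
    "from pathlib import Path\n",
    "\n",
    "ckpt_dir = Path('checkpoint/efficient_sam')\n",
    "ckpt_dir.mkdir(parents=True, exist_ok=True)\n",
    "\n",
    "# Unpack efficient_sam_vits.pt.zip -> efficient_sam_vits.pt in place;\n",
    "# the EfficientSAM-Ti weight is already a plain .pt file.\n",
    "with zipfile.ZipFile(ckpt_dir / 'efficient_sam_vits.pt.zip') as zf:\n",
    "    zf.extractall(ckpt_dir)\n",
    "\n",
    "print(sorted(p.name for p in ckpt_dir.iterdir()))\n",
    "```"
   ]
  },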
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Performance VS SAM \n",
    "**Setup**: Using a single Nvidia RTX 3090 GPU, the batch_size was set to 2. The resolution of input image is 1024.\n",
    "\n",
    "#### ISIC\n",
    "| Baseline     | Backbone  | DICE   | mIou | Memory  |\n",
    "| ------------ | --------- | ------ | ---- | ------- |\n",
    "| SAM          | VIT-Base  | 0.9225 | 0.8646 | 22427 M |\n",
    "| EfficientSAM | VIT-Small | 0.9091 | 0.8463 | 21275 M  |      \n",
    "| EfficientSAM | VIT-Tiny  | 0.9095 | 0.8437  |  15713 M  |\n",
    "\n",
    "#### REFUGE\n",
    "| Baseline     | Backbone  | DICE   | mIou | Memory  |\n",
    "| ------------ | --------- | ------ | ---- | ------- |\n",
    "| SAM          | VIT-Base  | 0.9085 | 0.8423 | 22427 M |\n",
    "| EfficientSAM | VIT-Small | 0.8691 | 0.7915 | 21275 M  |      \n",
    "| EfficientSAM | VIT-Tiny  | 0.7999 | 0.6949 |  15713 M  |"
   ]
  },
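  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, DICE and mIoU in the tables above are the standard overlap metrics between predicted and ground-truth masks. Below is a minimal sketch of how they can be computed for binary segmentation; the evaluation code in this repository may differ in details such as thresholding and averaging, and the `dice_and_iou` helper is purely illustrative.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "def dice_and_iou(pred, target, eps=1e-6):\n",
    "    # pred/target: binary masks of shape (N, H, W) with values in {0, 1}.\n",
    "    pred = pred.float().flatten(1)\n",
    "    target = target.float().flatten(1)\n",
    "    inter = (pred * target).sum(dim=1)\n",
    "    dice = (2 * inter + eps) / (pred.sum(dim=1) + target.sum(dim=1) + eps)\n",
    "    iou = (inter + eps) / (pred.sum(dim=1) + target.sum(dim=1) - inter + eps)\n",
    "    return dice.mean().item(), iou.mean().item()\n",
    "\n",
    "pred = (torch.rand(2, 1024, 1024) > 0.5).long()\n",
    "print(dice_and_iou(pred, pred))  # (1.0, 1.0) for a perfect match\n",
    "```"
   ]
  },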
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Performance Under different resolution \n",
    "**Setup**: Using a single Nvidia RTX 3090 GPU, the batch_size was set to 2. Using VIT-small as the backbone.\n",
    "\n",
    "#### ISIC\n",
    "| Resolution  | DICE   | mIou | Memory  |\n",
    "| --------- | ------ | ---- | ------- |\n",
    "| 1024  | 0.9091 | 0.8463 | 22427 M |\n",
    "| 512 | 0.9134 | 0.8495 | 21275 M  |      \n",
    "| 256  | 0.9080 | 0.8419  |  15713 M  |\n",
    "\n",
    "#### REFUGE\n",
    "| Resolution  | DICE   | mIou | Memory  |\n",
    "| --------- | ------ | ---- | ------- |\n",
    "| 1024  | 0.8691 | 0.7915 | 22427 M |\n",
    "| 512 | 0.8564 | 0.7673 | 21275 M  |      \n",
    "| 256  | 0.7432 | 0.6244  |  15713 M  |"
   ]
  },
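  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Memory column reports peak GPU memory during training. To take a comparable reading on your own hardware, PyTorch's allocator statistics can be used as a rough equivalent; this is only a sketch (with a hypothetical `peak_memory_mb` helper), and the numbers above may have been collected differently, e.g. via nvidia-smi.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "\n",
    "def peak_memory_mb():\n",
    "    # Peak CUDA memory allocated since the last reset, in megabytes.\n",
    "    return torch.cuda.max_memory_allocated() / (1024 ** 2)\n",
    "\n",
    "# Typical usage around one training epoch at a given -image_size:\n",
    "#   torch.cuda.reset_peak_memory_stats()\n",
    "#   ...run training...\n",
    "#   print(round(peak_memory_mb()), 'MB')\n",
    "```"
   ]
  },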
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The decreasing curve of loss and the performance curve.\n",
    "#### Backbone: VIT-Small\n",
    "<p float=\"left\">\n",
    "  <img src=\"../figs/EfficientSAM/EfficientSAM-S (ISIC)_loss.png\" width=\"200\" />\n",
    "  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
    "  <img src=\"../figs/EfficientSAM/EfficientSAM-S (ISIC)_performance.png\" width=\"200\" /> \n",
    "  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
    "</p>\n",
    "<p float=\"left\">\n",
    "  <img src=\"../figs/EfficientSAM/EfficientSAM-S (REFUGE)_loss.png\" width=\"200\" /> \n",
    "  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
    "  <img src=\"../figs/EfficientSAM/EfficientSAM-S (REFUGE)_performance.png\" width=\"200\" /> \n",
    "</p>\n",
    "\n",
    "#### Backbone: VIT-Tiny\n",
    "<p float=\"left\">\n",
    "  <img src=\"../figs/EfficientSAM/EfficientSAM-Ti (ISIC)_loss.png\" width=\"200\" />\n",
    "  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
    "  <img src=\"../figs/EfficientSAM/EfficientSAM-Ti (ISIC)_performance.png\" width=\"200\" /> \n",
    "</p>\n",
    "\n",
    "<p float=\"left\">\n",
    "  <img src=\"../figs/EfficientSAM/EfficientSAM-Ti (REFUGE)_loss.png\" width=\"200\" /> \n",
    "  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
    "  <img src=\"../figs/EfficientSAM/EfficientSAM-Ti (REFUGE)_performance.png\" width=\"200\" /> \n",
    "</p>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.7.16 ('general')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.7.16"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "e7f99538a81e8449c1b1a4a7141984025c678b5d9c33981aa2a3c129d8e1c90d"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}