{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use Lora for Adaption"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A Quick Overview\n",
    "\n",
    "#### LoRa(ICLR2021)\n",
    "<img width=\"380\" height=\"380\" src=\"../figs/lora/lora.png\">\n",
    "\n",
    "Low-Rank Adaptation, or [LoRA](https://github.com/microsoft/LoRA), freezes the pretrained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times.\n",
    "\n",
    "#### AdaLoRa(ICLR2023)\n",
    "[AdaLoRA](https://github.com/QingruZhang/AdaLoRA) adaptively allocates the parameter budget among weight matrices according to their importance score. In particular, AdaLoRA parameterizes the incremental updates in the form of singular value decomposition. Such a novel approach allows for the effective pruning of the singular values of unimportant updates, which is essential to reduce the parameter budget but circumvent intensive exact SVD computations.\n",
    "\n",
    "### Application in our framework\n",
    "For each AttentionBlock in ImageEncoder, we replace the two Linear Layers in Attention and Mlp with LoRa Linear Layer and AdaLoRa Layer, i.e., one example [here](../models/ImageEncoder/vit/lora_block.py)"
   ]
  },
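  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The snippet below is a hedged illustration of the idea above, not the code in `lora_block.py`: a frozen `nn.Linear` is wrapped so that only a low-rank update `B @ A` (LoRA) or an SVD-style update `P diag(E) Q` (AdaLoRA) is trained. The class names `LoRALinear` and `SVDLinear`, the rank `r`, and the scaling `alpha` are illustrative assumptions.\n",
    "```python\n",
    "import math\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "class LoRALinear(nn.Module):\n",
    "    # Frozen pretrained linear layer plus a trainable low-rank update B @ A.\n",
    "    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):\n",
    "        super().__init__()\n",
    "        self.base = base\n",
    "        for p in self.base.parameters():  # freeze the pretrained weights\n",
    "            p.requires_grad = False\n",
    "        self.lora_A = nn.Parameter(torch.empty(r, base.in_features))\n",
    "        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))\n",
    "        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))\n",
    "        self.scaling = alpha / r\n",
    "\n",
    "    def forward(self, x):\n",
    "        # B is zero-initialised, so training starts from the pretrained behaviour.\n",
    "        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling\n",
    "\n",
    "class SVDLinear(nn.Module):\n",
    "    # AdaLoRA-style update P diag(E) Q; unimportant entries of E can be\n",
    "    # pruned (zeroed) during training instead of running an exact SVD.\n",
    "    def __init__(self, base: nn.Linear, r: int = 4):\n",
    "        super().__init__()\n",
    "        self.base = base\n",
    "        for p in self.base.parameters():\n",
    "            p.requires_grad = False\n",
    "        self.P = nn.Parameter(torch.zeros(base.out_features, r))\n",
    "        self.E = nn.Parameter(torch.zeros(r))  # learnable singular values\n",
    "        self.Q = nn.Parameter(torch.randn(r, base.in_features) * 0.02)\n",
    "\n",
    "    def forward(self, x):\n",
    "        return self.base(x) + ((x @ self.Q.T) * self.E) @ self.P.T\n",
    "\n",
    "# Example: wrap the qkv projection of one attention block\n",
    "qkv = nn.Linear(768, 768 * 3)\n",
    "qkv_lora = LoRALinear(qkv, r=4)\n",
    "```"
   ]
  },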
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training\n",
    "In SAM, EfficientSAM, and MobileSAM, the adjustment of models using Lora is supported. The -mod option can be used to specify the fine-tuning method:\n",
    "``python train.py -net mobile_sam -dataset REFUGE -data_path data/REFUGE -sam_ckpt checkpoint/mobile_sam/mobile_sam.pt -image_size 256 -vis 100 -exp_name tiny-mobile-isic-256 -encoder tiny_vit -mod sam_lora``\n",
    "\n"
   ]
  },
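  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a hedged sketch of what LoRA fine-tuning implies for training, only the injected LoRA tensors stay trainable while the pretrained encoder weights remain frozen. The helper below is illustrative only; the parameter-name convention (`lora_` prefix) is an assumption, not the framework's actual code.\n",
    "```python\n",
    "def mark_only_lora_trainable(model):\n",
    "    # Freeze everything, then re-enable gradients for the injected LoRA tensors.\n",
    "    for name, param in model.named_parameters():\n",
    "        param.requires_grad = 'lora_' in name\n",
    "    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)\n",
    "    total = sum(p.numel() for p in model.parameters())\n",
    "    print(f'trainable params: {trainable} / {total}')\n",
    "```"
   ]
  },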
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Performance VS Adapter\n",
    "#### REFUGE\n",
    "| Baseline     | Backbone  | mode | DICE   | mIou | Memory  |\n",
    "| ------------ | --------- | ------ | ---- | ------- | ------------ |\n",
    "| EfficientSAM | VIT-Small | Adapter | 0.8691 | 0.7915 | 21275 M  |\n",
    "| EfficientSAM | VIT-Small | Lora | 0.8573 | 0.7703 | 22777 M |\n",
    "| EfficientSAM | VIT-Small | AdaLora | 0.8558 | 0.7596 | 22779 M |\n",
    "| MobileSAM | Tiny-Vit | Adapter | 0.9330 | 0.8812 | 10255M |\n",
    "| MobileSAM | Tiny-Vit | Lora | 0.9107 | 0.8436 | 10401M |\n",
    "| MobileSAM | Tiny-Vit | AdaLora | 0.8863 | 0.8031 | 10401M |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Curve of loss and performance\n",
    "**based on MobileSAM(Tiny-Vit) model and REFUGE dataset**\n",
    "\n",
    "#### Adapter\n",
    "<p float=\"left\">\n",
    "  <img src=\"../figs/lora/MobileSAM-Ti (Adapter)_loss.png\" width=\"400\" />\n",
    "  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
    "  <img src=\"../figs/lora/MobileSAM-Ti (Adapter)_performance.png\" width=\"400\" /> \n",
    "  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
    "</p>\n",
    "\n",
    "#### LoRa\n",
    "<p float=\"left\">\n",
    "  <img src=\"../figs/lora/MobileSAM-Ti (Lora)_loss.png\" width=\"400\" />\n",
    "  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
    "  <img src=\"../figs/lora/MobileSAM-Ti (Lora)_performance.png\" width=\"400\" /> \n",
    "  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
    "</p>\n",
    "\n",
    "#### AdaLoRa\n",
    "<p float=\"left\">\n",
    "  <img src=\"../figs/lora/MobileSAM-Ti (AdaLora)_loss.png\" width=\"400\" />\n",
    "  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
    "  <img src=\"../figs/lora/MobileSAM-Ti (AdaLora)_performance.png\" width=\"400\" /> \n",
    "  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
    "</p>\n",
    "\n",
    "It can be seen that the training method using Adapter is more stable."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.7.16 ('general')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.7.16"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "e7f99538a81e8449c1b1a4a7141984025c678b5d9c33981aa2a3c129d8e1c90d"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}