initial commit

Files changed (4)

README.md +55 -1
config.json +39 -0
preprocessor_config.json +18 -0
pytorch_model.bin +3 -0

README.md CHANGED

@@ -1,3 +1,57 @@
 ---
-license: mit
 ---

 ---
+tags:
+- object-detection
+- vision
+finetuned_from:
+- hustvl/yolos-small
 ---
+# YOLOS (small-sized) model fine-tuned on Matterport balloon dataset
+YOLOS is a Vision Transformer (ViT) trained using the DETR loss. Despite its simplicity, a base-sized YOLOS model achieves 42 AP on COCO 2017 validation (similar to DETR and more complex frameworks such as Faster R-CNN). This checkpoint starts from `hustvl/yolos-small`, the small-sized YOLOS model fine-tuned on COCO 2017 object detection (118k annotated images), and is further fine-tuned on the Matterport balloon dataset. YOLOS was introduced in the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Fang et al. and first released in [this repository](https://github.com/hustvl/YOLOS).
+## Model description
+The model is trained using a "bipartite matching loss": the predicted classes and bounding boxes of each of the N = 100 object queries are compared to the ground truth annotations, padded up to the same length N (so if an image contains only 4 objects, the remaining 96 annotations have "no object" as class and "no bounding box" as bounding box). The Hungarian matching algorithm is used to create an optimal one-to-one mapping between the N queries and the N annotations. Next, standard cross-entropy (for the classes) and a linear combination of the L1 and generalized IoU losses (for the bounding boxes) are used to optimize the parameters of the model.
+Currently, both the feature extractor and model support PyTorch.
+## Training data
+This model was pre-trained on [ImageNet-1k](https://huggingface.co/datasets/imagenet2012) and fine-tuned on [COCO 2017 object detection](https://cocodataset.org/#download), a dataset consisting of 118k/5k annotated images for training/validation respectively. It was further fine-tuned on the [Matterport Balloon Detection dataset](https://github.com/matterport/Mask_RCNN/releases/download/v2.1/balloon_dataset.zip), a dataset containing 74 annotated images.
+### Training
+The model was pre-trained for 200 epochs on ImageNet-1k, fine-tuned for 150 epochs on COCO, and further fine-tuned for 96 epochs on the Matterport Balloon dataset.
+The full training notebook is available [here](https://github.com/ZohebAbai/Deep-Learning-Projects/blob/master/10_PT_Object_Detection_using_Transformers.ipynb).
+## Evaluation results
+This model achieves an AP (average precision) of **26.9** on the Matterport Balloon validation set.
+### BibTeX entry and citation info
+```bibtex
+@article{DBLP:journals/corr/abs-2106-00666,
+  author    = {Yuxin Fang and
+               Bencheng Liao and
+               Xinggang Wang and
+               Jiemin Fang and
+               Jiyang Qi and
+               Rui Wu and
+               Jianwei Niu and
+               Wenyu Liu},
+  title     = {You Only Look at One Sequence: Rethinking Transformer in Vision through
+               Object Detection},
+  journal   = {CoRR},
+  volume    = {abs/2106.00666},
+  year      = {2021},
+  url       = {https://arxiv.org/abs/2106.00666},
+  eprinttype = {arXiv},
+  eprint    = {2106.00666},
+  timestamp = {Fri, 29 Apr 2022 19:49:16 +0200},
+  biburl    = {https://dblp.org/rec/journals/corr/abs-2106-00666.bib},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+```
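The bipartite matching step described under "Model description" in the README can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Transformers implementation: it brute-forces the optimal assignment with `itertools.permutations` rather than running the Hungarian algorithm (both find the same minimum-cost matching, but brute force is only practical for a handful of queries), it leaves out the generalized IoU term, and the helper names `pair_cost` and `match` are hypothetical. The two weights mirror `class_cost` and `bbox_cost` in `config.json`.

```python
from itertools import permutations

CLASS_COST = 1.0  # "class_cost" in config.json
BBOX_COST = 5.0   # "bbox_cost" in config.json


def pair_cost(pred, target):
    """Cost of assigning one predicted query to one ground-truth object."""
    # A higher probability for the target's class lowers the cost.
    class_term = -pred["probs"][target["label"]]
    # L1 distance between normalised (cx, cy, w, h) boxes.
    l1_term = sum(abs(p - t) for p, t in zip(pred["box"], target["box"]))
    return CLASS_COST * class_term + BBOX_COST * l1_term


def match(preds, targets):
    """Return {query index: target index} with the lowest total cost.

    Queries missing from the mapping are the ones that would be
    supervised with the "no object" class.
    """
    best_cost, best_perm = float("inf"), ()
    # perm[t] is the query assigned to target t (brute-force search).
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(pair_cost(preds[q], targets[t]) for t, q in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return {q: t for t, q in enumerate(best_perm)}


# Made-up example: three queries, two balloons (label 0 = "Balloon").
preds = [
    {"probs": [0.9], "box": (0.50, 0.50, 0.20, 0.20)},
    {"probs": [0.1], "box": (0.10, 0.10, 0.05, 0.05)},
    {"probs": [0.8], "box": (0.80, 0.30, 0.10, 0.30)},
]
targets = [
    {"label": 0, "box": (0.82, 0.31, 0.10, 0.28)},
    {"label": 0, "box": (0.49, 0.52, 0.21, 0.19)},
]
assignment = match(preds, targets)
print(assignment)
```

In this toy input, query 1 stays unmatched, which corresponds to the "no object" padding the README describes.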

config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "architectures": [
+    "YolosForObjectDetection"
+  ],
+  "attention_probs_dropout_prob": 0.0,
+  "auxiliary_loss": false,
+  "bbox_cost": 5,
+  "bbox_loss_coefficient": 5,
+  "class_cost": 1,
+  "eos_coefficient": 0.1,
+  "giou_cost": 2,
+  "giou_loss_coefficient": 2,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.0,
+  "hidden_size": 384,
+  "id2label": {
+    "0": "Balloon"
+  },
+  "image_size": [
+    512,
+    864
+  ],
+  "initializer_range": 0.02,
+  "intermediate_size": 1536,
+  "label2id": {
+    "Balloon": 0
+  },
+  "layer_norm_eps": 1e-12,
+  "model_type": "yolos",
+  "num_attention_heads": 6,
+  "num_channels": 3,
+  "num_detection_tokens": 100,
+  "num_hidden_layers": 12,
+  "patch_size": 16,
+  "qkv_bias": true,
+  "torch_dtype": "float32",
+  "transformers_version": "4.22.2",
+  "use_mid_position_embeddings": true
+}
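
A few quantities follow directly from the values in this config. The sketch below assumes `image_size` is given as (height, width), which does not affect the product, and that the transformer input is the concatenation of one [CLS] token, the patch tokens, and the detection tokens. It describes an input of exactly the configured `image_size`; images resized differently at inference time produce a different number of patches.

```python
# Values copied from config.json above.
config = {
    "hidden_size": 384,
    "num_attention_heads": 6,
    "image_size": [512, 864],
    "patch_size": 16,
    "num_detection_tokens": 100,
}

height, width = config["image_size"]  # assumed (height, width) order
patch = config["patch_size"]

# Number of non-overlapping 16x16 patches in a 512x864 input.
num_patches = (height // patch) * (width // patch)

# Sequence = [CLS] token + patch tokens + detection tokens.
sequence_length = 1 + num_patches + config["num_detection_tokens"]

# Each attention head works on an equal slice of the hidden size.
head_dim = config["hidden_size"] // config["num_attention_heads"]

print(num_patches, sequence_length, head_dim)
```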

preprocessor_config.json ADDED

	@@ -0,0 +1,18 @@

+{
+  "do_normalize": true,
+  "do_resize": true,
+  "feature_extractor_type": "YolosFeatureExtractor",
+  "format": "coco_detection",
+  "image_mean": [
+    0.485,
+    0.456,
+    0.406
+  ],
+  "image_std": [
+    0.229,
+    0.224,
+    0.225
+  ],
+  "max_size": 1333,
+  "size": 800
+}
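
To make the `size`, `max_size`, `image_mean`, and `image_std` fields concrete, here is a rough sketch of DETR-style preprocessing: scale the shorter side to `size`, cap the longer side at `max_size`, then standardise each channel. The function names `target_size` and `normalize_pixel` are hypothetical, and the exact rounding used by `YolosFeatureExtractor` may differ, so treat this as an approximation.

```python
# Values copied from preprocessor_config.json above.
IMAGE_MEAN = (0.485, 0.456, 0.406)
IMAGE_STD = (0.229, 0.224, 0.225)


def target_size(width, height, size=800, max_size=1333):
    """Return (new_width, new_height): shorter side -> size, longer side <= max_size."""
    short, long = min(width, height), max(width, height)
    # Shrink the target if the longer side would exceed max_size.
    if long / short * size > max_size:
        size = int(round(max_size * short / long))
    if width <= height:
        return size, int(height * size / short)
    return int(width * size / short), size


def normalize_pixel(rgb):
    """Rescale 0-255 RGB values to [0, 1], then subtract the mean and divide by the std."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, IMAGE_MEAN, IMAGE_STD))


print(target_size(640, 480))     # shorter side scaled up to 800
print(target_size(1920, 1080))   # longer side capped at 1333
print(normalize_pixel((255, 255, 255)))
```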

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:073d1210ad9ce9aa2d7fed6a9fd85eb87522e258b876e23cd7a6e9edd3a3d068
+size 122667609
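
The three lines above are a Git LFS pointer, not the weights themselves. A small sketch of reading one is shown below; `parse_lfs_pointer` is a hypothetical helper, and the parameter estimate assumes every stored tensor is float32 (consistent with `torch_dtype` in `config.json`), so it is only a rough upper bound that ignores serialization overhead.

```python
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:073d1210ad9ce9aa2d7fed6a9fd85eb87522e258b876e23cd7a6e9edd3a3d068
size 122667609
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(pointer_text)
size_mib = pointer["size_bytes"] / 2**20

# float32 weights take 4 bytes each, so this bounds the parameter count from above.
max_params = pointer["size_bytes"] // 4

print(pointer["hash_algorithm"], round(size_mib, 1), max_params)
```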