|
---
license: mit
language:
- en
base_model:
- openai/clip-vit-large-patch14
pipeline_tag: video-classification
tags:
- dance
- vision
- breaking
---
|
# CLIP-Based Break Dance Move Classifier |
|
|
|
A deep learning model for classifying break dance moves using CLIP (Contrastive Language-Image Pre-Training) embeddings. The model is fine-tuned on break dance videos to classify four power moves: windmills, halos, swipes, and baby mills.
|
|
|
## Features |
|
|
|
- Video-based classification using CLIP embeddings |
|
- Multi-frame temporal analysis (see the sketch after this list)
|
- Configurable frame sampling and data augmentation |
|
- Real-time inference using Cog |
|
- Misclassification analysis tools |
|
- Hyperparameter tuning support |
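
Taken together, the first three features describe one pipeline: sample frames from a clip, embed each frame with CLIP's vision encoder, pool the embeddings across time, and classify. The sketch below illustrates that shape using the Hugging Face `transformers` API; the class name, the mean pooling, and the tensor shapes are illustrative assumptions, not this repository's actual code.

```python
# Illustrative sketch only: frames -> CLIP embeddings -> temporal pooling -> 4 classes.
import torch
import torch.nn as nn
from transformers import CLIPVisionModel

class TemporalClipClassifier(nn.Module):
    """Hypothetical model mirroring the pipeline described above."""

    def __init__(self, clip_name: str = "openai/clip-vit-large-patch14", num_classes: int = 4):
        super().__init__()
        self.encoder = CLIPVisionModel.from_pretrained(clip_name)
        hidden = self.encoder.config.hidden_size  # 1024 for ViT-L/14
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, 224, 224)
        b, t = frames.shape[:2]
        out = self.encoder(pixel_values=frames.flatten(0, 1))
        emb = out.pooler_output.view(b, t, -1)  # (batch, time, hidden)
        pooled = emb.mean(dim=1)                # simple mean pooling over time
        return self.head(pooled)                # (batch, num_classes) logits
```

A clip of 8 sampled frames would enter as a `(1, 8, 3, 224, 224)` tensor and come out as 4 logits, one per move class.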
|
|
|
## Setup |
|
|
|
```bash
# Install dependencies
pip install -r requirements.txt

# Install Cog (if not already installed)
sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog
```
|
|
|
## Cog |
|
|
|
Download the weights:
|
|
|
```bash
# the checkpoints link is a Drive folder, so download it with --folder
gdown --folder https://drive.google.com/drive/folders/1Gn3UdoKffKJwz84GnGx-WMFTwZuvDsuf -O ./checkpoints/
```
|
|
|
Build the image:
|
|
|
```bash
cog build --separate-weights
```
|
|
|
Push a new image:
|
|
|
```bash
cog push
```
|
|
|
## Training |
|
|
|
Download the training data:
|
|
|
```bash
# the training data lives in a Drive folder, so download it with --folder
gdown --folder https://drive.google.com/drive/folders/11M6nSuSuvoU2wpcV_-6KFqCzEMGP75q6 -O ./data/
```
|
|
|
```bash
# Run training with default configuration
python scripts/train.py

# Run hyperparameter tuning
python scripts/hyperparameter_tuning.py
```
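
`scripts/train.py` and its default configuration are not reproduced here. For orientation only, one supervised training step for the sketch model from the Features section might look like the following; the optimizer choice, learning rate, and dummy batch are assumptions:

```python
# Minimal, illustrative training step (not scripts/train.py).
import torch
import torch.nn.functional as F

model = TemporalClipClassifier()  # the sketch class from the Features section
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5
)

frames = torch.randn(2, 8, 3, 224, 224)  # dummy batch: 2 clips x 8 frames
targets = torch.tensor([0, 3])           # e.g. windmill, baby mill

optimizer.zero_grad()
loss = F.cross_entropy(model(frames), targets)
loss.backward()
optimizer.step()
```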
|
|
|
## Inference |
|
|
|
```bash
# Using Cog for inference
cog predict -i video=@path/to/your/video.mp4

# Using the standard Python script
python scripts/inference.py --video path/to/your/video.mp4
```
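
Both entry points handle frame extraction internally. As a rough sketch of that step, the snippet below uniformly samples frames with OpenCV and preprocesses them for CLIP; `sample_frames` is a hypothetical helper, not a function exported by this repository:

```python
# Hypothetical helper: uniformly sample frames from a video (OpenCV assumed).
import cv2
import numpy as np
from transformers import CLIPImageProcessor

def sample_frames(video_path: str, num_frames: int = 8) -> list:
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames, dtype=int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames

processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
inputs = processor(images=sample_frames("path/to/your/video.mp4"), return_tensors="pt")
pixel_values = inputs["pixel_values"].unsqueeze(0)  # (1, time, 3, 224, 224)
```

The resulting `pixel_values` tensor has the `(batch, time, 3, 224, 224)` layout the sketch model in the Features section expects.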
|
|
|
## Analysis |
|
|
|
```bash
# Generate misclassification report
python scripts/visualization/miscalculations_report.py

# Visualize model performance
python scripts/visualization/visualize.py
```
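
The report script's internals are not shown here; reports of this kind are commonly built from a confusion matrix, as in the scikit-learn sketch below (the class names are real, the predictions are dummy data):

```python
# Illustrative only: summarizing misclassifications with scikit-learn.
from sklearn.metrics import classification_report, confusion_matrix

labels = ["windmill", "halo", "swipe", "baby mill"]
y_true = ["windmill", "halo", "swipe", "baby mill", "windmill"]  # ground truth
y_pred = ["windmill", "halo", "baby mill", "baby mill", "halo"]  # model output

print(confusion_matrix(y_true, y_pred, labels=labels))
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```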
|
|
|
## Project Structure |
|
|
|
```
clip/
├── src/                  # Source code
│   ├── data/             # Dataset and data processing
│   ├── models/           # Model architecture
│   └── utils/            # Utility functions
├── scripts/              # Training and inference scripts
│   └── visualization/    # Visualization tools
├── config/               # Configuration files
├── runs/                 # Training runs and checkpoints
├── cog.yaml              # Cog configuration
└── requirements.txt      # Python dependencies
```
|
|
|
## Training Data |
|
|
|
To run training on your own, you can find the training data [here](https://drive.google.com/drive/folders/11M6nSuSuvoU2wpcV_-6KFqCzEMGP75q6?usp=drive_link) and put it in a directory at the root of the project called `./data`.
|
|
|
## Checkpoints |
|
|
|
To run predictions with Cog or locally from an existing checkpoint, you can find a checkpoint and configuration files [here](https://drive.google.com/drive/folders/1Gn3UdoKffKJwz84GnGx-WMFTwZuvDsuf?usp=sharing) and put them in a directory at the root of the project called `./checkpoints`.
|
|
|
## Model Architecture |
|
|
|
- Base: CLIP ViT-Large/14 |
|
- Custom temporal pooling layer |
|
- Fine-tuned vision encoder (last 3 layers; see the sketch below)
|
- Output: 4-class classifier |
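
A hedged sketch of the partial fine-tuning described above, using the attribute layout of `transformers`' `CLIPVisionModel` (the repository's own code may organize this differently):

```python
# Freeze everything, then unfreeze the last 3 of ViT-L/14's 24 transformer layers.
from transformers import CLIPVisionModel

encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")

for param in encoder.parameters():
    param.requires_grad = False  # freeze the full vision tower

for layer in encoder.vision_model.encoder.layers[-3:]:
    for param in layer.parameters():
        param.requires_grad = True  # fine-tune only the last 3 layers
```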
|
|
|
## License |
|
|
|
MIT License |
|
|
|
Copyright (c) 2024 Bryant Wolf |
|
|
|
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. |
|
|
|
## Citation |
|
|
|
If you use this model in your research, please cite: |
|
|
|
```bibtex
@misc{clip-breakdance-classifier,
  author       = {Bryant Wolf},
  title        = {CLIP-Based Break Dance Move Classifier},
  year         = {2024},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Model Hub},
  howpublished = {\url{https://github.com/bawolf/breaking_vision_clip_cog}}
}
```