	---
	license: mit
	language:
	- en
	base_model:
	- openai/clip-vit-large-patch14
	pipeline_tag: video-classification
	tags:
	- dance
	- vision
	- breaking
	---
	# CLIP-Based Break Dance Move Classifier

	A deep learning model for classifying break dance moves using CLIP (Contrastive Language-Image Pre-Training) embeddings. The model is fine-tuned on break dance videos to classify four power moves: windmills, halos, swipes, and baby mills.

	## Features

	- Video-based classification using CLIP embeddings
	- Multi-frame temporal analysis
	- Configurable frame sampling and data augmentation
	- Real-time inference using Cog
	- Misclassification analysis tools
	- Hyperparameter tuning support
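
Frame sampling is listed as configurable, but the strategy itself is not described here. As a rough illustration only, uniform sampling could pick evenly spaced frame indices from a clip. The function name `sample_frame_indices` and its parameters are hypothetical and are not taken from this repository.

```python
from __future__ import annotations


def sample_frame_indices(total_frames: int, num_frames: int) -> list[int]:
    """Return up to `num_frames` evenly spaced frame indices in [0, total_frames).

    Hypothetical helper: the project's actual sampling strategy may differ.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if num_frames >= total_frames:
        # Short clip: use every frame rather than repeating indices.
        return list(range(total_frames))
    # Split the clip into equal-width segments and take each segment's midpoint.
    segment = total_frames / num_frames
    return [int(segment * i + segment / 2) for i in range(num_frames)]
```

Taking segment midpoints keeps the samples away from the very first and last frames, which often contain setup or recovery rather than the move itself. That is a design choice in this sketch, not a documented property of the model.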

	## Setup

	```bash
	# Install dependencies
	pip install -r requirements.txt

	# Install Cog (if not already installed); writing to /usr/local/bin usually requires sudo
	sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
	sudo chmod +x /usr/local/bin/cog
	```

	## Cog

	Download the weights:

	```bash
	gdown --folder "https://drive.google.com/drive/folders/1Gn3UdoKffKJwz84GnGx-WMFTwZuvDsuf" -O ./checkpoints/
	```

	Build the image:

	```bash
	cog build --separate-weights
	```

	Push a new image:

	```bash
	cog push
	```

	## Training

	Download the training data:

	```bash
	gdown --folder "https://drive.google.com/drive/folders/11M6nSuSuvoU2wpcV_-6KFqCzEMGP75q6" -O ./data/
	```

	```bash
	# Run training with default configuration
	python scripts/train.py

	# Run hyperparameter tuning
	python scripts/hyperparameter_tuning.py
	```

	## Inference

	```bash
	# Using Cog for inference
	cog predict -i video=@path/to/your/video.mp4

	# Using standard Python script
	python scripts/inference.py --video path/to/your/video.mp4
	```
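
To classify several clips in one go, the Python script can be wrapped in a small driver. This is a minimal sketch that relies only on the `--video` flag shown above. The helper names `build_inference_command` and `run_batch`, and the idea of scanning a directory for `.mp4` files, are assumptions rather than part of the project.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_command(video_path, script="scripts/inference.py"):
    """Build the argument list for one inference run (hypothetical helper)."""
    return [sys.executable, script, "--video", str(video_path)]


def run_batch(video_dir):
    """Run inference on every .mp4 file in `video_dir`, sorted by name."""
    for video in sorted(Path(video_dir).glob("*.mp4")):
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(build_inference_command(video), check=True)
```

Run it from the project root so that the relative path `scripts/inference.py` resolves, for example `run_batch("path/to/your/videos")`.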

	## Analysis

	```bash
	# Generate misclassification report
	python scripts/visualization/miscalculations_report.py

	# Visualize model performance
	python scripts/visualization/visualize.py
	```
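
The report script's internals are not documented here. As a general illustration of what a misclassification analysis tallies, the sketch below counts (true label, predicted label) pairs and keeps only the pairs where the two disagree. The function names are hypothetical, and the label strings are whatever your own evaluation produces.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true_label, predicted_label) pair occurs."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return Counter(zip(y_true, y_pred))


def misclassified(y_true, y_pred):
    """Return the off-diagonal pairs with their counts, most frequent first."""
    counts = confusion_counts(y_true, y_pred)
    errors = {pair: n for pair, n in counts.items() if pair[0] != pair[1]}
    return sorted(errors.items(), key=lambda item: -item[1])
```

Sorting by count surfaces the most common confusions first, which is usually where reviewing the underlying clips pays off.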

	## Project Structure

	```
	clip/
	├── src/ # Source code
	│ ├── data/ # Dataset and data processing
	│ ├── models/ # Model architecture
	│ └── utils/ # Utility functions
	├── scripts/ # Training and inference scripts
	│ └── visualization/ # Visualization tools
	├── config/ # Configuration files
	├── runs/ # Training runs and checkpoints
	├── cog.yaml # Cog configuration
	└── requirements.txt # Python dependencies
	```

	## Training Data

	To run training yourself, download the training data [here](https://drive.google.com/drive/folders/11M6nSuSuvoU2wpcV_-6KFqCzEMGP75q6?usp=drive_link) and place it in a directory called `./data` at the root of the project.

	## Checkpoints

	To run predictions with Cog or locally on an existing checkpoint, download a checkpoint and its configuration files [here](https://drive.google.com/drive/folders/1Gn3UdoKffKJwz84GnGx-WMFTwZuvDsuf?usp=sharing) and place them in a directory called `./checkpoints` at the root of the project.

	## Model Architecture

	- Base: CLIP ViT-Large/14
	- Custom temporal pooling layer
	- Fine-tuned vision encoder (last 3 layers)
	- Output: 4-class classifier
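
The data flow implied by this list can be sketched in plain Python: per-frame embeddings are pooled over time into one clip-level vector, which a 4-way classifier turns into class probabilities. This is not the project's implementation. Mean pooling stands in for the custom temporal pooling layer, whose details are not given here, and the class order in `CLASS_NAMES` as well as all function names are assumptions.

```python
import math

# Assumed class order; the checkpoint's real label mapping may differ.
CLASS_NAMES = ["windmill", "halo", "swipe", "baby mill"]


def mean_pool(frame_embeddings):
    """Average per-frame embeddings into a single clip-level vector."""
    if not frame_embeddings:
        raise ValueError("need at least one frame embedding")
    num_frames = len(frame_embeddings)
    dim = len(frame_embeddings[0])
    return [sum(frame[d] for frame in frame_embeddings) / num_frames
            for d in range(dim)]


def linear(vector, weights, bias):
    """Dense layer: one weight row and one bias term per output class."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Convert logits to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_clip(frame_embeddings, weights, bias):
    """Pool frames, apply the classifier, and return (label, probabilities)."""
    pooled = mean_pool(frame_embeddings)
    probs = softmax(linear(pooled, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs
```

In the real model the frame embeddings would come from the CLIP vision encoder, and both the pooling and the classifier weights would be learned during fine-tuning.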

	## License

	MIT License

	Copyright (c) 2024 Bryant Wolf

	This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

	## Citation

	If you use this model in your research, please cite:

	```bibtex
	@misc{clip-breakdance-classifier,
	author = {Bryant Wolf},
	title = {CLIP-Based Break Dance Move Classifier},
	year = {2024},
	publisher = {Hugging Face},
	journal = {Hugging Face Model Hub},
	howpublished = {\url{https://github.com/bawolf/breaking_vision_clip_cog}}
	}
	```