---
license: mit
metrics:
- accuracy
- f1
---
# Model Card for Model ID

This repo contains models that rate media into the categories PG, PG13, R, X, and XXX. These single-modality models are used to build an ensemble or multimodal model. In the case of the multimodal model, the single-modality models act as processor components that produce the inputs for a smaller Multilayer Perceptron (MLP).

## Model Details

### Model Description

The main model here is the multimodal model trained 7/22/24. It was trained with a weighted soft-F1 loss that places extra emphasis on class 0 (PG). The model uses a finetuned resnet18, a ViT, a resnet50 trained with cross validation, prompt Bert, and prompt Roberta inside the MultiModalProcessor. The processor routes each modality through the appropriate model and returns the last hidden layer; these vectors are concatenated to form the input to the multimodal model's MLP (a sketch of this feature-extraction step is shown after the Model Sources section below).
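
The exact loss code is not part of this repo; the snippet below is only a minimal sketch of one common weighted soft-F1 formulation, and the example class weights are an assumption rather than the values actually used for training.

```python
import torch
import torch.nn.functional as F

def weighted_soft_f1_loss(logits, labels, class_weights):
    """Differentiable (soft) F1 loss, weighted per class."""
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(labels, num_classes=logits.shape[1]).float()

    tp = (probs * onehot).sum(dim=0)        # soft true positives per class
    fp = (probs * (1 - onehot)).sum(dim=0)  # soft false positives per class
    fn = ((1 - probs) * onehot).sum(dim=0)  # soft false negatives per class

    soft_f1 = 2 * tp / (2 * tp + fp + fn + 1e-8)
    per_class_loss = 1 - soft_f1            # minimize 1 - F1 for each class
    return (per_class_loss * class_weights).sum() / class_weights.sum()

# Hypothetical weights: 5 rating classes with extra emphasis on class 0 (PG)
class_weights = torch.tensor([2.0, 1.0, 1.0, 1.0, 1.0])
```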

Each model was trained on the same balanced, downsampled dataset found [here](https://civitai.com/models/544550/training-data-for-image-classification). Please note: this dataset contains some mislabeled data in every label. The resnet50-CV is the only model whose training/test split may differ, due to the cross-validation search; however, no data used for evaluation was found in the training/test sets. The evaluation data is a private dataset labeled by Wolfgang Black and Seb at CivitAI.
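
The data-preparation code is also not included here; as a rough sketch, balancing by downsampling could look like the following, assuming a hypothetical labels file with one row per sample and a `rating` column.

```python
import pandas as pd

# Hypothetical labels file; "rating" holds PG / PG13 / R / X / XXX
df = pd.read_csv("labels.csv")

# Downsample every class to the size of the smallest class
smallest = df["rating"].value_counts().min()
balanced = (
    df.groupby("rating", group_keys=False)
      .apply(lambda g: g.sample(n=smallest, random_state=42))
      .reset_index(drop=True)
)
print(balanced["rating"].value_counts())
```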

- **Developed by:** Wolfgang Black
- **Model type:** Multimodal
- **Language(s) (NLP):** English
- **Finetuned from model [optional]:** Various; due to the multimodal nature, only the MLP was truly trained from scratch.

### Model Sources [optional]

#### ResNets
- **Link:** https://pytorch.org/vision/main/models/resnet.html
- Note: models were initialized with `weights = 'ImageNetV1'`

#### ViT
- **Repository:** https://huggingface.co/google/vit-base-patch16-224
- **Paper [optional]:** https://arxiv.org/abs/2010.11929

#### DistilBert
This model is the basis for promptBert.
- **Repository:** https://huggingface.co/distilbert/distilbert-base-uncased
- **Paper:** https://arxiv.org/abs/1910.01108

#### Roberta
This model is the basis for promptRoberta.
- **Repository:** https://huggingface.co/FacebookAI/roberta-large-mnli
- **Paper:** https://arxiv.org/abs/1907.11692
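
The MultimodalProcessor code itself is not included in this repo, so as a rough illustration of the feature-extraction-and-concatenation step described in the Model Description, here is a minimal sketch that pulls per-modality embeddings from some of the backbones listed above (via torchvision and transformers) and concatenates them into the kind of vector a small MLP head would consume. The pooling choices and the use of base (rather than finetuned) checkpoints are assumptions for illustration only.

```python
import torch
from PIL import Image
from torchvision import models, transforms
from transformers import AutoModel, AutoTokenizer, ViTImageProcessor, ViTModel

# Image branch 1: resnet18 with the classification head removed -> 512-d features
resnet = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
resnet.fc = torch.nn.Identity()
resnet.eval()
img_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Image branch 2: ViT -> 768-d [CLS] embedding
vit_proc = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
vit = ViTModel.from_pretrained("google/vit-base-patch16-224").eval()

# Text branch: DistilBERT -> 768-d [CLS] embedding of the prompt
tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
bert = AutoModel.from_pretrained("distilbert-base-uncased").eval()

def extract_features(image: Image.Image, prompt: str) -> torch.Tensor:
    """Return the concatenated per-modality feature vector for one sample."""
    image = image.convert("RGB")
    with torch.no_grad():
        resnet_feat = resnet(img_tf(image).unsqueeze(0))                 # (1, 512)

        vit_inputs = vit_proc(images=image, return_tensors="pt")
        vit_feat = vit(**vit_inputs).last_hidden_state[:, 0]             # (1, 768)

        text_inputs = tok(prompt, return_tensors="pt", truncation=True)
        text_feat = bert(**text_inputs).last_hidden_state[:, 0]          # (1, 768)

    # The concatenated vector is what a small MLP classification head would consume
    return torch.cat([resnet_feat, vit_feat, text_feat], dim=1)          # (1, 2048)
```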

## Uses

These models should be used to classify generated images or text into movie-rating categories.

## How to Get Started with the Model

`Warning`: the code for the Multimodal Config, Processor, and Model is not included in this repo. The snippet below assumes the user has that code.

```python
import torch
from PIL import Image

from src.multimodal_model import MultimodalConfig, MultimodalModel, MultimodalProcessor

model_dir = ''  # path to the multimodal model directory
config = MultimodalConfig.from_pretrained(model_dir)
model = MultimodalModel(config).from_pretrained(model_dir)  # assumes the composite models exist in the directories specified by the config
processor = MultimodalProcessor(models=config.models)       # assumes the composite models exist in the directories specified by the config

# Build the inputs with the processor. The exact call signature depends on the
# MultimodalProcessor code (not included here): a PIL image plus optional text (prompt), tags, and label strings.
inputs = processor(image=Image.open('example.png'), text=None, tags=None, label=None)

model.eval()
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs['logits']

prediction = model.config.id2label[torch.argmax(logits, dim=1).item()]
```

### Out-of-Scope Use

Currently, all models are untested on videos.

## Bias, Risks, and Limitations

Models are entirely finetuned (the composite models) or trained from scratch (the MLP) on generated images, and may not work well on real images or non-digital media.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. This includes the poor PG13/R labels caused by personal bias in the dataset, as well as the fact that all training data consists of generated images.