Model not working
I tried to use the hosted Inference API for the ALIGN model, but it is not working. The error message I receive is: The model_type 'align' is not recognized. It could be a bleeding edge model, or incorrect.
I have also tried to import AlignModel and AlignProcessor from the transformers library, and I get an ImportError there as well. There seems to be some error in the model. Any updates/help would be highly appreciated!
The version of the transformers library that includes the ALIGN model has not been released yet; the model is only available on the main branch. To use ALIGN now, install transformers from source:
pip install git+https://github.com/huggingface/transformers
from transformers import AlignProcessor, AlignModel
processor = AlignProcessor.from_pretrained("kakaobrain/align-base")
model = AlignModel.from_pretrained("kakaobrain/align-base")
"""
Downloading (β¦)rocessor_config.json: 100%|ββββββββββββββββββββββββββββββββββββββββββββββββββ| 508/508 [00:00<00:00, 59.5kB/s]
Downloading (β¦)okenizer_config.json: 100%|ββββββββββββββββββββββββββββββββββββββββββββββββββ| 399/399 [00:00<00:00, 53.4kB/s]
Downloading (β¦)solve/main/vocab.txt: 100%|βββββββββββββββββββββββββββββββββββββββββββββββββ| 232k/232k [00:00<00:00, 279kB/s]
Downloading (β¦)cial_tokens_map.json: 100%|ββββββββββββββββββββββββββββββββββββββββββββββββββ| 125/125 [00:00<00:00, 49.1kB/s]
Downloading (β¦)lve/main/config.json: 100%|βββββββββββββββββββββββββββββββββββββββββββββββ| 5.25k/5.25k [00:00<00:00, 660kB/s]
Downloading pytorch_model.bin: 100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 690M/690M [00:31<00:00, 22.1MB/s]
"""
I think something is wrong with the weight initialization of the ALIGN model class. It shows me a warning that some layers are newly initialized and that the model should be trained before use!
How do I fine-tune this model?
@Sersh You can fine-tune this model in the same way you fine-tune other PyTorch models. Here's one way you can do it:
import torch
import torch.nn as nn
from transformers import AlignModel

class AlignClassifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # Load the pretrained ALIGN backbone
        self.model = AlignModel.from_pretrained("kakaobrain/align-base")
        # The embedding size of the ALIGN model is 640 for each modality,
        # so the concatenated image + text embedding is 1280-dimensional
        hidden_size = 640 + 640
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, **inputs):
        outputs = self.model(**inputs)
        image_embeds = outputs.image_embeds
        text_embeds = outputs.text_embeds
        # Concatenate both embeddings along the feature dimension
        embeds = torch.cat((image_embeds, text_embeds), dim=1)
        return self.fc(embeds)
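A hypothetical way to call it (the image, caption, and num_classes below are placeholders I chose for illustration):

import requests
from PIL import Image
from transformers import AlignProcessor

processor = AlignProcessor.from_pretrained("kakaobrain/align-base")
clf = AlignClassifier(num_classes=2)  # num_classes is a placeholder

# Placeholder image and caption for illustration
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(text=["a photo of a cat"], images=image, return_tensors="pt")

logits = clf(**inputs)  # shape: (batch_size, num_classes)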
@rabiulawal
If you use the from_pretrained method correctly, the model should have the trained weights. Make sure you are not initializing the model from a config, as that essentially gives you only the architecture defined by your configuration, with randomly initialized weights.
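For illustration, here are the two initialization paths side by side; only the first loads the pretrained checkpoint:

from transformers import AlignModel, AlignConfig

# Loads the trained weights from the checkpoint
model = AlignModel.from_pretrained("kakaobrain/align-base")

# Builds only the architecture from a config; weights are randomly initialized
config = AlignConfig()
model_random = AlignModel(config)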
model = AlignVisionModel.from_pretrained('/opt/licy/vms/align')
I used this code to load the model. Why does it show a lot of missing parameters?
Some weights of AlignVisionModel were not initialized from the model checkpoint at /opt/licy/vms/align and are newly initialized: ['encoder.blocks.24.expansion.expand_bn.running_var', 'encoder.blocks.33.projection.project_bn.running_mean', 'encoder.blocks.12.depthwise_conv.depthwise_norm.num_batches_tracked', 'encoder.blocks.36.projection.project_bn.bias', 'encoder.blocks.40.squeeze_excite.expand.weight', 'encoder.blocks.23.depthwise_conv.depthwise_conv.weight', 'encoder.blocks.38.expansion.expand_bn.bias', 'encoder.blocks.49.squeeze_excite.reduce.bias', 'encoder.blocks.6.expansion.expand_bn.bias', 'encoder.blocks.40.expansion.expand_bn.bias', 'encoder.blocks.44.depthwise_conv.depthwise_norm.running_mean', 'encoder.blocks.44.depthwise_conv.depthwise_norm.weight', 'encoder.blocks.2.projection.project_bn.running_mean', 'encoder.blocks.43.projection.project_bn.running_var', 'encoder.blocks.53.expansion.expand_bn.running_var', 'encoder.blocks.54.depthwise_conv.depthwise_norm.bias', 'encoder.blocks.11.squeeze_excite.reduce.bias', 'encoder.blocks.35.depthwise_conv.depthwise_norm.num_batches_tracked', 'encoder.blocks.49.depthwise_conv.depthwise_conv.weight', 'encoder.blocks.49.depthwise_conv.depthwise_norm.running_var', 'encoder.blocks.53.expansion.expand_bn.weight', 'encoder.blocks.27.projection.project_bn.weight', 'encoder.blocks.6.depthwise_conv.depthwise_norm.num_batches_tracked', 'encoder.blocks.40.depthwise_conv.depthwise_norm.weight', 'encoder.blocks.18.projection.project_bn.running_var', 'encoder.blocks.9.expansion.expand_bn.running_var', 'encoder.blocks.32.squeeze_excite.expand.weight', 'encoder.blocks.40.squeeze_excite.reduce.weight', 'encoder.blocks.42.projection.project_bn.bias', 'encoder.blocks.52.projection.project_conv.weight', 'encoder.blocks.3.depthwise_conv.depthwise_norm.bias', 'encoder.blocks.0.depthwise_conv.depthwise_norm.running_var', 'encoder.blocks.27.projection.project_bn.num_batches_tracked', 'encoder.blocks.35.depthwise_conv.depthwise_norm.weight', 'encoder.blocks.15.expansion.expand_bn.bias', 'encoder.blocks.44.expansion.expand_bn.num_batches_tracked', 'encoder.blocks.3.depthwise_conv.depthwise_norm.weight', 'encoder.blocks.7.expansion.expand_bn.running_var', 'encoder.blocks.10.projection.project_bn.bias', 'encoder.blocks.52.depthwise_conv.depthwise_norm.running_mean', 'encoder.blocks.17.expansion.expand_conv.weight', ...........................