AssertionError: You do not have CLIP state dict!
I get the following error when trying to use this in Forge. Your text Detail improved HiT model works fine though. Any ideas?
Could you specify what you mean by "this" - which model exactly is not working for you? Make sure you use the same variant that worked with HiT; e.g. if you used the Text-Encoder-only version (the one with "TE-only" in the filename) for HiT, then also try the TE-only version of the model you're referring to.
Thanks for the reply. I'm referring to "Long-ViT-L-14-BEST-GmP-smooth-ft.safetensors". Currently I'm using "ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF" and this one works fine, however, I often use very long prompts, so I thought the Long version might be better suited. In the files and versions tabs of "Long-ViT-L-14-BEST-GmP-smooth-ft.safetensors" I can't see a TE-only option. Am I missing something perhaps?
Oh, I am sorry about my confusion!
I just clicked this in "inbox" and failed to see we're discussing Long-CLIP, not "normal CLIP". Sorry about that!
You need to expand the text encoder's position embeddings (Long-CLIP uses 248 token positions instead of the usual 77) and "inject" the Long-CLIP model for that to work.
https://github.com/SeaArtLab/ComfyUI-Long-CLIP did this for SD and SDXL, and I contributed the Flux node via a pull request.
Unfortunately, I don't use Forge (or do much inference at all; my art became tweaking the model itself, not so much generating images, haha!). But I hope the ComfyUI implementation serves as guidance for what you'd need to do in Forge, or as a basis for requesting the feature from the Forge authors / the community.
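For reference, the "expand and inject" step looks roughly like this. This is only a minimal sketch, assuming an HF-style CLIP-L text encoder and HF-style key names in the safetensors file (your checkpoint may use different key prefixes), not the actual ComfyUI-Long-CLIP code:

```python
# Sketch: build a CLIP-L text encoder with 248 positions instead of 77,
# then load the Long-CLIP weights into it.
from transformers import CLIPTextModel, CLIPTextConfig
from safetensors.torch import load_file

LONG_CTX = 248  # Long-CLIP was trained with 248 text positions instead of 77

# Start from the standard CLIP-L text config, but enlarge the position table.
config = CLIPTextConfig.from_pretrained("openai/clip-vit-large-patch14")
config.max_position_embeddings = LONG_CTX
text_model = CLIPTextModel(config)

# Load the Long-CLIP checkpoint. Depending on how it was exported, the key
# names may need remapping before they match the HF module names.
state_dict = load_file("Long-ViT-L-14-BEST-GmP-smooth-ft.safetensors")
missing, unexpected = text_model.load_state_dict(state_dict, strict=False)
print("missing keys:", len(missing), "| unexpected keys:", len(unexpected))

# The UI side then has to be told that prompts may now be up to 248 tokens,
# which is the "inject" part that the ComfyUI nodes take care of.
```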
Hope that helps / is a starting point, at least!
Any solution for Forge? All the models fail with "ValueError: Failed to recognize model type!"
UPD1: having "CLIP" in the filename helps to load a regular CLIP, but Long-CLIP still throws an error:
RuntimeError: Error(s) in loading state_dict for IntegratedCLIP: size mismatch for transformer.text_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([248, 768]) from checkpoint, the shape in current model is torch.Size([77, 768]).
UPD2: Editing this line to 248 helped!
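For anyone else hitting that size mismatch: a quick way to check how many text positions a checkpoint actually carries (so you know whether 77 or 248 is the right value) is to inspect the safetensors file directly. The key names below are just what an HF-style export uses; yours may differ:

```python
# Sketch: print the shape of any position-embedding tensors in the checkpoint.
from safetensors import safe_open

with safe_open("Long-ViT-L-14-BEST-GmP-smooth-ft.safetensors", framework="pt") as f:
    for key in f.keys():
        if "position_embedding" in key:
            # A Long-CLIP text encoder should report (248, 768) here;
            # standard CLIP-L reports (77, 768).
            print(key, tuple(f.get_tensor(key).shape))
```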
@ceoofcapybaras - glad you figured it out already! In the long term, I guess opening an issue on the repo / asking for a Long-CLIP implementation in Forge would be the best option, so it's available to everybody (and not just to those willing to poke around and edit the code).