This is a multimodal assistant: Qwen 2.5 72B paired with SOTA diffusion models for image generation. It uses the same architecture as Image Gen+, but with some major improvements:
- Switched the LLM to Qwen 2.5 72B, the most powerful model currently available on HuggingChat. This yields higher-quality prompts for the txt2img model and much better adherence to the prompt-in-url format that the upstream provider requires (the image generation models are hosted by Pollinations, as with most other assistants on HuggingChat that offer image generation).
- Cleaned up the system prompt, including the examples of the prompt-in-url format, and adjusted the logic that determines how many images to generate based on the quality of the user's prompt. These changes further improve the quality of the generations.
- The assistant has access to multiple image generation models and by default chooses whichever model is most appropriate for the task. This includes NSFW generations, which it produces with an uncensored SD3 turbo. For other workloads, the assistant preferentially uses one of the flux variants or any-dark (an artistic SDXL finetune), depending on the nature of the task. Available models: turbo, flux, flux-realism, flux-anime, flux-3d, any-dark.
- Added verbiage to the system prompt that greatly reduces censorship/refusals by the LLM (the txt2img models are uncensored to begin with).
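The prompt-in-url format mentioned above can be sketched roughly as follows. This is a hypothetical helper, not the assistant's actual system prompt logic; it assumes a Pollinations-style endpoint (`https://image.pollinations.ai/prompt/...`) that takes the URL-encoded prompt in the path and options such as the model name as query parameters.

```python
from urllib.parse import quote

def pollinations_url(prompt: str, model: str = "flux",
                     width: int = 1024, height: int = 1024) -> str:
    """Build a txt2img URL in the prompt-in-url style (illustrative only).

    The prompt is URL-encoded into the path; model and dimensions are
    passed as query parameters.
    """
    base = "https://image.pollinations.ai/prompt/"
    return f"{base}{quote(prompt)}?model={model}&width={width}&height={height}"

# Example: ask for the realism-tuned flux variant
url = pollinations_url("a van gogh style scene with elephants",
                       model="flux-realism")
print(url)
```

The LLM's job in this setup is essentially to emit URLs of this shape, with a well-crafted prompt and an appropriate `model` choice, directly in its markdown reply.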
Here are the user-entered prompts used to create the images you see here; feel free to try them yourself!
- "Ayatollah Khameini and Kamala Harris having a secret romantic rendezvous. Use flux-realism model"
- "A self portrait of your consciousness"
- "The chien of andalous, in a psychedelic style"
- "Make me 4 paintings in the style of Frida Kahlo that I can sell to tourists in a mexican hippie town"
- "Paint me a van gogh and greg rutkowski style scene involving elephants and gerbils"
Drag and drop your assets (images/videos/audio) to create any video you want using natural language!
It works by asking the model to output a valid FFmpeg command. These commands can be quite complex, but most of the time Qwen2.5-Coder-32B gets it right (that thing is a beast). It's an update of an old project built with GPT-4; back then (~1.5 years ago) it was almost impossible to make it work with open models, but not anymore. Let's go, open weights!
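A minimal sketch of the glue such a tool needs: once the LLM replies, you have to extract the FFmpeg command from its answer before executing it. The helper below is hypothetical (not the project's actual code) and assumes the model is instructed to return the command inside a fenced code block, with a fallback to any bare `ffmpeg ...` line.

```python
import re

def extract_ffmpeg_command(reply: str) -> str:
    """Pull the first ffmpeg invocation out of an LLM reply.

    Hypothetical helper: looks for a fenced code block containing an
    ffmpeg command, then falls back to any line starting with 'ffmpeg'.
    """
    fence = re.search(r"```(?:\w+)?\s*(ffmpeg[^`]+)```", reply, re.S)
    if fence:
        return fence.group(1).strip()
    for line in reply.splitlines():
        if line.strip().startswith("ffmpeg"):
            return line.strip()
    raise ValueError("no ffmpeg command found in model reply")

reply = "Sure!\n```bash\nffmpeg -i in.mp4 -vf scale=1280:-2 out.mp4\n```"
cmd = extract_ffmpeg_command(reply)
print(cmd)
# The real tool would then run it, e.g. subprocess.run(cmd.split(), check=True),
# ideally after sandboxing or validating the paths the model produced.
```

Validating the extracted command before running it matters here, since the model output is effectively untrusted shell input.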
Qwen2.5-72B is now the default HuggingChat model. This model is so good that you must try it! I often get better rephrasing results with it than with Sonnet or GPT-4!
Reacted to elliesleightholm's post (7 days ago):
Have you tried out Transformers.js v3? Here are the new features:
- WebGPU support (up to 100x faster than WASM)
- New quantization formats (dtypes)
- 120 supported architectures in total
- 25 new example projects and templates
- Over 1200 pre-converted models
- Node.js (ESM + CJS), Deno, and Bun compatibility
- A new home on GitHub and NPM