gokay aydogan PRO

gokaygokay

AI & ML interests

OPEN SOURCEEEEEE!

Recent Activity

updated a model about 18 hours ago
gokaygokay/Flux-Realistic-Backgrounds-LoRA
updated a model about 18 hours ago
gokaygokay/Flux-Digital-Backgrounds-LoRA

Organizations

gokaygokay's activity

reacted to their post with 🔥 2 months ago
posted an update 3 months ago
FLUX Prompt Generator Updates

- gokaygokay/FLUX-Prompt-Generator

- There are now hundreds of new selections across diverse categories, each offering a wide range of options:

Architecture, Art, Artist, Brands, Character, Cinematic, Fashion, Feelings, Geography, Human, Interaction, Keywords, Objects, People, Photography, Plots, Poses, Scene, Science, Stuff, Time, Typography, Vehicle, Video Game

- In addition to Hugging Face, I've integrated new LLM providers: Groq, OpenAI, and Claude.

- Upgraded Vision Language Models (VLMs): We now feature Qwen2-VL, JoyCaption and Florence-2-large.

- New specialized system prompts for various styles and themes, including Happy, Simple, Poster, Only Objects, No Figure, Landscape, Fantasy.
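
As an illustration of how one of the listed providers can be wired in, here is a minimal sketch that expands a short prompt through Groq's OpenAI-compatible endpoint; the model name and system prompt are illustrative placeholders, not the Space's actual configuration.

```python
# Hypothetical sketch: expanding a short prompt via Groq's OpenAI-compatible API.
# The model id and system prompt are placeholders, not the Space's real settings.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",  # Groq exposes an OpenAI-compatible endpoint
    api_key=os.environ["GROQ_API_KEY"],
)

response = client.chat.completions.create(
    model="llama-3.1-70b-versatile",  # example model id; use any model your key can access
    messages=[
        {"role": "system", "content": "Expand the user's idea into a detailed FLUX image prompt."},
        {"role": "user", "content": "a foggy harbor at dawn, cinematic"},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```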
reacted to kadirnar's post with 🔥❤️🚀 3 months ago
I am training a ControlNet model for Flux. Here are some of my experiences:

Checkpoint-10000:

https://x.com/kadirnar_ai/status/1829831750471606668

Checkpoint-12000:

https://x.com/kadirnar_ai/status/1829889524962640001

Checkpoint-14000:

https://x.com/kadirnar_ai/status/1829989622878744711

Checkpoint (16000-18000):

https://x.com/kadirnar_ai/status/1830179551407665654

Dataset: kadirnar/fluxdev_controlnet_16k
GPU: 1x A100 (80 GB)
GPU Hours: 65
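
For anyone wanting to try a checkpoint like these once one is published, here is a minimal inference sketch with diffusers' Flux ControlNet classes; the ControlNet repo id is a hypothetical placeholder, since the post does not name a released checkpoint.

```python
# Minimal sketch: running a Flux ControlNet checkpoint with diffusers.
# "your-username/flux-controlnet-checkpoint" is a hypothetical placeholder repo id.
import torch
from diffusers import FluxControlNetModel, FluxControlNetPipeline
from diffusers.utils import load_image

controlnet = FluxControlNetModel.from_pretrained(
    "your-username/flux-controlnet-checkpoint", torch_dtype=torch.bfloat16
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", controlnet=controlnet, torch_dtype=torch.bfloat16
).to("cuda")

control_image = load_image("conditioning.png")  # e.g. a canny/depth map matching the training data
image = pipe(
    prompt="a cozy cabin in a snowy forest",
    control_image=control_image,
    controlnet_conditioning_scale=0.6,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("output.png")
```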
reacted to vikhyatk's post with 🔥 3 months ago
Pushed a new update to vikhyatk/moondream2 today. TextVQA up from 60.2 to 65.2, DocVQA up from 61.9 to 70.5.

The Space has been updated to the new model if you want to try it out! vikhyatk/moondream2
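
A minimal usage sketch, assuming the moondream2 remote-code API at the time (encode_image / answer_question); check the model card for the current interface and pin a revision for reproducible results.

```python
# Minimal sketch of querying moondream2; the encode_image/answer_question interface
# is assumed from the model card at the time and may have changed since.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vikhyatk/moondream2"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

image = Image.open("document.png")
enc_image = model.encode_image(image)
print(model.answer_question(enc_image, "What does the document say?", tokenizer))
```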
reacted to isidentical's post with 🚀🔥 3 months ago
reacted to merve's post with 🚀🔥 3 months ago
reacted to isidentical's post with 🔥 3 months ago
fal/AuraFlow-v0.3 is now here with support for different aspect resolutions (w/h up to 1536px!) and much nicer aesthetics! Make sure to install the latest diffusers to get support for it.
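
A minimal sketch of trying the new checkpoint, assuming a recent diffusers release that ships AuraFlowPipeline; the width/height values here just exercise the new non-square resolutions.

```python
# Minimal sketch: AuraFlow v0.3 with diffusers (requires a recent diffusers release
# that includes AuraFlowPipeline).
import torch
from diffusers import AuraFlowPipeline

pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow-v0.3", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a lighthouse on a cliff at sunset, dramatic clouds",
    width=1536,   # non-square aspect ratios are supported up to 1536px
    height=1024,
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]
image.save("auraflow_v03.png")
```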
reacted to ucsahin's post with 👍🚀❤️🔥 3 months ago
🚀 Introducing TraVisionLM: Turkish Visual Language Model - The First of Its Kind! 🇹🇷🖼️

I'm thrilled to share TraVisionLM on Hugging Face! With 875M parameters, this lightweight, efficient model handles Turkish instructions for image inputs. Fully compatible with the Transformers library, it's easy to load, fine-tune, and use, with no external libraries needed!

Developed solo, TraVisionLM is a strong foundation for low-resource language research. While still improving, it's a key step for Turkish-language AI. Your feedback is welcome as I refine the model.

🎉 Explore it now:

- Model: ucsahin/TraVisionLM-base
- Demo: https://huggingface.co/spaces/ucsahin/TraVisionLM-Turkish_Visual_Language_Model
- Object Detection Finetune: ucsahin/TraVisionLM-Object-Detection-ft

Let's push Turkish visual language processing forward!
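
A minimal loading sketch, assuming the model follows the usual trust_remote_code pattern for custom VLMs on the Hub; the exact processor and generation calls may differ, so treat the model card as the authoritative reference.

```python
# Minimal sketch: loading TraVisionLM with transformers. The processor/generate
# interface is assumed from the common custom-VLM pattern; see the model card for
# the exact prompt format.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "ucsahin/TraVisionLM-base"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("photo.jpg")
inputs = processor(text="Açıkla", images=image, return_tensors="pt")  # "Describe" in Turkish
outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```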

reacted to their post with 👍🔥 3 months ago
posted an update 3 months ago
I've built a Space for creating prompts for FLUX:

gokaygokay/FLUX-Prompt-Generator

You can create long prompts from images or simple words, and enhance short prompts with the prompt enhancer. You can configure various settings such as artform, photo type, character details, scene details, style, and artist to create tailored prompts.

And you can combine all of them with custom prompts using LLMs (Mixtral, Mistral, Llama 3, and Mistral-Nemo).

The UI is a bit complex, but it includes almost everything you need. Choosing the random option is the most fun!

And I've created some other Spaces for using FLUX models with captioners and enhancers:

- gokaygokay/FLUX.1-dev-with-Captioner
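
If you want to call the Space from code rather than the UI, a minimal sketch with gradio_client looks like this; the endpoint name and argument list are not documented in the post, so use view_api() to discover the actual signature.

```python
# Minimal sketch: calling the FLUX-Prompt-Generator Space programmatically.
# The api_name and arguments below are placeholders; run client.view_api() to see
# the Space's real endpoints and expected inputs.
from gradio_client import Client

client = Client("gokaygokay/FLUX-Prompt-Generator")
client.view_api()  # prints the available endpoints and their parameters

# result = client.predict("portrait", "photograph", api_name="/generate_prompt")  # placeholder call
# print(result)
```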
reacted to severo's post with 🚀 4 months ago
[New tool] Follow interesting ML people 👩‍🎨 👨‍🎤 👩‍🏫 with Followgraph

severo/followgraph

Please try it and tell me if it helped you discover high-quality content 👍 👎

I repurposed "Followgraph for Mastodon" (https://followgraph.vercel.app/).

My new follows: @TheBloke @mlabonne @teknium @KnutJaegersberg @SkalskiP @AmelieSchreiber @lbourdois @ceyda @andrewyng @Pclanglais @karpathy

And you?
reacted to merve's post with 🔥 4 months ago
Chameleon 🦎 by Meta is now available in Hugging Face transformers 😍
A vision language model that comes in 7B and 34B sizes 🤩
But what makes this model so special?

Demo: merve/chameleon-7b
Models: facebook/chameleon-668da9663f80d483b4c61f58

keep reading ⥥

Chameleon is a unique model: it attempts to scale early fusion 🤨
But what is early fusion?
Modern vision language models use a vision encoder with a projection layer to project image embeddings so they can be fed as part of the prompt to a text decoder (LLM)

Early fusion, on the other hand, attempts to fuse all features together (image patches and text) by using an image tokenizer; all tokens are projected into a shared space, which enables seamless generation 😍

The authors have also introduced architectural improvements (QK norm and revised placement of layer norms) for scalable and stable training, and they were able to increase the token count (5x the tokens compared to Llama 3, which is a must with early fusion IMO)

This model is an any-to-any model thanks to early fusion: it can take image and text input and output image and text, but image generation is disabled to prevent malicious use.

One can also do text-only prompting; the authors note the model catches up with larger LLMs (like Mixtral 8x7B or the larger Llama-2 70B), and with image-text pair prompting it is competitive with larger VLMs like IDEFICS2-80B (see the paper for the benchmarks: Chameleon: Mixed-Modal Early-Fusion Foundation Models (2405.09818))
Thanks for reading!
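
A minimal inference sketch with the transformers integration mentioned above; the dtype and device settings are just reasonable defaults.

```python
# Minimal sketch: prompting Chameleon 7B through transformers.
import torch
from PIL import Image
from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

model_id = "facebook/chameleon-7b"
processor = ChameleonProcessor.from_pretrained(model_id)
model = ChameleonForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("scene.jpg")
prompt = "What is happening in this image?<image>"  # <image> marks where the image is inserted
inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, dtype=torch.bfloat16
)
out = model.generate(**inputs, max_new_tokens=60)
print(processor.decode(out[0], skip_special_tokens=True))
```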