radames (Radamés Ajna)

Reacted to ucsahin's post with 🔥🚀 11 days ago

Florence-2 can detect a wide range of objects in a zero-shot setting with the task prompt "<OD>". However, if you want to detect specific objects that the base model cannot handle in its current form, you can easily fine-tune it for that task. Below I show how to fine-tune the model to detect tables in a given image, but a similar process can be applied to any object class. Thanks to @andito, @merve, and @SkalskiP for sharing the fix for fine-tuning the Florence-2 model. Please also check their great blog post at https://huggingface.co/blog/finetune-florence2.

Colab notebook: https://colab.research.google.com/drive/1Y8GVjwzBIgfmfD3ZypDX5H1JA_VG0YDL?usp=sharing
Finetuned model: ucsahin/Florence-2-large-TableDetection
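
For anyone preparing their own detection data, here is a minimal sketch of how a pixel-space bounding box might be turned into a Florence-2-style "<OD>" training label. It assumes the label is the class name followed by four `<loc_N>` tokens, with coordinates quantized into 1000 bins. The helper name `box_to_od_label` and the example box are hypothetical, so check the notebook for the exact format it uses.

```python
def box_to_od_label(label, box, image_width, image_height, bins=1000):
    """Format one box as '<label><loc_x1><loc_y1><loc_x2><loc_y2>'.

    Assumption: coordinates are scaled to `bins` discrete positions and
    clamped to [0, bins - 1]. This is an illustrative sketch, not code
    taken from the notebook.
    """
    x1, y1, x2, y2 = box

    def quantize(value, size):
        # Scale a pixel coordinate to a bin index and keep it in range.
        return max(0, min(bins - 1, int(value * bins / size)))

    locs = [
        quantize(x1, image_width),
        quantize(y1, image_height),
        quantize(x2, image_width),
        quantize(y2, image_height),
    ]
    return label + "".join(f"<loc_{v}>" for v in locs)


# Hypothetical table box in a 1000x500 image.
example = box_to_od_label("table", (100, 50, 300, 200), 1000, 500)
print(example)
```

Each training example would then pair the "<OD>" prompt with one such label string per image.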

Reacted to prithivMLmods's post with ❤️ 18 days ago

New Style, New Mix, New Drop 🧤

🧨Flux LoRA DLC: prithivMLmods/FLUX-LoRA-DLC

🎆Glowing-Body: prithivMLmods/Glowing-Body-Flux-LoRA
🎆Electric-Blue: prithivMLmods/Electric-Blue-Flux-LoRA
🎆Intense-Red: prithivMLmods/Intense-Red-Flux-LoRA
🎆Clouds-Illusion: prithivMLmods/Clouds-Illusion-Flux-LoRA
🎆Digital-Yellow: prithivMLmods/Digital-Yellow-Flux-LoRA

🧨Flux LoRA Collection: prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be

.
.
.
@prithivMLmods

Reacted to gokaygokay's post with 🔥 4 months ago

I've built a space for creating prompts for FLUX

gokaygokay/FLUX-Prompt-Generator

You can create long prompts from images or simple words. Enhance your short prompts with the prompt enhancer. You can configure various settings such as art form, photo type, character details, scene details, style, and artist to create tailored prompts.

And you can combine all of them with custom prompts using LLMs (Mixtral, Mistral, Llama 3, and Mistral-Nemo).

The UI is a bit complex, but it includes almost everything you need. Choosing the random option is the most fun!

And I've created some other spaces for using FLUX models with captioners and enhancers.

- gokaygokay/FLUX.1-dev-with-Captioner

Reacted to sayakpaul's post with 🔥 4 months ago

FLUX.1-dev-like images, but in fewer steps.

Merging code (very simple), inference code, merged params: sayakpaul/FLUX.1-merged

Enjoy the Monday 🤗

Reacted to sayakpaul's post with ❤️ 5 months ago

What is your favorite part of our Diffusers integration of Stable Diffusion 3?

My personal favorite is the ability to run it on a variety of different GPUs with minimal code changes.

Learn more about the integration here:
https://huggingface.co/blog/sd3

Reacted to sayakpaul's post with 🔥 5 months ago

Were you aware that we have a dedicated guide on different prompting mechanisms to improve the image generation quality? 🧨

Takes you through simple prompt engineering, prompt weighting, prompt enhancement using GPT-2, and more.

Check out the guide here 🦯
https://huggingface.co/docs/diffusers/main/en/using-diffusers/weighted_prompts

Reacted to merve's post with 🤯👀 5 months ago

EPFL and Apple (at @EPFL-VILAB) just released 4M-21: a single any-to-any model that can do anything from text-to-image generation to generating depth masks! 🙀
4M is a multimodal training framework introduced by Apple and EPFL.
The resulting model takes image and text as input and outputs image and text 🤩

Models: EPFL-VILAB/4m-models-660193abe3faf4b4d98a2742
Demo: EPFL-VILAB/4M
Paper: 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities (2406.09406)

This model consists of a transformer encoder and decoder, where the key to multimodality lies in the input and output data:

Input and output tokens are decoded to generate bounding boxes, the generated image's pixels, captions and more!

This model also learnt to generate canny maps, SAM edges and other things for steerable text-to-image generation 🖼️

The authors only added image-to-all capabilities for the demo, but you can try to use this model for text-to-image generation as well ☺️

Reacted to merve's post with ❤️🔥 5 months ago

I love Depth Anything V2 😍
It’s Depth Anything, but scaled up with both a larger teacher model and a gigantic dataset!

Here's a small TLDR of the paper, which has a lot of findings, experiments and more.
I have also created a collection that has the models, the dataset, the demo and CoreML converted model 😚 merve/depth-anything-v2-release-6671902e798cd404513ffbf5

The authors compared Marigold, a diffusion-based model, against Depth Anything and found out what’s up with using synthetic images vs real images for monocular depth estimation (MDE):

🔖 Real data has a lot of label noise and inaccurate depth maps (caused by depth sensors missing transparent objects, etc.), and many details are overlooked

🔖 Synthetic data has more precise and detailed depth labels that are truly ground truth, but there’s a distribution shift between real and synthetic images, and synthetic data has restricted scene coverage

The authors train different image encoders only on synthetic images and find that unless the encoder is very large, the model can’t generalize well (but large models generalize inherently anyway) 🧐
But even those still fail when encountering real images that have a wide distribution in labels (e.g. diverse instances of objects) 🥲

The Depth Anything V2 framework is to:

🦖 Train a DINOv2-G-based teacher model on 595K synthetic images
🏷️ Label 62M real images using teacher model
🦕 Train a student model using the real images labelled by teacher
Result: 10x faster and more accurate than Marigold!

The authors also construct a new benchmark called DA-2K that is less noisy, highly detailed and more diverse!

Reacted to m-ric's post with 👍 5 months ago

💰 𝗚𝗲𝘁 𝘁𝗵𝗲 𝗽𝗿𝗶𝗰𝗲 𝗼𝗳 𝗮𝗻𝘆 𝗟𝗟𝗠 𝗔𝗣𝗜 𝗿𝗲𝗾𝘂𝗲𝘀𝘁 ⇒ 𝘁𝗼𝗸𝗲𝗻𝗰𝗼𝘀𝘁

I've just found out about 𝙰𝚐𝚎𝚗𝚝𝙾𝚙𝚜-𝙰𝙸/𝚝𝚘𝚔𝚎𝚗𝚌𝚘𝚜𝚝 (https://github.com/AgentOps-AI/tokencost).
𝗧𝗵𝗶𝘀 𝗹𝗶𝗯𝗿𝗮𝗿𝘆 𝗴𝗶𝘃𝗲𝘀 𝘆𝗼𝘂 𝘁𝗵𝗲 𝗽𝗿𝗶𝗰𝗲 𝗼𝗳 𝘆𝗼𝘂𝗿 𝗰𝗮𝗹𝗹𝘀 𝘁𝗼 𝗮𝗻𝘆 𝗟𝗟𝗠 𝗔𝗣𝗜: OpenAI, Anthropic, Mistral, AWS or Databricks...

For any model, you can use as input either string prompts or messages, and get as outputs either the price or token count.
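
To make the idea concrete, here is a minimal sketch of the underlying arithmetic: multiply token counts by per-token prices. It does not use tokencost's own API, and the model name and prices below are made up for illustration; real prices differ per provider and change over time.

```python
from decimal import Decimal

# Hypothetical prices in USD per million tokens (not real provider prices).
PRICES_PER_MILLION = {
    "example-model": {"input": Decimal("0.50"), "output": Decimal("1.50")},
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated USD cost of one request as a Decimal."""
    price = PRICES_PER_MILLION[model]
    total = price["input"] * input_tokens + price["output"] * output_tokens
    return total / Decimal(1_000_000)


# 1,200 prompt tokens and 300 completion tokens on the hypothetical model.
cost = estimate_cost("example-model", input_tokens=1200, output_tokens=300)
print(f"Estimated cost: ${cost}")
```

Using `Decimal` rather than floats avoids rounding drift when many tiny per-request costs are summed.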

Congrats to the AgentOps-AI team: this will be very useful when trying to get a ballpark estimate of a project's price, to compare APIs, or for precise monitoring of usage!

✨ Daily reminder: 𝗿𝘂𝗻𝗻𝗶𝗻𝗴 𝗮𝗻 𝗔𝟭𝟬𝟬 𝗰𝗼𝘀𝘁𝘀 𝘆𝗼𝘂 𝗲𝘅𝗮𝗰𝘁𝗹𝘆 $𝟬.𝟬𝟬/𝗵𝗼𝘂𝗿 (or 0.00€ in current exchange rates) on a HF space with ZeroGPU!
Learn more on ZeroGPU 👉 https://www.datacenterdynamics.com/en/news/hugging-face-launches-zerogpu-project-to-democratize-ai-gives-away-10-million-worth-of-compute/

Reacted to flozi00's post with ❤️ 5 months ago

🌟 Progress in the German FineWeb edu reproduction 🌟

We're delighted to share the launch of our new Data Quality Classification Model, designed specifically for evaluating educational content in German. This tool uses advanced machine learning techniques to assess texts across all educational levels, from primary school to university.

🔍 Inspired by Hugging Face's FineWeb-Edu dataset, we've worked hard to refine data classification methods, ensuring educators and learners can access top-quality resources.
We're excited about the future as we continue improving our models and expanding our datasets.

Access the model here: pL-Community/GermanEduScorer-Qwen2-1.5b

🙏 A huge thank you to David and Daryoush from Vago Solutions; Björn and Jan from Ellamind / DiscoResearch for their expert insights throughout this project. Your support has been crucial.
This project was made possible by the support of PrimeLine AI.

replied to dvilasuero's post 5 months ago

Congrats and welcome to the team!

Reacted to dvilasuero's post with 🚀🔥 5 months ago

Today is a huge day in Argilla’s history. We couldn’t be more excited to share this with the community: we’re joining Hugging Face!

We’re embracing a larger mission, becoming part of a brilliant and kind team and a shared vision about the future of AI.

Over the past year, we’ve been collaborating with Hugging Face on countless projects: becoming a launch partner of Docker Spaces, empowering the community to clean Alpaca translations into Spanish and other languages, launching argilla/notus-7b-v1 building on Zephyr’s learnings, running the Data is Better Together initiative with hundreds of community contributors, and releasing argilla/OpenHermesPreferences, one of the largest open preference tuning datasets.

After more than 2,000 Slack messages and over 60 people collaborating for over a year, it already felt like we were part of the same team, pushing in the same direction. After a week of the smoothest transition you can imagine, we’re now the same team.

To those of you who’ve been following us, this won’t be a huge surprise, but it will be a big deal in the coming months. This acquisition means we’ll double down on empowering the community to build and collaborate on high quality datasets, we’ll bring full support for multimodal datasets, and we’ll be in a better place to collaborate with the Open Source AI community. For enterprises, this means that the Enterprise Hub will unlock highly requested features like single sign-on and integration with Inference Endpoints.

As a founder, I am proud of the Argilla team. We're now part of something bigger and a larger team but with the same values, culture, and goals. Grateful to have shared this journey with my beloved co-founders Paco and Amélie.

Finally, huge thanks to the Chief Llama Officer @osanseviero for sparking this and being such a great partner during the acquisition process.

Would love to answer any questions you have so feel free to add them below!

Reacted to Xenova's post with 🚀🔥 6 months ago

Introducing Whisper WebGPU: Blazingly-fast ML-powered speech recognition directly in your browser! 🚀 It supports multilingual transcription and translation across 100 languages! 🤯

The model runs locally, meaning no data leaves your device! 😍

Check it out! 👇
- Demo: Xenova/whisper-webgpu
- Source code: https://github.com/xenova/whisper-web/tree/experimental-webgpu

Reacted to tomaarsen's post with ❤️ 6 months ago

I just published Sentence Transformers v3.0.1: the first patch release since v3 from last week. It introduces gradient checkpointing, pushing model checkpoints to Hugging Face while training, model card improvements and fixes. Details:

1️⃣ Gradient checkpointing allows for much less memory usage at a cost of ~20% training speed. Seems to allow for higher batch sizes, which is quite important for loss functions with in-batch negatives.
2️⃣ You can specify args.push_to_hub=True and args.hub_model_id to upload your model checkpoints to Hugging Face while training. It also uploads your emissions (if codecarbon is installed) and your TensorBoard logs (if tensorboard is installed).
3️⃣ Model card improvements: improved automatic widget examples, better tags, and the default of "sentence_transformers_model_id" now gets replaced when possible.
4️⃣ Several evaluator fixes, see release notes for details.
5️⃣ Fixed a bug with MatryoshkaLoss throwing an error if the supplied Matryoshka dimensions are ascending instead of descending.
6️⃣ Full Safetensors support; even the uncommon modules can now save and load "model.safetensors" files: no more pickle risks.
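
As a rough illustration of why dimension order should not matter for point 5️⃣, here is a small sketch of Matryoshka-style truncation: the same embedding is cut to several sizes and re-normalized, after sorting the requested sizes largest-first. This is plain Python for illustration only, not the library's implementation; the function name and example values are hypothetical.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale them to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head] if norm > 0 else head


# Accept the dimensions in any order by sorting them largest-first.
matryoshka_dims = sorted([2, 4, 3], reverse=True)

embedding = [3.0, 4.0, 12.0, 0.0]
views = {dim: truncate_and_normalize(embedding, dim) for dim in matryoshka_dims}

for dim, vector in views.items():
    print(dim, vector)
```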

Check out the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.0.1

And let me know what kind of features you'd like to see next! I have some plans already (ONNX, Sparse models, ColBERT, PEFT), but I don't yet know how I should prioritize everything.

Reacted to lucifertrj's post with 🔥 6 months ago

Evaluate RAG with open-source models from Hugging Face using BeyondLLM

# pip install beyondllm
# pip install huggingface_hub
# pip install llama-index-embeddings-fastembed

import os
from getpass import getpass

from beyondllm.source import fit
from beyondllm.embeddings import FastEmbedEmbeddings
from beyondllm.retrieve import auto_retriever
from beyondllm.llms import HuggingFaceHubModel
from beyondllm.generator import Generate

# Prompt for the token instead of hard-coding it in the script
os.environ['HUGGINGFACE_ACCESS_TOKEN'] = getpass("Enter your HF API token:")

# Load the PDF and build a retriever that returns the top 3 chunks
data = fit("RedHenLab_GSoC_Tarun.pdf", dtype="pdf")
embed_model = FastEmbedEmbeddings()
retriever = auto_retriever(data=data, embed_model=embed_model, type="normal", top_k=3)

# Answer the question with an open model hosted on the Hugging Face Hub
llm = HuggingFaceHubModel(model="mistralai/Mistral-7B-Instruct-v0.2")
pipeline = Generate(question="what models has Tarun fine-tuned?", llm=llm, retriever=retriever)

print(pipeline.call())  # Return the AI response
print(pipeline.get_rag_triad_evals())  # Print the RAG triad evaluation scores

GitHub: https://github.com/aiplanethub/beyondllm

Don't forget to ⭐️ the repo
