We just released Transformers.js v3.1 and you're not going to believe what's now possible in the browser w/ WebGPU! 🤯 Let's take a look:
🔀 Janus from Deepseek for unified multimodal understanding and generation (Text-to-Image and Image-Text-to-Text)
👁️ Qwen2-VL from Qwen for dynamic-resolution image understanding
🔢 JinaCLIP from Jina AI for general-purpose multilingual multimodal embeddings
🌋 LLaVA-OneVision from ByteDance for Image-Text-to-Text generation
🤸‍♀️ ViTPose for pose estimation
📄 MGP-STR for optical character recognition (OCR)
📈 PatchTST & PatchTSMixer for time series forecasting

That's right, everything running 100% locally in your browser (no data sent to a server)! 🔥 Huge for privacy!

Check out the release notes for more information. 👇
https://github.com/huggingface/transformers.js/releases/tag/3.1.0

Demo link (+ source code): webml-community/Janus-1.3B-WebGPU

Reacted to PLB's post with 🚀 14 days ago

⚠️ People selling AI chatbots for websites hate us.
Add an open source chat assistant on your website in 5 minutes: https://github.com/phospho-app/ai-chat-bubble

How does it work?
- You give a URL
- The AI assistant crawls the website content and embeds it
- You add it to your frontend in one line of code
- People on your website can ask the assistant questions

Powered by BAAI/bge-small-en-v1.5 and Mistral AI

Reacted to fdaudens's post with 👍 15 days ago

Been reading about the "bigger models = better AI" narrative getting pushed back today.

@thomwolf tackled this head on at Web Summit and highlighted how important small models are (and why closed-source companies haven't pushed for this 😬). They're crushing it: today's 1B parameter models outperform last year's 10B models.

Fascinating to hear him talk about the secret sauce behind this approach.

upvoted a collection 16 days ago

Qwen2.5

Collection

Qwen2.5 language models: pretrained and instruction-tuned models in 7 sizes (0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B). • 45 items • Updated about 18 hours ago • 392

liked a model 29 days ago

Kortix/FastApply-1.5B-v1.0

Text Generation • Updated Oct 27 • 370 • 15

Reacted to fdaudens's post with 👍 30 days ago

🔍 NYT leveraged AI to investigate election interference by analyzing 400+ hours of recorded meetings - that's 5M words of data!

AI spotted patterns, humans verified facts. Every AI-flagged quote was manually verified against source recordings. Really appreciate that they published their full methodology - transparency matters when using AI in journalism.

A perfect blend of tech & journalism.

The future of journalism isn't robots replacing reporters - it's AI helping humans process massive datasets more efficiently. Sometimes the most powerful tech solutions are the least flashy ones.

Read the article: https://www.nytimes.com/interactive/2024/10/28/us/politics/inside-the-movement-behind-trumps-election-lies.html?unlocked_article_code=1.Vk4.ucv9.dbHVquTQaf0G&smid=nytcore-ios-share

Reacted to m-ric's post with 👀 about 1 month ago

𝗛𝗼𝘄 𝘁𝗼 𝗿𝗲-𝗿𝗮𝗻𝗸 𝘆𝗼𝘂𝗿 𝘀𝗻𝗶𝗽𝗽𝗲𝘁𝘀 𝗶𝗻 𝗥𝗔𝗚 ⇒ ColBERT, Rerankers, Cross-Encoders

Let’s say you’re doing RAG, and in an effort to improve performance, you try to rerank a few possible source snippets by their relevancy to a query.

How can you score similarity between your query and any source document? 🤔 📄 ↔️ 📑

𝟭. 𝗝𝘂𝘀𝘁 𝘂𝘀𝗲 𝗲𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴𝘀 : 𝗡𝗼-𝗶𝗻𝘁𝗲𝗿𝗮𝗰𝘁𝗶𝗼𝗻 🏎️

This means that you encode each token from both the query and the doc as separate vectors, average the token vectors of each side separately to get 2 vectors in total, then compute their similarity with cosine similarity or a comparable metric.
➡️ Notable examples: Check the top of the MTEB leaderboard!
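
As a rough illustration of the no-interaction approach described above, here is a minimal pure-Python sketch. The tiny 2-dimensional token vectors are made up for the example; in practice they would come from an embedding model.

```python
import math

def mean_pool(token_vectors):
    """Average a list of token vectors into a single vector."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def no_interaction_score(query_tokens, doc_tokens):
    """Pool each side independently, then compare the two pooled vectors."""
    return cosine(mean_pool(query_tokens), mean_pool(doc_tokens))

# Made-up token vectors, purely for illustration.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[1.0, 1.0], [3.0, 3.0]]
score = no_interaction_score(query_tokens, doc_tokens)
```

Because the document vector does not depend on the query, it can be computed once and stored, which is what makes this option the fastest of the three.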

𝟮. 𝗟𝗮𝘁𝗲-𝗶𝗻𝘁𝗲𝗿𝗮𝗰𝘁𝗶𝗼𝗻: 𝘁𝗵𝗶𝘀 𝗶𝘀 𝗖𝗼𝗹𝗕𝗘𝗥𝗧 🏃

These encode each token from both query and doc as separate vectors as before, but compare them all directly, without averaging them first and losing information.

This is more accurate than no-interaction but also slower, because you have to compare n*m vectors instead of 2. At least you can still precompute the document token embeddings and keep them in memory. ColBERT also has some optimisations, such as pooling, to be faster.

➡️ Notable examples: ColBERTv2, mxbai-colbert-large-v1, jina-colbert-v2
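
A minimal sketch of late-interaction scoring, assuming the usual ColBERT-style "MaxSim" rule: each query token keeps its best match among the document tokens, and those maxima are summed. The token vectors below are invented for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def maxsim_score(query_tokens, doc_tokens):
    """For each query token, take its best-matching doc token, then sum.

    This performs len(query_tokens) * len(doc_tokens) comparisons.
    """
    return sum(
        max(cosine(q, d) for d in doc_tokens)
        for q in query_tokens
    )

# Made-up token vectors, purely for illustration.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # contains an exact match for each query token
doc_b = [[1.0, 1.0]]                          # only a partial match for each query token

score_a = maxsim_score(query_tokens, doc_a)
score_b = maxsim_score(query_tokens, doc_b)
```

In this toy case, averaging the tokens first would give both documents the same cosine score against the query, while the token-level comparison ranks `doc_a` above `doc_b`.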

𝟯. 𝗘𝗮𝗿𝗹𝘆 𝗶𝗻𝘁𝗲𝗿𝗮𝗰𝘁𝗶𝗼𝗻: 𝗖𝗿𝗼𝘀𝘀-𝗲𝗻𝗰𝗼𝗱𝗲𝗿𝘀 🏋️

Basically you run the concatenated query + document through a model to get a final score.

This is the most accurate, but also the slowest, since it takes really long when you have many docs to rerank! And you cannot pre-store embeddings.

➡️ Notable examples: MixedBread or Jina AI rerankers!
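
The reranking loop for this option can be sketched as follows. `score_pair` stands for the cross-encoder call; `toy_score_pair` is a made-up word-overlap placeholder so that the sketch runs without a model, and it is not how a real cross-encoder scores text.

```python
from typing import Callable, List, Tuple

def rerank(
    query: str,
    docs: List[str],
    score_pair: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score every (query, doc) pair jointly and sort by descending score.

    Nothing can be precomputed per document: each pair needs its own call.
    """
    scored = [(doc, score_pair(query, doc)) for doc in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def toy_score_pair(query: str, doc: str) -> float:
    """Placeholder scorer (assumption): fraction of query words found in the doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

docs = [
    "Bananas are yellow",
    "ColBERT is a late interaction model",
    "A cross-encoder reads query and document together",
]
ranked = rerank("late interaction model", docs, toy_score_pair)
```

Swapping `toy_score_pair` for a real reranker call keeps the same structure: one forward pass per candidate document.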

🚀 So what you choose is a trade-off between speed and accuracy: I think ColBERT is often a really good choice!

Based on this great post by Jina AI 👉 https://jina.ai/news/what-is-colbert-and-late-interaction-and-why-they-matter

liked a model about 2 months ago

openai/whisper-large-v3-turbo

Automatic Speech Recognition • Updated Oct 4 • 2.11M • 1.42k

Reacted to m-ric's post with 👀 2 months ago

Anthropic just released a chunk improvement technique that vastly improves RAG performance! 🔥

Quick reminder: Retrieval Augmented Generation (RAG) is a widely used technique for improving your LLM chatbot's answers to user questions.

It goes like this: instead of generating an LLM answer straight away, you add a preliminary step called Retrieval, which fetches relevant documents from your knowledge base through semantic search and appends the top K documents to the prompt. ➡️ As a result, the LLM answer is grounded in context.
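
Here is a minimal sketch of that retrieval step. The bag-of-words `embed` function is only a stand-in (an assumption) for a real embedding model, and the knowledge base and prompt wording are invented for the example.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, knowledge_base, k=2):
    """Return the k documents most similar to the question."""
    question_vec = embed(question)
    ranked = sorted(knowledge_base, key=lambda doc: cosine(question_vec, embed(doc)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents):
    """Append the retrieved documents to the prompt as grounding context."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

knowledge_base = [
    "The refund window is 30 days",
    "Shipping takes 5 business days",
    "Support is open on weekdays",
]
question = "how long is the refund window"
top_docs = retrieve(question, knowledge_base, k=1)
prompt = build_prompt(question, top_docs)
```

The resulting `prompt` is what would be sent to the LLM in place of the bare question.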

⛔️ The difficulty with this retrieval step is that when you split your documents into chunks that will be retrieved, you lose context. So important chunks could be missed.

💡 Anthropic's just-released blog post shows that you can add some context to each chunk with one LLM call. Then you embed the original chunk + a bit of added context, so that the embedding is much more representative of the chunk in its context!
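
The idea can be sketched like this. `generate_context` stands for the one LLM call per chunk; the `fake_generate_context` placeholder below is an assumption that only reuses the document's first line, so that the sketch runs without a model. Putting the context before the chunk is also a choice made for this example.

```python
from typing import Callable, List

def contextualize_chunks(
    document: str,
    chunks: List[str],
    generate_context: Callable[[str, str], str],
) -> List[str]:
    """Attach a short, chunk-specific context to each chunk before embedding."""
    return [f"{generate_context(document, chunk)}\n{chunk}" for chunk in chunks]

def fake_generate_context(document: str, chunk: str) -> str:
    """Placeholder for the LLM call (assumption): situate the chunk via the title line."""
    title = document.splitlines()[0]
    return f"From the document '{title}'."

# Invented document, purely for illustration.
document = (
    "ACME Q2 2023 report\n"
    "Revenue grew 3% over the previous quarter.\n"
    "Headcount was flat."
)
chunks = document.splitlines()[1:]
texts_to_embed = contextualize_chunks(document, chunks, fake_generate_context)
```

Each string in `texts_to_embed` is what gets embedded and indexed, so a chunk like "Revenue grew 3%..." now carries the information about which report it belongs to.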

🤔 Isn't that crazy expensive? Well, it would have been before, but not so much anymore with their new prompt caching feature, which makes sending thousands of requests that share the same prompt much less expensive. They give an indicative price tag of only $1.02 per million document tokens!

✅ And this vastly improves performance on their benchmark!

Read their blog post 👉 https://www.anthropic.com/news/contextual-retrieval

Reacted to Tonic's post with 🔥 2 months ago

🙋🏻‍♂️ Hey there folks,

@ucaslcl released a new OCR model, and it's 👏🏻👏🏻 fantastic: https://huggingface.co/ucaslcl/GOT-OCR2_0

GPU : Tonic/GOT-OCR
Gradio Demo (Image Edit) : Tonic1/ImageEdit-GOT-OCR

Model : https://huggingface.co/ucaslcl/GOT-OCR2_0
Official demo : https://huggingface.co/spaces/ucaslcl/GOT_online
github : https://github.com/Ucas-HaoranWei/GOT-OCR2.0

liked a Space 2 months ago

Running on Zero

📊

HTML To Markdown

liked a model 2 months ago

jinaai/reader-lm-1.5b

Text Generation • Updated Sep 20 • 2.46k • 489

Reacted to jeffboudier's post with 🔥 2 months ago

Pro Tip - if you're a Firefox user, you can set up Hugging Chat as integrated AI Assistant, with contextual links to summarize or simplify any text - handy!

In this short video I show how to set it up

liked a model 3 months ago

gpt-omni/mini-omni

Text-to-Speech • Updated Sep 4 • 2 • 403

Reacted to louisbrulenaudet's post with 👀 3 months ago

Understanding the JSON response format with HF's Serverless Inference API 🤗

As it stands, there seems to be an inconsistency with the OpenAI documentation on the question of implementing the JSON response format using the InferenceClient completion API.

After digging into the InferenceClient source code, here is the official solution, which uses a JSON Schema. It pins down the structure of the response and simplifies parsing in automated pipelines that extract metadata or other information:

from huggingface_hub import InferenceClient

# Serverless Inference API client for the chosen model
client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")

messages = [
    {
        "role": "user",
        "content": "I saw a puppy a cat and a raccoon during my bike ride in the park. What did I see and when?",
    },
]

# JSON Schema that the generated answer has to follow
response_format = {
    "type": "json",
    "value": {
        "properties": {
            "location": {"type": "string"},
            "activity": {"type": "string"},
            "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5},
            "animals": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["location", "activity", "animals_seen", "animals"],
    },
}

response = client.chat_completion(
    messages=messages,
    response_format=response_format,
    max_tokens=500,
)

# The message content is a JSON string that follows the schema above
print(response.choices[0].message.content)
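
Since the content comes back as a string, a small parsing step is still needed before using it. The following is a minimal sketch using only the standard library; `parse_structured_reply` is a hypothetical helper, and `sample_content` is a hand-written example, not an actual model output.

```python
import json

REQUIRED_KEYS = ("location", "activity", "animals_seen", "animals")

def parse_structured_reply(content: str, required_keys=REQUIRED_KEYS) -> dict:
    """Parse the JSON string and check that every required key is present."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data

# Hand-written example of what the message content could look like.
sample_content = (
    '{"location": "park", "activity": "bike ride", '
    '"animals_seen": 3, "animals": ["puppy", "cat", "raccoon"]}'
)
parsed = parse_structured_reply(sample_content)
```

In a real pipeline, `sample_content` would be replaced by `response.choices[0].message.content`.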

As a reminder, json mode is activated with the OpenAI client as follows:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=[...],
    response_format={"type": "json_object"},
)

One question remains unanswered, however, and will perhaps be answered by the community: generating a list of dictionaries still seems to be unsupported, and for now producing a single dictionary seems to be the only working option.

Reacted to m-ric's post with 👍 3 months ago

🤖 𝗧𝗵𝗲 𝗔𝗜 𝗦𝗰𝗶𝗲𝗻𝘁𝗶𝘀𝘁: 𝗔𝗴𝗲𝗻𝘁𝗶𝗰, 𝗳𝘂𝗹𝗹𝘆-𝗮𝘂𝘁𝗼𝗺𝗮𝘁𝗲𝗱 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗽𝗶𝗽𝗲𝗹𝗶𝗻𝗲 𝗳𝗼𝗿 𝘂𝗻𝗱𝗲𝗿 $𝟭𝟱 𝗽𝗲𝗿 𝗽𝗮𝗽𝗲𝗿

Researchers have just created an AI system that 𝗰𝗮𝗻 𝗰𝗼𝗻𝗱𝘂𝗰𝘁 𝗲𝗻𝘁𝗶𝗿𝗲 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗽𝗿𝗼𝗷𝗲𝗰𝘁𝘀 𝗳𝗿𝗼𝗺 𝘀𝘁𝗮𝗿𝘁 𝘁𝗼 𝗳𝗶𝗻𝗶𝘀𝗵, 𝗽𝗼𝘁𝗲𝗻𝘁𝗶𝗮𝗹𝗹𝘆 𝗿𝗲𝘃𝗼𝗹𝘂𝘁𝗶𝗼𝗻𝗶𝘇𝗶𝗻𝗴 𝗵𝗼𝘄 𝘀𝗰𝗶𝗲𝗻𝘁𝗶𝗳𝗶𝗰 𝗱𝗶𝘀𝗰𝗼𝘃𝗲𝗿𝗶𝗲𝘀 𝗮𝗿𝗲 𝗺𝗮𝗱𝗲.

It doesn't just assist with specific tasks - it automates the entire research process, from generating ideas to writing and reviewing papers.
It can 1) brainstorm novel research directions, 2) write and execute code for experiments, visualize results, and get references, and even 3) write up findings in a full academic paper format!

And it can do all this for under $15 per paper! 🤯

𝗞𝗲𝘆 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀:
🧠 Generates novel research ideas across multiple topics (e.g. diffusion modeling, transformers, learning dynamics aka “grokking”)
👨‍💻 Uses open-source coding assistant Aider to implement ideas and run experiments. This is especially important since this agentic assistant can iterate if it fails somewhere.
📊 Visualizes results and plans follow-up experiments (up to 5 rounds)
✍️ Writes full academic papers, including finding references using the Semantic Scholar API
🕵️ Runs a simulated peer review process to evaluate paper quality
💰 Total cost per paper is under $15. This system can generate "hundreds of interesting, medium-quality papers" in just a week!

𝗦𝘁𝗶𝗹𝗹 𝗻𝗼𝘁 𝗿𝗲𝗮𝗱𝘆 𝘁𝗼 𝗳𝗶𝗹𝗹 𝗜𝗖𝗟𝗥 𝘄𝗶𝘁𝗵 𝗽𝗮𝗽𝗲𝗿𝘀:
🔁 Ideas generated in one domain tend to be repetitive across different runs, and even across different language models
👀 Does not use vision capabilities to fix visual issues in plots
💭 Models occasionally hallucinate entire results tables
⇒ Only a few of the generated papers would actually meet the threshold for acceptance at a top AI conference

👉 Read their paper: The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (2408.06292)

Reacted to macadeliccc's post with 👍 3 months ago

Automated web scraping with Playwright is becoming easier by the day. Now, using Ollama tool calling, it's possible to perform very high-accuracy web scraping (in some cases 100% accurate) just by asking an LLM to scrape the content for you.

This can be completed in a multistep process similar to Cohere's platform. If you have tried the Cohere playground with web scraping, this will feel very similar. In my experience, the Llama 3.1 version is much better due to the larger context window. Both tools are great, but the difference is that the Ollama + Playwright version is completely controlled by you.

All you need to do is wrap your scraper in a function:

# WebScraper is assumed to be your own Playwright-based scraper class (not shown here)
async def query_web_scraper(url: str) -> dict:
    scraper = WebScraper(headless=False)
    return await scraper.query_page_content(url)

and then make your request:

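import ollama

# First API call: send the query and the function description to the model.
# `model` (a model name) and `messages` (the chat history) are assumed to be defined earlier.
response = ollama.chat(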
    model=model,
    messages=messages,
    tools=[
        {
            'type': 'function',
            'function': {
                'name': 'query_web_scraper',
                'description': 'Scrapes the content of a web page and returns the structured JSON object with titles, articles, and associated links.',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'url': {
                            'type': 'string',
                            'description': 'The URL of the web page to scrape.',
                        },
                    },
                    'required': ['url'],
                },
            },
        },
    ]
)
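
Once the model answers, the requested tool still has to be executed on your side. The sketch below shows one way to do that, under two assumptions: the response is treated as a plain dict with `message` → `tool_calls` → `function` → `name` / `arguments`, and the scraper is replaced by a stub so the example runs without a browser. `run_tool_calls`, `AVAILABLE_TOOLS`, and `fake_response` are hypothetical names introduced for the illustration.

```python
import asyncio

async def query_web_scraper(url: str) -> dict:
    """Stub standing in for the Playwright-backed scraper (assumption)."""
    return {"url": url, "titles": [], "links": []}

# Map tool names advertised to the model onto local callables.
AVAILABLE_TOOLS = {"query_web_scraper": query_web_scraper}

async def run_tool_calls(response: dict) -> list:
    """Execute each tool call requested by the model and collect tool messages."""
    tool_messages = []
    for call in response["message"].get("tool_calls") or []:
        name = call["function"]["name"]
        arguments = call["function"]["arguments"]
        tool = AVAILABLE_TOOLS.get(name)
        if tool is None:
            continue  # ignore tools that were never registered
        result = await tool(**arguments)
        tool_messages.append({"role": "tool", "content": str(result)})
    return tool_messages

# Hand-written response in the assumed shape, not a real model output.
fake_response = {
    "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {"function": {"name": "query_web_scraper", "arguments": {"url": "https://example.com"}}}
        ],
    }
}
tool_messages = asyncio.run(run_tool_calls(fake_response))
```

The collected tool messages would then be appended to `messages` and sent back in a second chat call, so the model can answer using the scraped content.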

To learn more:
GitHub w/ Playground: https://github.com/tdolan21/tool-calling-playground/blob/main/notebooks/ollama-playwright-web-scraping.ipynb
Complete Guide: https://medium.com/@tdolan21/building-an-llm-powered-web-scraper-with-ollama-and-playwright-6274d5d938b5

liked 3 models 3 months ago