William Suffill

wsuff
AI & ML interests

None yet

Recent Activity

Organizations

None yet

wsuff's activity

Reacted to Xenova's post with 🚀 about 5 hours ago
We just released Transformers.js v3.1 and you're not going to believe what's now possible in the browser w/ WebGPU! 🤯 Let's take a look:
🔀 Janus from Deepseek for unified multimodal understanding and generation (Text-to-Image and Image-Text-to-Text)
👁️ Qwen2-VL from Qwen for dynamic-resolution image understanding
🔢 JinaCLIP from Jina AI for general-purpose multilingual multimodal embeddings
🌋 LLaVA-OneVision from ByteDance for Image-Text-to-Text generation
🤸‍♀️ ViTPose for pose estimation
📄 MGP-STR for optical character recognition (OCR)
📈 PatchTST & PatchTSMixer for time series forecasting

That's right, everything running 100% locally in your browser (no data sent to a server)! 🔥 Huge for privacy!

Check out the release notes for more information. 👇
https://github.com/huggingface/transformers.js/releases/tag/3.1.0

Demo link (+ source code): webml-community/Janus-1.3B-WebGPU
Reacted to PLB's post with 🚀 14 days ago
⚠️ People selling AI chatbots for websites hate us.
Add an open source chat assistant on your website in 5 minutes: https://github.com/phospho-app/ai-chat-bubble

How does it work?
- You give a URL
- The AI assistant crawls the website content and embeds it
- You add it to your frontend in one line of code
- People on your website can ask the assistant questions

Powered by BAAI/bge-small-en-v1.5 and Mistral AI
Reacted to fdaudens's post with 👍 15 days ago
Been reading about the "bigger models = better AI" narrative getting pushed back today.

@thomwolf tackled this head on at Web Summit and highlighted how important small models are (and why closed-source companies haven't pushed for this 😬). They're crushing it: today's 1B parameter models outperform last year's 10B models.

Fascinating to hear him talk about the secret sauce behind this approach.
Reacted to fdaudens's post with 👍 30 days ago
🔍 NYT leveraged AI to investigate election interference by analyzing 400+ hours of recorded meetings - that's 5M words of data!

AI spotted patterns, humans verified facts. Every AI-flagged quote was manually verified against source recordings. Really appreciate that they published their full methodology - transparency matters when using AI in journalism.

A perfect blend of tech & journalism.

The future of journalism isn't robots replacing reporters - it's AI helping humans process massive datasets more efficiently. Sometimes the most powerful tech solutions are the least flashy ones.

Read the article: https://www.nytimes.com/interactive/2024/10/28/us/politics/inside-the-movement-behind-trumps-election-lies.html?unlocked_article_code=1.Vk4.ucv9.dbHVquTQaf0G&smid=nytcore-ios-share
Reacted to m-ric's post with 👀 about 1 month ago
How to re-rank your snippets in RAG ⇒ ColBERT, Rerankers, Cross-Encoders

Let's say you're doing RAG, and in an effort to improve performance, you try to rerank a few possible source snippets by their relevancy to a query.

How can you score similarity between your query and any source document? 🤔 📄 ↔️ 📑

1. Just use embeddings: No-interaction 🏎️

This means that you encode each token from both the query and the doc as separate vectors, then average the tokens of each separately to get 2 vectors in total, then you compute similarity via cosine or something.
➡️ Notable examples: Check the top of the MTEB leaderboard!
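
A minimal sketch of the no-interaction scoring described above, using plain Python lists as toy token vectors. The function names and the 2-d vectors are illustrative assumptions; a real setup would take the token vectors from an embedding model.

```python
import math

def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into a single vector."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors of the same length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def no_interaction_score(query_tokens, doc_tokens):
    """Pool each side to one vector first, then compare the 2 pooled vectors."""
    return cosine(mean_pool(query_tokens), mean_pool(doc_tokens))

# Toy token vectors (assumed values, not real model output)
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[1.0, 1.0], [3.0, 3.0]]
score = no_interaction_score(query_tokens, doc_tokens)
```

Because each side is pooled independently, document vectors can be computed once and stored, which is what makes this option fast.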

2. Late-interaction: this is ColBERT 🏃

These encode each token from both query and doc as separate vectors as before, but compare all together without previously averaging them and losing information.

This is more accurate than no-interaction but also slower because you have to compare n*m vectors instead of 2. At least you can precompute the document vectors and store them in memory. And ColBERT has some optimisations like pooling to be faster.

➡️ Notable examples: ColBERTv2, mxbai-colbert-large-v1, jina-colbert-v2
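
A rough sketch of late-interaction scoring, assuming the "MaxSim" rule usually associated with ColBERT: each query token keeps its best match among the document tokens, and those maxima are summed. The vectors are toy values; real ColBERT embeddings are L2-normalized, and these are not.

```python
def dot(a, b):
    """Dot product of two vectors of the same length."""
    return sum(x * y for x, y in zip(a, b))

def late_interaction_score(query_tokens, doc_tokens):
    """For each query token, take its best-matching doc token, then sum (n*m comparisons)."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Toy token vectors (assumed values, not real model output)
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b = [[0.5, 0.5], [0.5, 0.5]]

score_a = late_interaction_score(query_tokens, doc_a)
score_b = late_interaction_score(query_tokens, doc_b)
```

Averaging the token vectors of either toy document gives the same vector, [0.5, 0.5], so a mean-pooled comparison could not tell them apart, while the token-level comparison scores `doc_a` higher.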

3. Early interaction: Cross-encoders 🏋️

Basically you run the concatenated query + document in a model to get a final score.

This is the most accurate, but also the slowest since it gets really long when you have many docs to rerank! And you cannot pre-store embeddings.

➡️ Notable examples: MixedBread or Jina AI rerankers!
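
A sketch of the reranking loop around a cross-encoder, under the assumption that the model is exposed as a function taking a (query, document) pair and returning a score. `toy_score_pair` is a hypothetical word-overlap stand-in, not a real cross-encoder; it is only there so the loop runs without a model.

```python
from typing import Callable

def rerank(
    query: str,
    docs: list[str],
    score_pair: Callable[[str, str], float],
    top_k: int = 3,
) -> list[tuple[float, str]]:
    """Score every (query, doc) pair jointly and return the top_k docs, best first."""
    scored = [(score_pair(query, doc), doc) for doc in docs]  # one model call per doc
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]

def toy_score_pair(query: str, doc: str) -> float:
    """Hypothetical stand-in scorer: fraction of query words that appear in the doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

docs = [
    "ColBERT compares token vectors late",
    "Cross-encoders read the query and document together",
    "Bananas are yellow",
]
ranked = rerank("how do cross-encoders read a document", docs, toy_score_pair, top_k=2)
```

Nothing about a document can be computed ahead of time here: every new query triggers one scoring call per candidate, which is where the cost comes from.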

🚀 So what you choose is a trade-off between speed and accuracy: I think ColBERT is often a really good choice!

Based on this great post by Jina AI 👉 https://jina.ai/news/what-is-colbert-and-late-interaction-and-why-they-matter
Reacted to m-ric's post with 👀 2 months ago
Anthropic just released a chunk improvement technique that vastly improves RAG performance! 🔥

Quick reminder: Retrieval Augmented Generation (RAG) is a widely-used technique for improving your LLM chatbot's answers to user questions.

It goes like this: instead of generating an LLM answer straight away, it adds a preliminary step called Retrieval, which retrieves relevant documents from your knowledge base through semantic search and appends the top K documents to the prompt. ➡️ As a result, the LLM answer is grounded in context.
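
A minimal sketch of that retrieval step, assuming a toy bag-of-words "embedding" in place of a real embedding model. The knowledge base, the prompt wording, and all function names are made up for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(knowledge_base, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

def build_prompt(question: str, documents: list[str]) -> str:
    """Append the retrieved documents to the prompt so the answer is grounded in them."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

knowledge_base = [
    "The office opens at 9 am on weekdays",
    "Parking is free for visitors",
    "The cafeteria serves lunch at noon",
]
question = "When does the office open on weekdays"
top_docs = retrieve(question, knowledge_base, k=1)
prompt = build_prompt(question, top_docs)
```

The resulting `prompt` string is what would be sent to the LLM in place of the bare question.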

⛔️ The difficulty with this retrieval step is that when you split your documents into chunks that will be retrieved, you lose context. So important chunks could be missed.

💡 Anthropic's just-released blog post shows that you can add some context to each chunk, with one LLM call. Then you embed the original chunk + a bit of added context, so that the embedding is much more representative of the document in its context!
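
A sketch of how the text to embed could be assembled, assuming the generated context is placed before the original chunk. `fake_generate_context` is a hypothetical stand-in for the one LLM call per chunk; the sample document is invented.

```python
from typing import Callable

def contextualize_chunks(
    document: str,
    chunks: list[str],
    generate_context: Callable[[str, str], str],
) -> list[str]:
    """Return the texts to embed: generated context followed by the original chunk."""
    return [f"{generate_context(document, chunk)}\n\n{chunk}" for chunk in chunks]

def fake_generate_context(document: str, chunk: str) -> str:
    """Hypothetical stand-in for the LLM call that situates a chunk within its document."""
    title = document.splitlines()[0]
    return f"This chunk is from the document titled '{title}'."

# Invented sample document; each line after the title is treated as one chunk
document = "ACME Q2 2023 report\nRevenue grew 3% over the previous quarter.\nCosts were flat."
chunks = document.splitlines()[1:]
texts_to_embed = contextualize_chunks(document, chunks, fake_generate_context)
```

On its own, "Revenue grew 3% over the previous quarter." does not say which company or quarter it refers to; the added context line restores that before embedding.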

🤔 Isn't that crazy expensive? Well it would have been before, but not so much anymore with their new Prompt caching feature that makes duplicating thousands of requests with the same prompt much less expensive. They give an indicative price tag of only $1.02 per million document tokens processed!

✅ And this vastly improves performance on their benchmark!

Read their blog post 👉 https://www.anthropic.com/news/contextual-retrieval
Reacted to Tonic's post with 🔥 2 months ago
Reacted to jeffboudier's post with 🔥 2 months ago
Pro Tip - if you're a Firefox user, you can set up Hugging Chat as an integrated AI assistant, with contextual links to summarize or simplify any text - handy!

In this short video I show how to set it up
Reacted to louisbrulenaudet's post with 👀 3 months ago
Understanding the JSON format response with HF's Serverless Inference API 🤗

As it stands, there seems to be an inconsistency with the OpenAI documentation on the question of implementing the JSON response format using the InferenceClient completion API.

After investigating the InferenceClient source code, I share the official solution using a JSON Schema. This consolidates the structure of the response and simplifies parsing as part of an automated process for extracting metadata and other information:
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")

messages = [
    {
        "role": "user",
        "content": "I saw a puppy, a cat and a raccoon during my bike ride in the park. What did I see and when?",
    },
]

# JSON Schema that the generated answer must follow
response_format = {
    "type": "json",
    "value": {
        "properties": {
            "location": {"type": "string"},
            "activity": {"type": "string"},
            "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5},
            "animals": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["location", "activity", "animals_seen", "animals"],
    },
}

response = client.chat_completion(
    messages=messages,
    response_format=response_format,
    max_tokens=500,
)

# The message content is a JSON string, not a Python dict
print(response.choices[0].message.content)
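
Since the content comes back as a string, the parsing step mentioned above might look like the following sketch. The helper name and the sample reply are assumptions; the sample is hand-written to match the schema, not real model output.

```python
import json

REQUIRED_KEYS = ["location", "activity", "animals_seen", "animals"]

def parse_structured_reply(content: str, required_keys: list[str]) -> dict:
    """Parse a JSON reply and fail loudly if it is not an object or lacks a required key."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

# Hand-written sample reply (assumed shape, not real model output)
sample_content = (
    '{"location": "park", "activity": "bike ride", '
    '"animals_seen": 3, "animals": ["puppy", "cat", "raccoon"]}'
)
parsed = parse_structured_reply(sample_content, REQUIRED_KEYS)
```

In the snippet above, `sample_content` would be replaced by `response.choices[0].message.content`.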

As a reminder, JSON mode is activated with the OpenAI client as follows:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
     model="gpt-3.5-turbo-0125",
     messages=[...],
     response_format={"type": "json_object"}
)

One question remains unanswered, however, and will perhaps be answered by the community: it seems that an incompatibility persists for list of dictionaries generation, and currently, the production of simple dictionaries seems to be the only functional option.
Reacted to m-ric's post with 👍 3 months ago
🤖 The AI Scientist: Agentic, fully-automated research pipeline for under $15 per paper

Researchers have just created an AI system that can conduct entire research projects from start to finish, potentially revolutionizing how scientific discoveries are made.

It doesn't just assist with specific tasks - it automates the entire research process, from generating ideas to writing and reviewing papers.
It can 1 - brainstorm novel research directions, 2 - write and execute code for experiments, visualize results, and get references, and even 3 - write up findings in a full academic paper format!

And it can do all this for under $15 per paper! 🤯

Key insights:
🧠 Generates novel research ideas across multiple topics (e.g. diffusion modeling, transformers, learning dynamics aka "grokking")
👨‍💻 Uses open-source coding assistant Aider to implement ideas and run experiments. This is especially important since this agentic assistant can iterate if it fails somewhere.
📊 Visualizes results and plans follow-up experiments (up to 5 rounds)
✍️ Writes full academic papers, including finding references using the Semantic Scholar API
🕵️ Runs a simulated peer review process to evaluate paper quality
💰 Total cost per paper is under $15. This system can generate "hundreds of interesting, medium-quality papers" in just a week!

Still not ready to fill ICLR with papers:
🔁 Ideas generated in one domain tend to be repetitive across different runs, and even across different language models
👀 Does not use vision capabilities to fix visual issues in plots
💭 Models occasionally hallucinate entire results tables
⇒ Only a few of the generated papers would actually meet the threshold for acceptance at a top AI conference

👉 Read their paper: The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (2408.06292)
Reacted to macadeliccc's post with 👍 3 months ago
Automated web scraping with Playwright is becoming easier by the day. Now, using Ollama tool calling, it's possible to perform very high-accuracy web scraping (in some cases 100% accurate) just by asking an LLM to scrape the content for you.

This can be completed in a multistep process similar to Cohere's platform. If you have tried the Cohere playground with web scraping, this will feel very similar. In my experience, the Llama 3.1 version is much better due to the larger context window. Both tools are great, but the difference is that the Ollama + Playwright version is completely controlled by you.

All you need to do is wrap your scraper in a function:

# WebScraper is the author's scraper class; it is not defined in this post.
async def query_web_scraper(url: str) -> dict:
    scraper = WebScraper(headless=False)
    return await scraper.query_page_content(url)


and then make your request:

import ollama

# `model` (a tool-capable model name) and `messages` (the chat history) are defined elsewhere.
# First API call: Send the query and function description to the model
response = ollama.chat(
    model=model,
    messages=messages,
    tools=[
        {
            'type': 'function',
            'function': {
                'name': 'query_web_scraper',
                'description': 'Scrapes the content of a web page and returns the structured JSON object with titles, articles, and associated links.',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'url': {
                            'type': 'string',
                            'description': 'The URL of the web page to scrape.',
                        },
                    },
                    'required': ['url'],
                },
            },
        },
    ]
)
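
What happens after that first call is not shown in the post. A rough sketch of the dispatch step follows, assuming the reply message carries a `tool_calls` list whose entries hold a function name and an arguments dict. The scraper here is a hypothetical offline stand-in and the sample message is hand-written, so nothing below talks to Ollama or the web.

```python
import asyncio
import json

async def query_web_scraper(url: str) -> dict:
    """Hypothetical stand-in for the real scraper so the sketch runs offline."""
    return {"url": url, "titles": ["Example title"]}

# Map the tool names advertised to the model onto local coroutines
AVAILABLE_TOOLS = {"query_web_scraper": query_web_scraper}

async def run_tool_calls(message: dict) -> list[dict]:
    """Run each requested tool and return 'tool' messages to append to the chat history."""
    tool_messages = []
    for call in message.get("tool_calls") or []:
        name = call["function"]["name"]
        arguments = call["function"]["arguments"]
        handler = AVAILABLE_TOOLS.get(name)
        if handler is None:
            continue  # the model asked for a tool we do not expose
        result = await handler(**arguments)
        tool_messages.append({"role": "tool", "content": json.dumps(result)})
    return tool_messages

# Hand-written message in the assumed shape (not real model output)
sample_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {"function": {"name": "query_web_scraper", "arguments": {"url": "https://example.com"}}}
    ],
}
tool_messages = asyncio.run(run_tool_calls(sample_message))
```

The returned messages would then be appended to `messages` and sent back in a second chat call so the model can answer from the scraped content.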


To learn more:
GitHub w/ Playground: https://github.com/tdolan21/tool-calling-playground/blob/main/notebooks/ollama-playwright-web-scraping.ipynb
Complete Guide: https://medium.com/@tdolan21/building-an-llm-powered-web-scraper-with-ollama-and-playwright-6274d5d938b5