[NEW] WebSearch 2.0 - feedback welcome!

#254
by victor HF staff - opened
Hugging Chat org
β€’
edited Mar 15

image.png

March 15th update: 🌐Internet Access for Assistants

Hi HuggingChat community!

We've just released a big update to the WebSearch feature, it now uses Retrieval-augmented generation (RAG) to extract relevant information from multiple web pages! From our tests it's much more powerful than before πŸš€.

We would love to get your feedback on it! Also, if you want to check the details or even contribute, take a look at the PR on Github.

See you soon!

victor changed discussion title from [NEW] - Updated WebSearch - feedback welcome! to [DRAFT][NEW] - Updated WebSearch - feedback welcome!
victor changed discussion title from [DRAFT][NEW] - Updated WebSearch - feedback welcome! to [NEW] - Updated WebSearch - feedback welcome!
victor pinned discussion
victor changed discussion title from [NEW] - Updated WebSearch - feedback welcome! to [NEW] WebSearch 2.0 - feedback welcome!

I think I've asked this elsewhere but I'm not sure what the answer was. Do you use a paid API to query Google search? I'm asking because I can imagine that if it's through something hacky like selenium, Google won't like it (and they'll miss ad revenue). So, in short what does the technical pipeline look like for this from user query to generated output?

Hugging Chat org

I think I've asked this elsewhere but I'm not sure what the answer was. Do you use a paid API to query Google search? I'm asking because I can imagine that if it's through something hacky like selenium, Google won't like it (and they'll miss ad revenue). So, in short what does the technical pipeline look like for this from user query to generated output?

You can check out the feature here!

@BramVanroy , they use the SerpAPI which, as far as I understand, is paid and legal. see the source code here: https://github.com/huggingface/chat-ui/blob/main/src/lib/server/websearch/searchWeb.ts

Question regarding sources/citations: Do I understand correctly that you currently display all URLs as sources, which the retriever retrieved and gave the LLM as context (regardless of whether the LLM actually used/refers to the source)? In Bing chat, they somehow managed to attribute sources to specific parts/sentences of the generated output and not only the generated output as a whole. I've always been wondering how they made that work (direct citations / attributing sources to specific sub-parts of a generated output). does anyone know?

Hugging Chat org
β€’
edited Sep 14, 2023

Question regarding sources/citations: Do I understand correctly that you currently display all URLs as sources, which the retriever retrieved and gave the LLM as context

Yes

In Bing chat, they somehow managed to attribute sources to specific parts/sentences.

Interesting. I guess one simple way would be: for every generated sentence calculate its similarity against the sources and decide the highest scoring source as the source of that sentence. Not sure if that's what they are doing

Any chances for a search API that isn't Google's? IE: DuckDuckGo or Bing :)
Awesome work though! Really cool :D

In Bing chat, they somehow managed to attribute sources to specific parts/sentences.

Interesting. I guess one simple way would be: for every generated sentence calculate its similarity against the sources and decide the highest scoring source as the source of that sentence. Not sure if that's what they are doing

@mishig , yeah true. My first intuition was that I would be hesitant to trust embeddings from a bi-encoder to be reliable enough for this. But if you took a cross encoder, that should actually work quite well. especially if you set a high enough threshold to avoid false positives (then it could even work with a bi-encoder sentence transformer). could be a nice feature to have direct citations :)

But how do they send a websearch request as closedai proomts chatgpt to use a websearch request. So is there a specific isntruction the ai uses to search the web, or even call tools for image generation etc. eg (defualt and 3rd party)?

Screenshot 2024-11-09 at 16-11-44 πŸ“Š Poverty research.png

I am getting this error during web search

It's back up sorry about it @acharyaaditya26

Noo, it's not still same error

Hugging Chat org

Hi we're aware of this, trying to get a fix for it, we'll update here!

Hi we're aware of this, trying to get a fix for it, we'll update here!

Thank you @nsarrazin

Hugging Chat org

It's fixed!

It's fixed!

yesssss thank you @victor and @nsarrazin

Sign up or log in to comment