Kuldeep Singh Sidhu

singhsidhukuldeep

https://singhsidhukuldeep.github.io

AI & ML interests

😃 TOP 3 on HuggingFace for posts 🤗 Seeking contributors for a completely open-source 🚀 Data Science platform! singhsidhukuldeep.github.io

Recent Activity

posted an update about 15 hours ago

Exciting breakthrough in multimodal search technology! @nvidia researchers have developed MM-Embed, a groundbreaking universal multimodal retrieval system that's changing how we think about search.

Key innovations:
• First-ever universal multimodal retriever that excels at both text and image searches across diverse tasks
• Leverages advanced multimodal LLMs to understand complex queries combining text and images
• Implements novel modality-aware hard negative mining to overcome modality bias issues
• Achieves state-of-the-art performance on M-BEIR benchmark while maintaining superior text retrieval capabilities

Under the hood:
The system uses a sophisticated bi-encoder architecture with LLaVa-Next (based on Mistral 7B) as its backbone. It employs a unique two-stage training approach: first with random negatives, then with carefully mined hard negatives to improve cross-modal understanding.

The real magic happens in the modality-aware negative mining, where the system learns to distinguish between incorrect modality matches and unsatisfactory information matches, ensuring retrieved results match both content and format requirements.

What sets it apart is its ability to handle diverse search scenarios - from simple text queries to complex combinations of images and text, all while maintaining high accuracy across different domains.
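To make the modality-aware mining idea concrete, here is a minimal sketch of how the two kinds of hard negatives mentioned in the post could be separated. It is not MM-Embed's actual code: the `Candidate` type, the `mine_hard_negatives` function, and the cutoff `k` are hypothetical names chosen for illustration, and the scores are assumed to come from some first-stage retriever.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    doc_id: str
    modality: str  # e.g. "text" or "image"
    score: float   # assumed similarity to the query from a first-stage retriever


def mine_hard_negatives(candidates, positive_id, target_modality, k=2):
    """Split the highest-scoring non-positive candidates into two groups.

    wrong_modality: candidates whose modality differs from the one the
                    query asks for (incorrect modality matches).
    wrong_content:  candidates in the requested modality that are still
                    not the positive (unsatisfactory information matches).
    """
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    wrong_modality, wrong_content = [], []
    for cand in ranked:
        if cand.doc_id == positive_id:
            continue  # never use the positive as a negative
        if cand.modality != target_modality:
            if len(wrong_modality) < k:
                wrong_modality.append(cand.doc_id)
        elif len(wrong_content) < k:
            wrong_content.append(cand.doc_id)
    return wrong_modality, wrong_content


if __name__ == "__main__":
    pool = [
        Candidate("img-1", "image", 0.91),
        Candidate("txt-1", "text", 0.84),
        Candidate("txt-pos", "text", 0.80),
        Candidate("txt-2", "text", 0.55),
    ]
    print(mine_hard_negatives(pool, positive_id="txt-pos", target_modality="text"))
```

Keeping the two groups separate lets a second training stage sample from both, so the retriever is penalized for returning the wrong format as well as the wrong content.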

posted an update 2 days ago

Excited to share @LinkedIn's innovative approach to evaluating semantic search quality! As part of the Search AI team, we've developed a groundbreaking evaluation pipeline that revolutionizes how we measure search relevance.

>> Key Innovation: On-Topic Rate (OTR)
This novel metric measures the semantic match between queries and search results, going beyond simple keyword matching. The system evaluates whether content is truly relevant to the query's intent, not just matching surface-level terms.

>> Technical Implementation Details

Query Set Construction
• Golden Set: Contains curated top queries and complex topical queries
• Open Set: Includes trending queries and random production queries for diversity

Evaluation Pipeline Architecture
1. Query Processing:
- Retrieves top 10 documents per query
- Extracts post text and article information
- Processes both primary content and reshared materials
2. GAI Integration:
- Leverages GPT-3.5 with specialized prompts
- Produces three key outputs:
  - Binary relevance decision
  - Relevance score (0-1 range)
  - Decision reasoning

Quality Assurance
• Validation achieved 94.5% accuracy on a test set of 600 query-post pairs
• Human evaluation showed 81.72% consistency with expert annotators

>> Business Impact
This system now serves as LinkedIn's benchmark for content search experiments, enabling:
• Weekly performance monitoring
• Rapid offline testing of new ML models
• Systematic identification of improvement opportunities

What are your thoughts on semantic search evaluation?
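As a rough illustration of how a metric like OTR could be aggregated from the three judge outputs listed in the post, here is a minimal sketch. The post does not give the exact formula, so this assumes OTR is the share of top-k results judged relevant, pooled over all queries. The `Judgment` type and `on_topic_rate` function are hypothetical names, and the judgments are supplied by hand instead of by an LLM call.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    relevant: bool   # binary relevance decision
    score: float     # relevance score in the 0-1 range
    reasoning: str   # free-text explanation of the decision


def on_topic_rate(judgments_by_query, k=10):
    """Share of top-k results judged on-topic, pooled over all queries.

    Assumption: each list is already ordered by search rank, so slicing
    the first k entries gives the top-k documents for that query.
    """
    total = 0
    on_topic = 0
    for judgments in judgments_by_query.values():
        top = judgments[:k]
        total += len(top)
        on_topic += sum(1 for j in top if j.relevant)
    return on_topic / total if total else 0.0


if __name__ == "__main__":
    judged = {
        "ml engineer jobs": [
            Judgment(True, 0.9, "Post discusses ML engineering roles."),
            Judgment(False, 0.2, "Post is about an unrelated topic."),
        ],
        "semantic search": [
            Judgment(True, 0.8, "Post explains embedding-based retrieval."),
            Judgment(True, 0.7, "Post compares search ranking approaches."),
        ],
    }
    print(on_topic_rate(judged))
```

Pooling over all results weights every retrieved document equally; averaging per-query rates instead would weight every query equally, which matters when some queries return fewer than k results.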

posted an update 5 days ago

Good folks at Google have released a paper on CAT4D, a cutting-edge framework that's pushing the boundaries of multi-view video generation. Probably coming to Google Photos near you!

This innovative approach introduces a novel way to create dynamic 4D content with unprecedented control and quality.

Key Technical Innovations:
- Multi-View Video Diffusion Model (MVVM) architecture that handles both spatial and temporal dimensions simultaneously
- Zero-shot text-to-4D generation pipeline
- Temporal-aware attention mechanisms for consistent motion synthesis
- View-consistent generation across multiple camera angles

Technical Deep Dive:
The framework employs a sophisticated cascade of diffusion models that work in harmony to generate consistent content across both space and time. The architecture leverages view-dependent rendering techniques while maintaining temporal coherence through specialized attention mechanisms.

What sets CAT4D apart:
- Real-time view synthesis capabilities
- Seamless integration of temporal and spatial information
- Advanced motion handling through specialized temporal encoders
- Robust view consistency preservation across generated frames

Thoughts on how this could transform content creation in your industry?

singhsidhukuldeep's activity

New activity in maxiw/hf-posts 16 days ago

Update Request

#2 opened 16 days ago by singhsidhukuldeep

New activity in TechxGenus/Mistral-Large-Instruct-2407-AWQ 4 months ago

The model can be started using vllm, but no dialogue is possible.

#2 opened 4 months ago by SongXiaoMao

Adding chat_template to tokenizer_config.json file

#3 opened 4 months ago by singhsidhukuldeep

Script request

#1 opened 4 months ago by singhsidhukuldeep

New activity in casperhansen/mistral-large-instruct-2407-awq 4 months ago

Requesting script

#1 opened 4 months ago by singhsidhukuldeep

New activity in open-llm-leaderboard/open_llm_leaderboard 4 months ago

Increasing upper limit of `Select the number of parameters (B)` to support larger open-source models like `meta-llama/Meta-Llama-3.1-405B-Instruct`

#858 opened 4 months ago by singhsidhukuldeep