Kuldeep Singh Sidhu
singhsidhukuldeep
AI & ML interests
😃 TOP 3 on HuggingFace for posts 🤗 Seeking contributors for a completely open-source 🚀 Data Science platform! singhsidhukuldeep.github.io
Recent Activity
posted an update about 15 hours ago
Exciting breakthrough in multimodal search technology! @nvidia researchers have developed MM-Embed, a groundbreaking universal multimodal retrieval system that's changing how we think about search.
Key innovations:
• First-ever universal multimodal retriever that excels at both text and image searches across diverse tasks
• Leverages advanced multimodal LLMs to understand complex queries combining text and images
• Implements novel modality-aware hard negative mining to overcome modality bias issues
• Achieves state-of-the-art performance on M-BEIR benchmark while maintaining superior text retrieval capabilities
Under the hood:
The system uses a bi-encoder architecture with LLaVA-NeXT (based on Mistral 7B) as its backbone. It employs a two-stage training approach: first with random negatives, then with carefully mined hard negatives to improve cross-modal understanding.
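A minimal sketch of the bi-encoder training objective, assuming a standard InfoNCE contrastive loss. The post does not give MM-Embed's exact loss or temperature, so `info_nce_loss`, `l2_normalize` and `temperature=0.05` are hypothetical. Queries and candidates are embedded independently. Under this sketch's assumption, both stages use the same loss and differ only in where `neg_emb` comes from: random negatives in stage 1, mined hard negatives in stage 2.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Row-wise L2 normalization so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def info_nce_loss(query_emb, pos_emb, neg_emb, temperature=0.05):
    """Contrastive loss for a bi-encoder (hypothetical sketch).

    query_emb: (B, D) query embeddings
    pos_emb:   (B, D) one relevant candidate per query
    neg_emb:   (B, K, D) K negatives per query (random in stage 1, hard-mined in stage 2)
    """
    q, p, n = l2_normalize(query_emb), l2_normalize(pos_emb), l2_normalize(neg_emb)
    pos_scores = np.sum(q * p, axis=-1, keepdims=True)           # (B, 1)
    neg_scores = np.einsum("bd,bkd->bk", q, n)                    # (B, K)
    logits = np.concatenate([pos_scores, neg_scores], axis=1) / temperature
    # Numerically stable log-softmax; the positive sits in column 0.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[:, 0].mean())

rng = np.random.default_rng(0)
q, p = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
random_negs = rng.normal(size=(4, 8, 16))  # stage-1 style random negatives
print(info_nce_loss(q, p, random_negs))
```

Because the query and candidate encoders run independently, candidate embeddings can be precomputed and indexed ahead of time. That property is what makes a bi-encoder practical for large-scale retrieval.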
The real magic happens in the modality-aware negative mining, where the system learns to distinguish between incorrect modality matches and unsatisfactory information matches, ensuring retrieved results match both content and format requirements.
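The two kinds of negatives can be illustrated with a small sketch. It assumes each candidate carries a modality tag and a score from the first-stage retriever. `Candidate`, `mine_modality_aware_negatives`, `top_k` and `n_per_type` are hypothetical names. MM-Embed's actual mining procedure (for example, which rank range it samples from) may differ.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    doc_id: str
    modality: str   # e.g. "text", "image", "image,text"
    score: float    # similarity from the first-stage retriever

def mine_modality_aware_negatives(candidates, positive_ids, target_modality,
                                  top_k=50, n_per_type=2):
    """Split top-ranked non-positive candidates into two hard-negative pools:
    - modality negatives: wrong modality (e.g. text returned for an image-seeking query)
    - content negatives: right modality, but not a relevant (positive) document
    """
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)[:top_k]
    modality_negs, content_negs = [], []
    for c in ranked:
        if c.doc_id in positive_ids:
            continue  # never use a true positive as a negative
        if c.modality != target_modality:
            modality_negs.append(c)
        else:
            content_negs.append(c)
    return modality_negs[:n_per_type], content_negs[:n_per_type]

cands = [Candidate("a", "image", 0.9), Candidate("b", "text", 0.85),
         Candidate("c", "image", 0.8), Candidate("d", "image", 0.7),
         Candidate("e", "text", 0.6)]
mod, con = mine_modality_aware_negatives(cands, {"a"}, target_modality="image")
print([c.doc_id for c in mod], [c.doc_id for c in con])  # ['b', 'e'] ['c', 'd']
```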
What sets it apart is its ability to handle diverse search scenarios, from simple text queries to complex combinations of images and text, while maintaining high accuracy across different domains.
posted an update 2 days ago
Excited to share @LinkedIn's innovative approach to evaluating semantic search quality! As part of the Search AI team, we've developed a groundbreaking evaluation pipeline that revolutionizes how we measure search relevance.
>> Key Innovation: On-Topic Rate (OTR)
This novel metric measures the semantic match between queries and search results, going beyond simple keyword matching. The system evaluates whether content is truly relevant to the query's intent, not just matching surface-level terms.
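The post does not give the exact OTR formula. One natural reading is the share of the top-k results judged on-topic, averaged over queries, and under that assumption OTR reduces to a few lines. `on_topic_rate` and its input format are hypothetical.

```python
from statistics import mean

def on_topic_rate(judgments_per_query: dict[str, list[bool]], k: int = 10) -> float:
    """Average over queries of the fraction of top-k results judged on-topic.

    judgments_per_query maps each query to per-result on-topic decisions,
    in ranked order. Queries with no results are skipped.
    """
    per_query = [sum(j[:k]) / len(j[:k]) for j in judgments_per_query.values() if j]
    return mean(per_query)

example = {
    "remote data science jobs": [True] * 8 + [False] * 2,   # 8 of 10 on-topic
    "llm evaluation":           [True] * 5 + [False] * 5,   # 5 of 10 on-topic
}
print(on_topic_rate(example))  # about 0.65
```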
>> Technical Implementation Details
Query Set Construction
• Golden Set: Contains curated top queries and complex topical queries
• Open Set: Includes trending queries and random production queries for diversity
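The sketch below shows one way the two query sets might be combined for a single evaluation run. It assumes queries are plain strings and that duplicates are matched case-insensitively. `build_query_set` and its parameters are hypothetical, not LinkedIn's code.

```python
import random

def build_query_set(golden, trending, production_log, n_random=100, seed=0):
    """Return (query, source) pairs: the fixed golden set, then an open set of
    trending queries plus a random sample of production queries."""
    rng = random.Random(seed)  # seeded so a given run can be reproduced
    sampled = rng.sample(production_log, min(n_random, len(production_log)))
    seen, out = set(), []
    for source, queries in (("golden", golden),
                            ("open:trending", trending),
                            ("open:random", sampled)):
        for q in queries:
            key = q.strip().lower()
            if key not in seen:  # dedupe across sets, keeping the earlier source
                seen.add(key)
                out.append((q, source))
    return out

qs = build_query_set(["AI jobs", "LLM agents"],
                     ["llm agents", "quantum computing"],
                     [f"q{i}" for i in range(50)], n_random=5)
print(len(qs))  # 8: 2 golden + 1 new trending + 5 random
```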
Evaluation Pipeline Architecture
1. Query Processing:
- Retrieves top 10 documents per query
- Extracts post text and article information
- Processes both primary content and reshared materials
2. GAI (generative AI) Integration:
   - Leverages GPT-3.5 with specialized prompts
   - Produces three key outputs:
      - Binary relevance decision
      - Relevance score (0-1 range)
      - Decision reasoning
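The judging step can be sketched as prompt construction plus strict parsing of the model's three outputs. The prompt wording and JSON keys are assumptions, because the post does not show LinkedIn's prompts. `call_llm` is a hypothetical stand-in for whatever client sends the prompt to GPT-3.5, and no real API is called here.

```python
import json
from dataclasses import dataclass

@dataclass
class Judgment:
    is_relevant: bool   # binary relevance decision
    score: float        # relevance score in [0, 1]
    reasoning: str      # the model's explanation

# Hypothetical prompt; the real prompts are not published in the post.
PROMPT_TEMPLATE = """You are judging search relevance.
Query: {query}
Post text: {post_text}
Reshared content: {reshared_text}
Answer in JSON with keys "relevant" (true or false), "score" (0 to 1) and "reasoning"."""

def build_prompt(query, post_text, reshared_text=""):
    return PROMPT_TEMPLATE.format(query=query, post_text=post_text,
                                  reshared_text=reshared_text or "(none)")

def parse_judgment(raw: str) -> Judgment:
    data = json.loads(raw)
    if not isinstance(data["relevant"], bool):  # reject strings like "false"
        raise ValueError("'relevant' must be a JSON boolean")
    score = min(max(float(data["score"]), 0.0), 1.0)  # clamp into the 0-1 range
    return Judgment(data["relevant"], score, str(data.get("reasoning", "")))

def judge_top_k(query, docs, call_llm, k=10):
    """docs: ranked (post_text, reshared_text) pairs; call_llm: any str -> str function."""
    return [parse_judgment(call_llm(build_prompt(query, p, r))) for p, r in docs[:k]]
```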
Quality Assurance
• Validation achieved 94.5% accuracy on a test set of 600 query-post pairs
• Human evaluation showed 81.72% consistency with expert annotators
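Assuming both quality numbers are simple agreement rates, each is the share of query-post pairs where the LLM label matches the reference label. At 94.5% on 600 pairs, that is 567 matching judgments. A minimal sketch follows; the function name and the synthetic labels are hypothetical.

```python
def agreement_rate(predicted, reference):
    """Fraction of items where the predicted label equals the reference label."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)

# Synthetic labels: 567 matches out of 600 pairs reproduces the 94.5% figure.
labels_llm = [True] * 567 + [False] * 33
labels_gold = [True] * 600
print(agreement_rate(labels_llm, labels_gold))  # 0.945
```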
>> Business Impact
This system now serves as LinkedIn's benchmark for content search experiments, enabling:
• Weekly performance monitoring
• Rapid offline testing of new ML models
• Systematic identification of improvement opportunities
What are your thoughts on semantic search evaluation?
posted an update 5 days ago
Good folks at Google have released a paper on CAT4D, a cutting-edge framework that's pushing the boundaries of multi-view video generation. Probably coming to Google Photos near you!
This innovative approach introduces a novel way to create dynamic 4D content with unprecedented control and quality.
Key Technical Innovations:
- Multi-View Video Diffusion Model (MVVM) architecture that handles both spatial and temporal dimensions simultaneously
- Zero-shot text-to-4D generation pipeline
- Temporal-aware attention mechanisms for consistent motion synthesis
- View-consistent generation across multiple camera angles
Technical Deep Dive:
The framework employs a sophisticated cascade of diffusion models that work in harmony to generate consistent content across both space and time. The architecture leverages view-dependent rendering techniques while maintaining temporal coherence through specialized attention mechanisms.
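A common way to keep generation consistent across both views and time is factorized attention. Tokens first attend across camera views within a frame, then across frames within a view. The NumPy sketch below illustrates that generic pattern. It is an assumption for illustration, not CAT4D's published architecture, and it omits learned projections, multiple heads and positional encodings.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Single-head scaled dot-product self-attention over axis -2.
    x: (..., N, D), used directly as queries, keys and values."""
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)   # (..., N, N)
    return softmax(scores, axis=-1) @ x                 # (..., N, D)

def factorized_view_time_attention(latents):
    """latents: (V views, T frames, P tokens, D channels).
    1) cross-view: within each frame, tokens from all views attend to each other
    2) temporal: within each view, each token position attends across frames"""
    V, T, P, D = latents.shape
    x = latents.transpose(1, 0, 2, 3).reshape(T, V * P, D)   # group by frame
    x = x + self_attention(x)                                # residual update
    x = x.reshape(T, V, P, D).transpose(1, 0, 2, 3)          # back to (V, T, P, D)
    y = x.transpose(0, 2, 1, 3).reshape(V * P, T, D)         # group by view and token
    y = y + self_attention(y)
    return y.reshape(V, P, T, D).transpose(0, 2, 1, 3)       # (V, T, P, D)

rng = np.random.default_rng(0)
out = factorized_view_time_attention(rng.normal(size=(3, 4, 5, 8)))
print(out.shape)  # (3, 4, 5, 8)
```

Factorizing this way computes roughly T·(VP)² + VP·T² attention scores instead of (VTP)² for full joint attention over every view, frame and token.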
What sets CAT4D apart:
- Real-time view synthesis capabilities
- Seamless integration of temporal and spatial information
- Advanced motion handling through specialized temporal encoders
- Robust view consistency preservation across generated frames
Thoughts on how this could transform content creation in your industry?
singhsidhukuldeep's activity
Update Request
#2 opened 16 days ago by singhsidhukuldeep · 2 comments
The model can be started using vllm, but no dialogue is possible.
#2 opened 4 months ago by SongXiaoMao · 3 comments
Adding chat_template to tokenizer_config.json file
#3 opened 4 months ago by singhsidhukuldeep · 1 comment
Script request
#1 opened 4 months ago by singhsidhukuldeep · 3 comments
Requesting script
#1 opened 4 months ago by singhsidhukuldeep