Yi Cui PRO

onekq

https://onekq.ai

AI & ML interests

Benchmark, Code Generation Model

Recent Activity

updated a Space 20 days ago

onekq-ai/WebApp1K-models-leaderboard

posted an update about 1 month ago

October version of Claude 3.5 lifts SOTA (set by its June version) by 7 points. https://huggingface.co/spaces/onekq-ai/WebApp1K-models-leaderboard Closed-source models are widening the gap again. Note: Our frontier leaderboard now uses double test scenarios because the single-scenario test suite has been saturated.

New activity about 1 month ago

onekq-ai/WebApp1K-models-leaderboard:All the clickable links are not accessible...

Articles

Does Daily Software Engineering Work Need Reasoning Models?

Sep 24

• 5

All LLMs Write Great Code, But Some Make (A Lot) Fewer Mistakes

Sep 12

• 4

onekq's activity

posted an update about 1 month ago

Post

559

October version of Claude 3.5 lifts SOTA (set by its June version) by 7 points.
onekq-ai/WebApp1K-models-leaderboard

Closed-source models are widening the gap again.

Note: Our frontier leaderboard now uses double test scenarios because the single-scenario test suite has been saturated.

posted an update about 2 months ago

Post

1841

I'm now working on finetuning coding models. If you are GPU-hungry like me, you will find quantized models very helpful. But quantization for finetuning and quantization for inference are different and incompatible, so I made two collections here.

Inference (GGUF, via Ollama, CPU is enough)
onekq-ai/ollama-ready-coding-models-67118c3cfa1af2cf04a926d6

Finetuning (bitsandbytes, QLoRA, GPU is needed)
onekq-ai/qlora-ready-coding-models-67118771ce001b8f4cf946b2

For quantization, the inference models are far more popular on HF than finetuning models. I use https://huggingface.co/QuantFactory to generate inference models (GGUF), and there are a few other choices.

But there hasn't been such a service for finetuning models. DIY isn't too hard though. I made a few myself and you can find the script in the model cards. If the original model is small enough, you can even do it on a free T4 (available via Google Colab).
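
As a rough back-of-envelope check on the "small enough for a free T4" remark, the sketch below estimates weight memory only at a given precision. The 15 GiB usable figure, the 2 GiB headroom, and the helper names are illustrative assumptions, not values taken from the model cards; activations, quantization constants, and loading overhead are ignored.

```python
# Back-of-envelope estimate of model weight size at different precisions.
# ASSUMPTIONS (not from the original post): ~15 GiB usable on a 16 GB T4,
# a 2 GiB safety margin, and weights-only accounting.

T4_USABLE_GIB = 15.0  # assumed usable memory on a 16 GB T4


def weight_gib(params_billion: float, bits_per_param: float) -> float:
    """Approximate size of the weights in GiB at the given bit width."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 2**30


def fits_on_t4(params_billion: float, bits_per_param: float,
               headroom_gib: float = 2.0) -> bool:
    """True if the weights plus an assumed safety margin fit on a T4."""
    return weight_gib(params_billion, bits_per_param) + headroom_gib <= T4_USABLE_GIB


if __name__ == "__main__":
    for size in (1.5, 7, 14, 34):
        fp16 = weight_gib(size, 16)
        nf4 = weight_gib(size, 4)
        print(f"{size:>5}B  fp16={fp16:6.2f} GiB  4-bit={nf4:6.2f} GiB  "
              f"4-bit fits on T4: {fits_on_t4(size, 4)}")
```

Under these assumptions, 4-bit weights take a quarter of the fp16 footprint, which is why a small model that would not fit in half precision can still be handled on a single T4.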

If you know a (small) coding model worthy of quantization, please let me know and I'd love to add it to the collections.

reacted to fdaudens's post with 🔥 2 months ago

Post

1790

Exciting news in AI: Molmo, a groundbreaking family of open-source multimodal models, has just been announced! 🚀

Key points:
- Closes the gap with proprietary systems on benchmarks & human evals
- Trained on high-quality data (< 1M image-text pairs vs billions)
- Introduces pointing capability for rich interactions
- Fully open weights, data, and training code

The 72B model outperforms several proprietary systems, while the 1B model nearly matches GPT-4V. Small is indeed the new big in AI!

There's an interactive demo available using Molmo-7B-D. Definitely worth checking out to see its capabilities firsthand.

All model weights, data, and code will be released soon. This is a significant step towards truly open, cutting-edge multimodal AI.
The future of AI research and applications is looking brighter than ever! 🤖🖼️

👉 Demo: https://molmo.allenai.org/
👉 Models: allenai/molmo-66f379e6fe3b8ef090a8ca19

#AI #MachineLearning #OpenSource #ComputerVision

reacted to victor's post with 👍🤗 2 months ago

Post

5487

🙋 Calling all Hugging Face users! We want to hear from YOU!

What feature or improvement would make the biggest impact on Hugging Face?

Whether it's the Hub, better documentation, new integrations, or something completely different – we're all ears!

Your feedback shapes the future of Hugging Face. Drop your ideas in the comments below! 👇

155 replies

reacted to their post with 🧠 2 months ago

Post

2555

Here is my latest study on OpenAI🍓o1🍓.
A Case Study of Web App Coding with OpenAI Reasoning Models (2409.13773)

I wrote an easy-to-read blog post to explain the findings.
https://huggingface.co/blog/onekq/daily-software-engineering-work-reasoning-models

INSTRUCTION FOLLOWING is the key.

100% instruction following + Reasoning = new SOTA

But if the model misses or misunderstands one instruction, it can perform far worse than non-reasoning models.

posted an update 3 months ago

Post

424

Announcing 🎉 WebApp1K-Duo 🎉
onekq-ai/WebApp1K-Duo-React

This is to keep up the challenge after OpenAI o1 models saturated the WebApp1K benchmark. The new benchmark brings SOTA to 67%. Let the hill climbing commence!
onekq-ai/WebApp1K-models-leaderboard

PS: I will publish more findings soon.

reacted to KingNish's post with 👍 3 months ago

Post

3567

Mistral Nemo is better than many models at first-grade-level reasoning.

replied to zhabotorabi's post 3 months ago

The Mistral API? The model name is probably different. I used mistral-large-2 but had to use the name mistral-large-latest. The team will help you via chat.

posted an update 3 months ago

Post

549

🐋 DeepSeek 🐋2.5 is hands-down the best open-source model, leaving its peers way behind. It even beats GPT-4o mini.

onekq-ai/WebApp1K-models-leaderboard

The inference of the official API is painfully slow though. I heard the team is short on GPUs (well, who isn't).

reacted to aaditya's post with 👍 3 months ago

Post

2555

Last Week in Medical AI: Top Research Papers/Models
🏅(September 7 - September 14, 2024)

🏅 Medical AI Paper of the week
Chai-1 Foundation model molecular structure prediction

Medical LLMs & Benchmarks
- BrainWave: A Brain Signal Foundation Model
- DS-ViT: Vision Transformer for Alzheimer’s Diagnosis
- EyeCLIP: Visual–language model for ophthalmic
- Segment Anything Model for Tumor Segmentation
- MEDIC: Evaluating LLMs in Clinical Applications

Medical LLM Applications
- KARGEN: Radiology Report Generation LLMs
- DrugAgent: Explainable Drug Repurposing Agents
- Improving RAG in Medicine with Follow-up Questions

Frameworks and Methodologies
- Infrastructure for Automatic Cell Segmentation
- Data Alignment for Dermatology AI
- Diagnostic Reasoning in Natural Language
- Two-Stage Instruction Fine-tuning Approach for Med

AI in Healthcare Ethics
- Concerns and Choices of Using LLMs for Healthcare
- Understanding Fairness in Recommender Systems
- Towards Fairer Health Recommendations

Check the full thread: https://x.com/OpenlifesciAI/status/1832476252260712788

Thank you for your continued support and love for this series! Stay up-to-date with weekly updates on Medical LLMs, datasets, and top research papers by following @aaditya 🤗

replied to their post 3 months ago

pass@1 for 🍓o1-mini🍓: 0.94!!

💸💸💸💸

#gpt #o1 #inference #RL #selfplay #WebApp1K

posted an update 3 months ago

Post

1120

If your plan keeps changing, it's a sign that you are living in the moment.

I just got the pass@1 result of GPT 🍓o1-preview🍓 : 0.95!!!

This means my benchmark is cast into oblivion, so I need to up the ante. I am all ears for suggestions. onekq-ai/WebApp1K-models-leaderboard
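
For readers unfamiliar with the metric, here is a minimal sketch of how pass@k is commonly estimated in code-generation evaluation (the unbiased estimator popularized by the HumanEval paper). The post does not say how WebApp1K computes its score, so treat this as an assumption; the function names and the sample counts in the usage section are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them pass."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k contains a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems; results holds (n, c) per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical data: three problems, 10 samples each.
    results = [(10, 10), (10, 9), (10, 0)]
    print(benchmark_pass_at_k(results, k=1))
```

For k = 1 the estimator reduces to the fraction of passing samples per problem, averaged over all problems, so a pass@1 of 0.95 means roughly 95% of generated solutions pass their tests.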

1 reply