Santiago Viquez
santiviquez
AI & ML interests
ML @ NannyML.
Writing "The Little Book of ML Metrics" at https://github.com/NannyML/The-Little-Book-of-ML-Metrics
santiviquez's activity
posted an update · about 2 months ago
Post · 1488
Professors should ask students to write blog posts based on their final projects instead of having them do paper-like reports.
A single blog post, accessible to the entire internet, can have a greater career impact than dozens of reports that nobody will read.
posted an update · about 2 months ago
Post · 465
Some exciting news...
We are open-sourcing The Little Book of ML Metrics! 🎉
The book that will be on every data scientist's desk is open source.
What does that mean?
It means hundreds of people can review it, contribute to it, and help us improve it before it's finished!
This also means that everyone will have free access to the digital version!
Meanwhile, the high-quality printed edition will remain available for purchase, as it has been for a while.
Revenue from printed copies will help us support further development and maintenance of the book. Not to mention that reviewers and contributors will receive revenue sharing through their affiliate links. 🙌
Check out the book repo (make sure to leave a star 🌟):
https://github.com/NannyML/The-Little-Book-of-ML-Metrics
replied to their post · 3 months ago
Exactly. Now try to do the same, but this time imagine/draw an extra dimension perpendicular to the three spatial dimensions we see.
replied to their post · 4 months ago
Oh thanks! I really appreciate it 🫶
posted an update · 4 months ago
Post · 466
Some personal and professional news ✨
I'm writing a book on ML metrics.
Together with Wojtek Kuberski, we’re creating the missing piece of every ML university program and online course: a book solely dedicated to Machine Learning metrics!
The book will cover the following types of metrics:
• Regression
• Classification
• Clustering
• Ranking
• Vision
• Text
• GenAI
• Bias and Fairness
👉 check out the book: https://www.nannyml.com/metrics
reacted to dvilasuero's post with ❤️🔥 · 6 months ago
Post · 8003
Today is a huge day in Argilla’s history. We couldn’t be more excited to share this with the community: we’re joining Hugging Face!
We’re embracing a larger mission, becoming part of a brilliant and kind team and a shared vision about the future of AI.
Over the past year, we’ve been collaborating with Hugging Face on countless projects: becoming a launch partner of Docker Spaces, empowering the community to clean Alpaca translations into Spanish and other languages, launching argilla/notus-7b-v1 building on Zephyr’s learnings, running the Data is Better Together initiative with hundreds of community contributors, and releasing argilla/OpenHermesPreferences, one of the largest open preference tuning datasets.
After more than 2,000 Slack messages and over 60 people collaborating for over a year, it already felt like we were part of the same team, pushing in the same direction. After a week of the smoothest transition you can imagine, we’re now the same team.
To those of you who’ve been following us, this won’t be a huge surprise, but it will be a big deal in the coming months. This acquisition means we’ll double down on empowering the community to build and collaborate on high quality datasets, we’ll bring full support for multimodal datasets, and we’ll be in a better place to collaborate with the Open Source AI community. For enterprises, this means that the Enterprise Hub will unlock highly requested features like single sign-on and integration with Inference Endpoints.
As a founder, I am proud of the Argilla team. We're now part of something bigger and a larger team but with the same values, culture, and goals. Grateful to have shared this journey with my beloved co-founders Paco and Amélie.
Finally, huge thanks to the Chief Llama Officer @osanseviero for sparking this and being such a great partner during the acquisition process.
Would love to answer any questions you have so feel free to add them below!
posted an update · 6 months ago
Post · 1044
They: you need ground truth to measure performance! 😠
NannyML: hold my beer...
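The joke works because NannyML's flagship idea is confidence-based performance estimation (CBPE): with well-calibrated predicted probabilities, a classifier's expected confusion matrix, and hence its metrics, can be estimated without any ground-truth labels. Here is a minimal, illustrative sketch of that idea — not NannyML's actual API; all names are mine:

```python
# Expected confusion-matrix entries from calibrated P(y=1|x) scores alone:
# if the scores are well calibrated, each prediction's chance of being
# correct is known, so no labels are needed.

def estimate_confusion_matrix(probs, threshold=0.5):
    tp = fp = fn = tn = 0.0
    for p in probs:
        if p >= threshold:   # model predicts the positive class
            tp += p          # correct with probability p
            fp += 1 - p      # wrong with probability 1 - p
        else:                # model predicts the negative class
            tn += 1 - p
            fn += p
    return tp, fp, fn, tn

def estimate_metrics(probs, threshold=0.5):
    tp, fp, fn, tn = estimate_confusion_matrix(probs, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0),
    }

# No labels in sight, yet we get a metric estimate:
print(round(estimate_metrics([0.9, 0.8, 0.2, 0.1])["accuracy"], 4))  # → 0.85
```

The catch, of course, is the calibration assumption: the estimates are only as trustworthy as the probabilities feeding them.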
posted an update · 6 months ago
Post · 949
Just published a new article 😊
https://huggingface.co/blog/santiviquez/data-drift-estimate-model-performance
reacted to lunarflu's post with 🔥 · 6 months ago
Post · 2311
By popular demand, HF activity tracker v1.0 is here! 📊 let's build it together!🤗
Lots of things to improve, feel free to open PRs in the community tab!
good PR ideas:
- track more types of actions that include date+time
- bigger plot
- track discord activity too 🤯
- link github? ⚡
https://huggingface.co/spaces/huggingface-projects/LevelBot
posted an update · 6 months ago
Post · 1567
I ran 580 experiments (yes, 580 🤯) to check if we can quantify data drift's impact on model performance using only drift metrics.
For these experiments, I built a technique that relies on drift signals to estimate model performance. I compared its results against the current SoTA performance estimation methods and checked which technique performs best.
The plot below summarizes the general results. It measures the quality of performance estimation versus the absolute performance change. (The lower, the better).
Full experiment: https://www.nannyml.com/blog/data-drift-estimate-model-performance
In it, I describe the setup, datasets, models, benchmarking methods, and the code used in the project.
posted an update · 7 months ago
Post · 1570
Looking for someone with +10 years of experience training Deep Kolmogorov-Arnold Networks.
Any suggestions?
posted an update · 8 months ago
Post · 2049
More open research updates 🧵
Performance estimation is currently the best way to quantify the impact of data drift on model performance. 💡
I've been benchmarking performance estimation methods (CBPE and M-CBPE) against data drift signals.
I'm using drift results as features for many regression algorithms, and then I'm taking those to estimate the model's performance. Finally, I'm measuring the Mean Absolute Error (MAE) between the regression models' predictions and actual performance.
So far, for all my experiments, performance estimation methods do better than drift signals. 👨🔬
Bear in mind that these are early results; I'm running the flow on more datasets as we speak.
Hopefully, by next week, I will have more results to share 👀
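For concreteness, the pipeline described above can be sketched on synthetic data. This is my own illustration, not the benchmark code: plain numpy least squares stands in for the "many regression algorithms", and the drift signals and performance values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic monitoring chunks: two drift signals per chunk (say, a univariate
# distance and a multivariate reconstruction error) plus the realized
# performance of the monitored model on that chunk.
n_chunks = 200
drift_signals = rng.uniform(0, 1, size=(n_chunks, 2))
performance = (0.9
               - 0.2 * drift_signals[:, 0]
               - 0.1 * drift_signals[:, 1]
               + rng.normal(0, 0.01, n_chunks))

# Hold out the later chunks, as you would in a time-ordered benchmark.
X_train, X_test = drift_signals[:150], drift_signals[150:]
y_train, y_test = performance[:150], performance[150:]

# Ordinary least squares with an intercept column: fit drift signals
# against realized performance on the training chunks.
A_train = np.c_[np.ones(len(X_train)), X_train]
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# Score the held-out chunks with Mean Absolute Error.
A_test = np.c_[np.ones(len(X_test)), X_test]
mae = float(np.mean(np.abs(A_test @ coef - y_test)))
print(f"MAE of drift-signal-based performance prediction: {mae:.4f}")
```

The same MAE computed for a performance estimation method on the same chunks gives the head-to-head comparison the post describes.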
posted an update · 8 months ago
Post · 1350
How would you benchmark performance estimation algorithms vs data drift signals?
I'm working on a benchmarking analysis, and I'm currently doing the following:
- Get univariate and multivariate drift signals and measure their correlation with realized performance.
- Use drift signals as features of a regression model to predict the model's performance.
- Use drift signals as features of a classification model to predict a performance drop.
- Compare all the above experiments with results from Performance Estimation algorithms.
Any other ideas?
Nicee, I'll take a look 👀
reacted to gsarti's post with ❤️ · 9 months ago
Post · 2212
Our 🐑 PECoRe 🐑 method to detect & attribute context usage in LM generations finally has an official Gradio demo! 🚀
gsarti/pecore
Highlights:
🔍 Context attribution for several decoder-only and encoder-decoder models using convenient presets
🔍 Uses only LM internals to faithfully reflect context usage, no additional detector involved
🔍 Highly parametrizable, export Python & Shell code snippets to run on your machine using 🐛 Inseq CLI (https://github.com/inseq-team/inseq)
Want to use PECoRe for your LMs? Feedback and comments are welcome! 🤗
posted an update · 9 months ago
People in Paris 🇫🇷 🥐
Next week we'll be hosting our first Post-Deployment Data Science Meetup in Paris!
My boss will be talking about Quantifying the Impact of Data Drift on Model Performance. 👀
The event is completely free, and there's only space for 50 people, so if you are interested, RSVP as soon as possible 🤗
🗓️ Thursday, March 14
🕠 5:30 PM - 8:30 PM GMT+1
🔗 RSVP: https://lu.ma/postdeploymentparis
posted an update · 9 months ago
Where I work, we are obsessed with what happens to a model's performance after it has been deployed. We call this post-deployment data science.
Let me tell you about a post-deployment data science algorithm that we recently developed to measure the impact of Concept Drift on a model's performance.
How can we detect Concept Drift? 🤔
All ML models are designed to do one thing: learn a probability distribution of the form P(y|X). In other words, they try to model an outcome 'y' given the input variables 'X'. 🧠
This probability distribution, P(y|X), is also called Concept. Therefore, if the Concept changes, the model may become invalid.
❓But how do we know if there is a new Concept in our data?
❓Or, more importantly, how do we measure whether the new Concept is affecting the model's performance?
💡 We came up with a clever solution where the main ingredients are a reference dataset, one where the model's performance is known, and a dataset with the latest data we would like to monitor.
👣 Step-by-Step solution:
1️⃣ We start by training an internal model on a chunk of the latest data. ➡️ This allows us to learn the possible new Concept present in the data.
2️⃣ Next, we use the internal model to make predictions on the reference dataset.
3️⃣ We then estimate the monitored model's performance on the reference dataset, treating the internal model's predictions there as ground truth.
4️⃣ If the estimated performance of the internal model and the actual monitored model are very different, we then say that there has been a Concept Drift.
To quantify how this Concept impacts performance, we subtract the actual model's performance on reference from the estimated performance and report a delta of the performance metric. ➡️ This is what the plot below shows. The change of the F1-score due to Concept drift! 🚨
This process is repeated for every new chunk of data that we get. 🔁
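The four steps above can be sketched end to end on toy data. Everything here is illustrative — a trivial threshold "model", synthetic data, and my own function names — not NannyML's implementation; it only shows the shape of the algorithm:

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_threshold_model(X, y):
    """Tiny stand-in for a real learner: pick a threshold on x0 and a
    direction that best reproduce the labels on the training data."""
    best_acc, best_rule = 0.0, None
    for t in np.quantile(X[:, 0], np.linspace(0.05, 0.95, 19)):
        for positive_if_above in (True, False):
            pred = ((X[:, 0] >= t) == positive_if_above).astype(int)
            acc = float(np.mean(pred == y))
            if acc > best_acc:
                best_acc, best_rule = acc, (t, positive_if_above)
    t, positive_if_above = best_rule
    return lambda X_: ((X_[:, 0] >= t) == positive_if_above).astype(int)

def f1_score(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Reference dataset: the original Concept is y = 1 when x0 > 0, the
# monitored model was trained under it, so its performance here is known.
X_ref = rng.normal(size=(2000, 1))
y_ref = (X_ref[:, 0] > 0).astype(int)
monitored_model = fit_threshold_model(X_ref, y_ref)
reference_f1 = f1_score(y_ref, monitored_model(X_ref))

# Latest monitoring chunk (with delayed labels): the Concept flipped.
X_chunk = rng.normal(size=(2000, 1))
y_chunk = (X_chunk[:, 0] < 0).astype(int)

# 1) Train an internal model on the chunk to capture the new Concept.
internal_model = fit_threshold_model(X_chunk, y_chunk)
# 2)+3) Score the monitored model on the reference data, treating the
# internal model's predictions there as ground truth under the new Concept.
estimated_f1 = f1_score(internal_model(X_ref), monitored_model(X_ref))
# 4) Report the delta; a large drop signals performance-impacting drift.
delta = estimated_f1 - reference_f1
print(f"reference F1 {reference_f1:.2f} -> estimated F1 {estimated_f1:.2f} "
      f"(delta {delta:.2f})")
```

With the flipped Concept, the estimated F1 collapses relative to the reference F1, so the delta flags the drift; under no drift, the internal model would agree with the monitored one and the delta would stay near zero.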