Daniele Licari

dlicari

dlicari's activity

Reacted to thomwolf's post with ❤️ 8 months ago
A Little guide to building Large Language Models in 2024

This is a recording of a 75-minute lecture I gave two weeks ago on how to train an LLM from scratch in 2024. I tried to keep it short and comprehensive, focusing on concepts that are crucial for training good LLMs but often hidden in tech reports.

In the lecture, I introduce the students to all the important concepts/tools/techniques for training a good-performance LLM:
* finding, preparing and evaluating web scale data
* understanding model parallelism and efficient training
* fine-tuning/aligning models
* fast inference

There are of course many things and details missing that I should have added; don't hesitate to tell me your most frustrating omission and I'll add it in a future part. In particular, I think I'll add more focus on how to filter topics well and extensively, and maybe more practical anecdotes and details.

Now that I've recorded it, I've been thinking this could be part 1 of a two-part series, with a second, fully hands-on video on how to run all these steps with some libraries and recipes we've released recently at HF around LLM training (which could easily be adapted to other frameworks anyway):
* datatrove for all things web-scale data preparation: https://github.com/huggingface/datatrove
* nanotron for lightweight 4D parallelism LLM training: https://github.com/huggingface/nanotron
* lighteval for in-training fast parallel LLM evaluations: https://github.com/huggingface/lighteval

Here is the link to watch the lecture on YouTube: https://www.youtube.com/watch?v=2-SPH9hIKT8
And here is the link to the Google slides: https://docs.google.com/presentation/d/1IkzESdOwdmwvPxIELYJi8--K3EZ98_cL6c5ZcLKSyVg/edit#slide=id.p

Enjoy and happy to hear feedback on it and what to add, correct, extend in a second part.
Reacted to santiviquez's post with ❤️ 9 months ago
Where I work, we are obsessed with what happens to a model's performance after it has been deployed. We call this post-deployment data science.

Let me tell you about a post-deployment data science algorithm that we recently developed to measure the impact of Concept Drift on a model's performance.

How can we detect Concept Drift? πŸ€”

All ML models are designed to do one thing: learn a probability distribution in the form of P(y|X). In other words, they try to learn how to model an outcome 'y' given the input variables 'X'. 🧠

This probability distribution, P(y|X), is also called the Concept. Therefore, if the Concept changes, the model may become invalid.
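To make P(y|X) concrete: a classifier's predicted probabilities are exactly an estimate of this distribution. Here is a minimal sketch using scikit-learn on synthetic data (the data, model, and test points are illustrative placeholders, not from the post):

```python
# A tiny illustration of P(y|X): a classifier's predict_proba estimates
# the probability of each outcome y given inputs X.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # the "Concept" mapping X to y

clf = LogisticRegression().fit(X, y)

# Estimated P(y=1 | X) for two new points: one deep in the positive
# region, one deep in the negative region.
probs = clf.predict_proba([[2.0, 2.0], [-2.0, -2.0]])[:, 1]
print(probs)  # high for the first point, low for the second
```

If the true relationship between X and y later changes (e.g., the sign of the rule flips), these learned probabilities no longer describe reality, and the model degrades even if the input distribution of X stays the same.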

❓But how do we know if there is a new Concept in our data?
❓Or, more importantly, how do we measure whether the new Concept is affecting the model's performance?

πŸ’‘ We came up with a clever solution where the main ingredients are a reference dataset, one where the model's performance is known, and a dataset with the latest data we would like to monitor.

πŸ‘£ Step-by-Step solution:

1️⃣ We start by training an internal model on a chunk of the latest data. ➡️ This allows us to learn the possible new Concept present in the data.

2️⃣ Next, we use the internal model to make predictions on the reference dataset.

3️⃣ We then estimate the monitored model's performance on the reference dataset, treating the internal model's predictions on the reference data as ground truth.

4️⃣ If this estimated performance and the monitored model's actual performance on the reference data are very different, we say that there has been Concept Drift.

To quantify how this Concept impacts performance, we subtract the actual model's performance on the reference data from the estimated performance and report a delta of the performance metric. ➡️ This is what the plot below shows: the change in F1-score due to Concept Drift! 🚨

This process is repeated for every new chunk of data that we get. πŸ”
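The four steps above can be sketched roughly as follows. This is a minimal illustration with scikit-learn on synthetic data, not the actual implementation: the datasets, the RandomForest models, and the simulated drift (flipping the labeling rule) are all assumptions made for the demo.

```python
# Sketch of the concept-drift detection steps: train an internal model on
# the latest chunk, score it against the reference data, and compare the
# estimated performance with the monitored model's known performance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Reference data: the monitored model's performance here is known.
X_ref = rng.normal(size=(500, 4))
y_ref = (X_ref[:, 0] > 0).astype(int)

# Latest (monitoring) chunk: simulate a Concept change by flipping the rule.
X_new = rng.normal(size=(500, 4))
y_new = (X_new[:, 0] < 0).astype(int)

# The monitored production model, trained under the old Concept.
monitored = RandomForestClassifier(random_state=0).fit(X_ref, y_ref)
actual_f1 = f1_score(y_ref, monitored.predict(X_ref))

# Step 1: train an internal model on the latest chunk to capture the new Concept.
internal = RandomForestClassifier(random_state=0).fit(X_new, y_new)

# Steps 2-3: treat the internal model's predictions on the reference data as
# pseudo ground truth and estimate the monitored model's performance against it.
estimated_f1 = f1_score(internal.predict(X_ref), monitored.predict(X_ref))

# Step 4: a large delta (estimated minus actual) signals Concept Drift
# that is affecting performance.
delta = estimated_f1 - actual_f1
print(f"actual F1: {actual_f1:.2f}, estimated F1: {estimated_f1:.2f}, delta: {delta:.2f}")
```

Because the simulated chunk fully inverts the Concept, the estimated F1 collapses and the delta is strongly negative; in production this computation would be repeated for every new chunk, with the delta plotted over time.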

New activity in dlicari/Italian-Legal-BERT-SC about 1 year ago
New activity in dlicari/Italian-Legal-BERT about 1 year ago
updated a Space almost 2 years ago