For anyone looking to boost their LLM fine-tuning and alignment skills this December: we're running a free and open course called smol course. It's not big like Li Yin and @mlabonne, it's just smol.
👷 It focuses on practical use cases, so if you’re working on something, bring it along.
👯‍♀️ It's peer-reviewed and open, so you can discuss and get feedback.
🤘 If you’re already a smol pro, feel free to drop a star or issue.
Part 1 starts now, and it's on instruction tuning!
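If you can't wait, here's roughly what instruction tuning looks like with TRL's SFTTrainer. This is a minimal sketch of my own, not the course material: the dataset and model choices are illustrative, and the exact API may differ across TRL versions.

```python
# Minimal instruction-tuning (SFT) sketch with TRL.
# Dataset and model ids are illustrative; check the course for the real recipe.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # any chat-format dataset works

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M",                    # a smol base model
    train_dataset=dataset,
    args=SFTConfig(output_dir="smollm2-instruct", max_steps=100),
)
trainer.train()
```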
I get many questions about the radically different LLM technology that I started developing two years ago. Initially designed to retrieve information that I could no longer find on the Internet, whether with search, OpenAI, Gemini, Perplexity, or any other platform, it evolved into the ideal solution for professional enterprise users. Now agentic and multimodal, it automates business tasks at scale with lightning speed, consistently delivers real ROI, and bypasses the costs associated with training and GPUs thanks to zero-weight, explainable AI, tested and developed for Fortune 100 companies.
So, what is behind the scenes? How different is it from LLM 1.0 (GPT and the like)? How can it be hallucination-free? What makes it a game changer? How did it eliminate prompt engineering? How does it handle knowledge graphs without neural networks? And what are the other benefits?
In a nutshell, the performance comes from building a robust architecture from the ground up at every step, offering far more than a prompt box, relying on home-made technology rather than faulty Python libraries, and being designed by enterprise and tech visionaries for enterprise users.
A few of the differentiators:
- contextual smart crawling to retrieve underlying taxonomies, and augmented taxonomies
- long contextual multi-tokens
- real-time fine-tuning
- increased security
- an LLM router with specialized sub-LLMs
- a purpose-built in-memory database architecture to efficiently handle sparsity in keyword associations
- contextual backend tables
- agents built on the backend
- mapping between prompt and corpus keywords
- customized PMI rather than cosine similarity
- variable-length embeddings
- a scoring engine (the new "PageRank" of LLMs) returning results along with their relevancy scores
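To illustrate the PMI idea mentioned above: pointwise mutual information measures how much more often two keywords co-occur than chance would predict, which suits sparse keyword associations. Here is a toy sketch of my own, not the product's actual code:

```python
# Hypothetical sketch of PMI-based keyword association (illustrative only).
import math
from collections import Counter
from itertools import combinations

docs = [
    {"llm", "fine-tuning", "gpu"},
    {"llm", "taxonomy", "crawling"},
    {"llm", "fine-tuning", "embeddings"},
]

word_counts = Counter(w for d in docs for w in d)
pair_counts = Counter(frozenset(p) for d in docs for p in combinations(sorted(d), 2))
n = len(docs)

def pmi(x: str, y: str) -> float:
    """PMI(x, y) = log[ p(x, y) / (p(x) * p(y)) ]; > 0 means x and y attract."""
    p_xy = pair_counts[frozenset((x, y))] / n
    p_x, p_y = word_counts[x] / n, word_counts[y] / n
    return math.log(p_xy / (p_x * p_y)) if p_xy > 0 else float("-inf")

print(pmi("fine-tuning", "gpu"))  # log(1.5) ~ 0.405: co-occur more than chance
```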
Six predictions for AI in 2025 (and a review of how my 2024 predictions turned out):
- There will be the first major public protest related to AI.
- A big company will see its market cap divided by two or more because of AI.
- At least 100,000 personal AI robots will be pre-ordered.
- China will start to lead the AI race (as a consequence of leading the open-source AI race).
- There will be big breakthroughs in AI for biology and chemistry.
- We will begin to see the economic and employment growth potential of AI, with 15M AI builders on Hugging Face.
How my predictions for 2024 turned out:
- A hyped AI company will go bankrupt or get acquired for a ridiculously low price ✅ (Inflection, Adept AI, ...)
- Open-source LLMs will reach the level of the best closed-source LLMs ✅ with QwQ and dozens of others
- Big breakthroughs in AI for video, time-series, biology and chemistry ✅ for video 🔴 for time-series, biology and chemistry
- We will talk much more about the cost (monetary and environmental) of AI ✅ Monetary 🔴 Environmental (😢)
- A popular media format will be mostly AI-generated ✅ with NotebookLM by Google
- 10 million AI builders on Hugging Face, leading to no increase in unemployment 🔜 currently 7M AI builders on Hugging Face
🤖 Adobe's code-generating agent reaches the top of the GAIA leaderboard - and their paper cites my work!
💡 Reminder: in short, agentic systems are a vehicle in which you put your LLM to give it access to the outside world.
➡️ The team of researchers at Adobe started from the idea that current agentic systems lack the ability to define their own tools. So they decided to make an agent that writes actions as code, thus allowing it to write Python functions that can be reused later as tools!
Here's what the LLM generations can look like with the proper prompt:
Thought: I need to access the Excel file using a different method.
Action:
def access_excel_file(file_path):
    ...  # rest of the code (the agent does write it, but I don't have room in this post)
    return rows
Then your system executes this and appends the observation to the agent's memory.
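A minimal sketch of that execute-and-observe step (my own illustration; names like run_code and memory are hypothetical, not from the paper):

```python
# Execute agent-generated code and store the observation (illustrative sketch).
import contextlib
import io

def run_code(code: str, env: dict) -> str:
    """Run the generated code and capture its stdout as the observation."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, env)  # a real system would sandbox this
    return buffer.getvalue()

env = {}      # persistent namespace: functions the agent defines stay reusable as tools
memory = []   # the agent's running transcript

generated_code = "def double(x):\n    return 2 * x\nprint(double(21))"
memory.append({"role": "observation", "content": run_code(generated_code, env)})
print(memory)  # the observation ('42') is now part of the agent's context
```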
Why is this code formulation better than the classical tool-use formulation as JSON? The paper explains:
"Most existing work uses text or JSON as the representation of actions, which significantly lacks the two criteria mentioned earlier: generality and composability. In contrast, DynaSaur can utilize available actions or create new ones if necessary, using code as a unified representation. In principle, acting with code enables agents to solve any Turing-complete problem."
The idea of using code is not new: in fact, we do it in transformers.agents (hence the citation I got). Their implementation adds further refinements, like using RAG to retrieve relevant functions before generating an action, which increases performance further.
And they observe that code agents perform much better, reaching the top of the GAIA leaderboard! 🥇
Go take a look, it's really clear and informative!
small but mighty 🔥 you can fine-tune SmolVLM on an L4 with a batch size of 4, and it will only take 16.4 GB VRAM 🫰🏻 with gradient accumulation, the simulated batch size is 16 ✨ I made a notebook that includes all the goodies: QLoRA, gradient accumulation, and gradient checkpointing, with explanations of how they work 💝 https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
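For a rough idea of how those pieces fit together, here's a sketch under my own assumptions (not the notebook verbatim; hyperparameters are illustrative, and the linked notebook is the authoritative recipe):

```python
# Hedged sketch: QLoRA + gradient checkpointing + gradient accumulation.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig, TrainingArguments

bnb_config = BitsAndBytesConfig(          # QLoRA: keep base weights in 4-bit NF4
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct", quantization_config=bnb_config
)
model = get_peft_model(model, LoraConfig(r=8, target_modules="all-linear"))
model.gradient_checkpointing_enable()     # recompute activations to save VRAM

args = TrainingArguments(
    output_dir="smolvlm-ft",
    per_device_train_batch_size=4,        # real batch size of 4 ...
    gradient_accumulation_steps=4,        # ... x 4 accumulation = simulated batch of 16
)
```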