Ali El Filali

alielfilali01

AI & ML interests

"AI Psychometrician" | NLP (mainly for Arabic) | Other interests include Reinforcement Learning and Cognitive sciences among others

Recent Activity

upvoted an article about 3 hours ago

alielfilali01's activity

reacted to LukeNeumann's post with 🤯 1 day ago
Nine years ago, I uploaded the first 8K resolution video to YouTube and I've been stockpiling 8K footage ever since: https://www.youtube.com/watch?v=sLprVF6d7Ug&t

Should @Overlaiapp release the first open-source 8K video dataset?

Could anyone even fine-tune a model with this? 😅
reacted to monsoon-nlp's post with ❤️ 1 day ago
Great to see Tatta Bio release an embeddings version of their DNA/protein language model 🧬: tattabio/gLM2_650M_embed
reacted to m-ric's post with 🔥 1 day ago
Great feature alert: You can now use any Space as a tool for your transformers.agent! 🛠️🔥🔥

This lets you take the coolest Spaces, like FLUX.1-dev, and use them in agentic workflows with a few lines of code! 🧑‍💻

In the video below, I set up my fake vacation pictures where I'm awesome at surfing (I'm really not) 🏄

Head to the doc to learn this magic 👉 https://huggingface.co/docs/transformers/main/en/agents_advanced#import-a-space-as-a-tool-
reacted to maxiw's post with ❤️ 6 days ago
I was curious to see what people post here on HF so I created a dataset with all HF Posts: maxiw/hf-posts

Some interesting stats:

Top 5 Authors by Total Impressions:
-----------------------------------
@merve : 171,783 impressions (68 posts)
@fdaudens : 135,253 impressions (81 posts)
@singhsidhukuldeep : 122,591 impressions (81 posts)
@akhaliq : 119,526 impressions (78 posts)
@MonsterMMORPG : 112,500 impressions (45 posts)

Top 5 Users by Number of Reactions Given:
----------------------------------------
@osanseviero : 1278 reactions
@clem : 910 reactions
@John6666 : 899 reactions
@victor : 674 reactions
@samusenps : 655 reactions

Top 5 Most Used Reactions:
-------------------------
โค๏ธ: 7048 times
๐Ÿ”ฅ: 5921 times
๐Ÿ‘: 4856 times
๐Ÿš€: 2549 times
๐Ÿค—: 2065 times
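
For anyone who wants to reproduce rankings like these, here is a minimal sketch of the three aggregations. It runs on a few toy records, not on the real dataset: the field names (`author`, `impressions`, `reactions`, `emoji`, `users`) and the helper function names are assumptions made for illustration, and the actual maxiw/hf-posts schema may differ.

```python
from collections import Counter, defaultdict

# Toy records with a hypothetical schema; the real dataset may use other field names.
posts = [
    {"author": "alice", "impressions": 1200,
     "reactions": [{"emoji": "🔥", "users": ["bob", "carol"]},
                   {"emoji": "🚀", "users": ["bob"]}]},
    {"author": "bob", "impressions": 800,
     "reactions": [{"emoji": "🚀", "users": ["alice", "carol"]}]},
    {"author": "alice", "impressions": 300, "reactions": []},
]

def top_authors_by_impressions(posts, n=5):
    """Return (author, total impressions, post count), highest total first."""
    totals = defaultdict(int)
    counts = Counter()
    for post in posts:
        totals[post["author"]] += post["impressions"]
        counts[post["author"]] += 1
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [(author, total, counts[author]) for author, total in ranked]

def top_reactors(posts, n=5):
    """Count how many reactions each user has given across all posts."""
    given = Counter()
    for post in posts:
        for reaction in post["reactions"]:
            given.update(reaction["users"])
    return given.most_common(n)

def top_reactions(posts, n=5):
    """Count how often each reaction emoji was used across all posts."""
    used = Counter()
    for post in posts:
        for reaction in post["reactions"]:
            used[reaction["emoji"]] += len(reaction["users"])
    return used.most_common(n)

for author, total, count in top_authors_by_impressions(posts):
    print(f"@{author} : {total:,} impressions ({count} posts)")
```

The same three functions should work on the full dataset once its rows are mapped into this shape.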
reacted to fffiloni's post with 🔥 6 days ago
posted an update 9 days ago
Unpopular opinion: o1-preview is more stupid than 4o, and Qwen2.5-72B-Instruct is extremely underrated!
reacted to nroggendorff's post with 👀 14 days ago
I still think whitespace in tokenizers is so dumb.
Congrats, you just doubled your vocab size for no reason.
reacted to fdaudens's post with ❤️➕ 15 days ago
Just tested Argilla's new data annotation feature - it's a game changer for AI project quality.

Upload CSVs, work with published datasets, or improve existing ones directly on HuggingFace Hub. Setup took < 2 minutes, no code needed (see example below where I selected a dataset to classify tweets in categories).

Real world impact: Missing in Chicago won a Pulitzer using a similar approach - 200 volunteers labeled police misconduct files to train their model. That's the power of good data annotation.

Three immediate use cases I see:
- Build collaborative training sets with your community (surprisingly underused in AI journalism)
- Turn your website chatbot logs into high-quality fine-tuning data
- Compare generated vs published content (great for SEO headlines)

Works for solo projects or teams up to 100 people. All integrated with HuggingFace Hub for immediate model training.

Interesting to see tools like this making data quality more accessible. Data quality is the hidden driver of AI success that we don't talk about enough.

- Check out the blogpost: https://huggingface.co/blog/argilla-ui-hub
- And the quickstart guide: https://docs.argilla.io/latest/getting_started/quickstart/

reacted to albertvillanova's post with ❤️🚀 22 days ago
🚀 Exciting update! You can now compare multiple models side-by-side with the Hugging Face Open LLM Comparator! 📊

open-llm-leaderboard/comparator

Dive into multi-model evaluations, pinpoint the best model for your needs, and explore insights across top open LLMs all in one place. Ready to level up your model comparison game?
reacted to yjernite's post with ❤️ 24 days ago
๐Ÿ‘ท๐Ÿฝโ€โ™€๏ธ๐Ÿ“š๐Ÿ”จ Announcing the Foundation Model Development Cheatsheet!

My first ๐Ÿค—Post๐Ÿค— ever to announce the release of a fantastic collaborative resource to support model developers across the full development stack: The FM Development Cheatsheet available here: https://fmcheatsheet.org/

The cheatsheet is a growing database of the many crucial resources coming from open research and development efforts to support the responsible development of models. This new resource highlights essential yet often underutilized tools in order to make it as easy as possible for developers to adopt best practices, covering among other aspects:
๐Ÿง‘๐Ÿผโ€๐Ÿคโ€๐Ÿง‘๐Ÿผ data selection, curation, and governance;
๐Ÿ“– accurate and limitations-aware documentation;
โšก energy efficiency throughout the training phase;
๐Ÿ“Š thorough capability assessments and risk evaluations;
๐ŸŒ environmentally and socially conscious deployment strategies.

We strongly encourage developers working on creating and improving models to make full use of the tools listed here, and to help keep the resource up to date by adding the resources that you yourself have developed or found useful in your own practice 🤗

Congrats to all the participants in this effort for the release! Read more about it from:
@Shayne - https://twitter.com/ShayneRedford/status/1763215814860186005
@hails and @stellaathena - https://blog.eleuther.ai/fm-dev-cheatsheet/
@alon-albalak - http://nlp.cs.ucsb.edu/blog/a-new-guide-for-the-responsible-development-of-foundation-models.html

And also to @gabrielilharco @sayashk @kklyman @kylel @mbrauh @fauxneticien @avi-skowron @Bertievidgen Laura Weidinger, Arvind Narayanan, @VictorSanh @Davlan @percyliang Rishi Bommasani, @breakend @sasha 🔥
reacted to fdaudens's post with ❤️ 27 days ago
🤯 Plot twist: Size isn't everything in AI! A lean 32B parameter model just showed up to the party and outperformed a 70B one. Efficiency > Scale? The AI world just got more interesting...

Cohere For AI released Aya Expanse, a new family of multilingual models (8B and 32B) spanning 23 popular languages.

Models: CohereForAI/c4ai-aya-expanse-671a83d6b2c07c692beab3c3
Blog post: https://huggingface.co/blog/aya-expanse
Demo: CohereForAI/aya_expanse
reacted to clem's post with 🔥 27 days ago
This is no Woodstock AI but will be fun nonetheless haha. I'll be hosting a live workshop with team members next week about the Enterprise Hugging Face hub.

1,000 spots available, first-come, first-served, with some surprises during the stream!

You can register and add to your calendar here: https://streamyard.com/watch/JS2jHsUP3NDM
reacted to abhishek's post with 🤗 about 1 month ago
replied to their post about 1 month ago

I don't think I totally follow what you are saying!?

posted an update about 1 month ago
I feel like this incredible resource hasn't gotten the attention it deserves in the community!

@clefourrier and the HuggingFace evaluation team more generally put together a fantastic guidebook covering a lot about EVALUATION, from basics to advanced tips.

Link: https://github.com/huggingface/evaluation-guidebook

I haven't finished it yet, but I'm enjoying every piece of it so far. Huge thanks to @clefourrier and the team for this invaluable resource!
reacted to mattmdjaga's post with 🔥 about 1 month ago
🚨 New Agent Benchmark 🚨
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

ai-safety-institute/AgentHarm

Collaboration between UK AI Safety Institute and Gray Swan AI to create a dataset for measuring harmfulness of LLM agents.

The benchmark contains both harmful and benign sets of 11 categories with varied difficulty levels and detailed evaluation, not only testing success rate but also tool level accuracy.

We provide refusal and accuracy metrics across a wide range of models in both no attack and prompt attack scenarios.

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents (2410.09024)
reacted to singhsidhukuldeep's post with 👀 about 1 month ago
Just started going through the latest "State of AI Report 2024", and I cannot get over the predictions!

The report predicts major developments in AI over the next 12 months, including a $10B+ investment from a sovereign state into a large US AI lab, triggering national security scrutiny, and a viral app created by someone without coding skills.

It forecasts changes in data collection practices due to frontier labs facing trials, softer-than-expected EU AI Act implementations, and the rise of an open-source alternative to OpenAI GPT-4 outperforming in benchmarks.

NVIDIA's dominance will remain largely unchallenged, investment in humanoid robots will decline, Apple's on-device AI research will gain momentum, and a research paper by an AI scientist will be accepted at a major conference.

Lastly, a GenAI-based video game is expected to achieve breakout success.

Yet to go through all 200+ pages... will post summarized thoughts later.
reacted to mervenoyan's post with 🔥 about 1 month ago