MohamedRashad (Mohamed Rashad)

reacted to their post with 👀 2 months ago

Post

952

Qwen2.5-72B + Flux-dev + FinePersonas = Grounded Structured Character Generator

Check out my latest projects that uses Qwen/Qwen2.5-72B-Instruct , black-forest-labs/FLUX.1-dev , and MohamedRashad/FinePersonas-Lite to generate different characters in a world of your description.

Try Here: MohamedRashad/Character-Generator 🤗

posted an update 2 months ago

Post

952

Qwen2.5-72B + Flux-dev + FinePersonas = Grounded Structured Character Generator

Check out my latest projects that uses Qwen/Qwen2.5-72B-Instruct , black-forest-labs/FLUX.1-dev , and MohamedRashad/FinePersonas-Lite to generate different characters in a world of your description.

Try Here: MohamedRashad/Character-Generator 🤗

reacted to reach-vb's post with 🧠 2 months ago

Post

2801

Less than two days ago Kyutai Labs open sourced Moshi - an ~7.6B on-device Speech to Speech foundation model and Mimi - SoTA streaming speech codec! 🔥

The release includes:

1. Moshiko & Moshika - Moshi finetuned on synthetic data (CC-BY license) ( kyutai/moshi-v01-release-66eaeaf3302bef6bd9ad7acd)
2. Mimi - Streaiming Audio Codec, processes 24 kHz audio, down to a 12.5 Hz representation with a bandwidth of 1.1 kbps (CC-BY license) ( kyutai/mimi)
3. Model checkpoints & Inference codebase written in Rust (Candle), PyTorch & MLX (Apache license) (https://github.com/kyutai-labs/moshi)

How does Moshi work?

1. Moshi processes two audio streams: one for itself and one for the user, with the user's stream coming from audio input and Moshi's stream generated by the model.

2. Along with these audio streams, Moshi predicts text tokens for its speech, enhancing its generation quality.

3. The model uses a small Depth Transformer for codebook dependencies and a large 7B parameter Temporal Transformer for temporal dependencies.

4. The theoretical latency is 160ms, with a practical latency of around 200ms on an L4 GPU.

Model size & inference:

Moshiko/ka are 7.69B param models

bf16 ~16GB VRAM
8-bit ~8GB VRAM
4-bit ~4GB VRAM

You can run inference via Candle 🦀, PyTorch and MLX - based on your hardware.

The Kyutai team, @adefossez @lmz and team are cracked AF, they're bringing some serious firepower to the open source/ science AI scene, looking forward to what's next! 🐐

1 reply

·

reacted to their post with ❤️ 2 months ago

Post

3324

For all the Muslims out there who are interested in Quran and its tafsir (explanations). This humble dataset consists of 84 different books of tafsir for nearly all the ayat in the Quran:
MohamedRashad/Quran-Tafseer

I hope it helps someone to build something nice and useful with it ^_^

posted an update 2 months ago

Post

3324

For all the Muslims out there who are interested in Quran and its tafsir (explanations). This humble dataset consists of 84 different books of tafsir for nearly all the ayat in the Quran:
MohamedRashad/Quran-Tafseer

I hope it helps someone to build something nice and useful with it ^_^

reacted to rwightman's post with ❤️ 3 months ago

Post

1277

The timm leaderboard timm/leaderboard has been updated with the ability to select different hardware benchmark sets: RTX4090, RTX3090, two different CPUs along with some NCHW / NHWC layout and torch.compile (dynamo) variations.

Also worth pointing out, there are three rather newish 'test' models that you'll see at the top of any samples/sec comparison:
* test_vit ( timm/test_vit.r160_in1k)
* test_efficientnet ( timm/test_efficientnet.r160_in1k)
* test_byobnet ( timm/test_byobnet.r160_in1k, a mix of resnet, darknet, effnet/regnet like blocks)

They are < 0.5M params, insanely fast and originally intended for unit testing w/ real weights. They have awful ImageNet top-1, it's rare to have anyone bother to train a model this small on ImageNet (the classifier is roughly 30-70% of the param count!). However, they are FAST on very limited hadware and you can fine-tune them well on small data. Could be the model you're looking for?

replied to rwightman's post 3 months ago

Happy to be of help ^^

reacted to rwightman's post with 🔥 3 months ago

Post

2058

The latest timm validation & test set results are now viewable by a leaderboard space: timm/leaderboard

As of yesterday, I updated all of the results for ImageNet , ImageNet-ReaL, ImageNet-V2, ImageNet-R, ImageNet-A, and Sketch sets. The csv files can be found in the GH repo https://github.com/huggingface/pytorch-image-models/tree/main/results

Unfortunately the latest benchmark csv files are not yet up to date, there are some gaps in dataset results vs throughput/flop numbers impact the plots.

h/t to @MohamedRashad for making the first timm leaderboard.

1 reply

·

reacted to vilarin's post with 🔥 4 months ago

Post

4186

Black Forest Labs, BASED! 👏
FLUX.1 is more delightful, with good instruction following.
FLUX.1 dev( black-forest-labs/FLUX.1-dev) with a 12B parameter distillation model, second only to Black Forest Labs' state-of-the-art model FLUX.1 pro. 🙀

Update 🤙Official demo:
black-forest-labs/FLUX.1-dev

1 reply

·

reacted to their post with ❤️ 4 months ago

Post

2598

Check @lllyasviel Latest research for making videos of still images using hand drawing keyframes:
https://lllyasviel.github.io/pages/paints_undo/

I imported his work to gradio space here
MohamedRashad/PaintsUndo

posted an update 4 months ago

Post

2598

Check @lllyasviel Latest research for making videos of still images using hand drawing keyframes:
https://lllyasviel.github.io/pages/paints_undo/

I imported his work to gradio space here
MohamedRashad/PaintsUndo

reacted to their post with 🔥 5 months ago

Post

2160

Yet Another Whisper Finetune

MohamedRashad/Arabic-Whisper-CodeSwitching-Edition is a finetune of OpenAI Whisper Large V2 on MohamedRashad/arabic-english-code-switching dataset.
This new finetune is capable of recognizing english words in arabic speech and transcribing the foreign words as it is.

Try it out here:
MohamedRashad/Arabic-Whisper-CodeSwitching-Edition

posted an update 5 months ago

Post

2160

Yet Another Whisper Finetune

MohamedRashad/Arabic-Whisper-CodeSwitching-Edition is a finetune of OpenAI Whisper Large V2 on MohamedRashad/arabic-english-code-switching dataset.
This new finetune is capable of recognizing english words in arabic speech and transcribing the foreign words as it is.

Try it out here:
MohamedRashad/Arabic-Whisper-CodeSwitching-Edition

posted an update 5 months ago

Post

806

Just updated MohamedRashad/timm-leaderboard with fuzzy search for people who want to search for a certian vision model

posted an update 6 months ago

Post

1536

Timm Leaderboard space here:
MohamedRashad/timm-leaderboard

Thanks goes to @rwightman for building timm 🤗

posted an update 6 months ago

Post

1218

@Ali-C137 and the team at OALL Just released OALL/Open-Arabic-LLM-Leaderboard

Amazing effort to push the Arabic LLMs development forward 👏

replied to DmitryRyumin's post 6 months ago

No space ?

replied to their post 7 months ago

Arabic ORPO has AWQ and GGUF quantization.
I would recommend AWQ over GGUF because i think there is bugs with llama.cpp with llama3 and may output to you rubbish.

posted an update 7 months ago

Post

1821

For those who love the Arabic language like me, This is a summary of my different models, datasets and spaces i made the last couple of months:

1. MohamedRashad/Arabic-Orpo-Llama-3-8B-Instruct is a finetuned version of Meta-Llama-3-8B-Instruct using ORPO on 2A2I/argilla-dpo-mix-7k-arabic and the space to try it is here MohamedRashad/Arabic-Chatbot-Arena.

2. MohamedRashad/arabic-small-nougat is a finetuned version of facebook/nougat-small on Arabic book pages to be a capable arabic-ocr and its space is also avialable here https://huggingface.co/spaces/MohamedRashad/Arabic-Small-Nougat.

3. There is MohamedRashad/Arabic-CivitAi-Images dataset for text-to-image in the Arabic language (Hope someone utilize it to build something great).

4. MohamedRashad/arabic-sts for those who want to train an Arabic Text Embedder model.

5. Finally, a small arabic dataset about translation from Fusha Arabic to English called MohamedRashad/rasaif-translations (This dataset is very important in my opinion).

2 replies

·

replied to mrm8488's post 7 months ago

Can you share what the training speeds look like ?

Mohamed Rashad PRO

AI & ML interests

Recent Activity

Organizations

MohamedRashad's activity