Multimodal Ichigo Llama 3.1 - Real Time Voice AI 🔥

> WhisperSpeech X Llama 3.1 8B
> Trained on 50K hours of speech (7 languages)
> Continually trained for 45 hrs on 10x A1000s
> MLS -> WhisperVQ tokens -> Llama 3.1
> Instruction-tuned on 1.89M samples
> 70% speech, 20% transcription, 10% text
> Apache 2.0 licensed ⚡

Architecture (rough sketch below):
> WhisperSpeech VQ for semantic tokens
> Llama 3.1 8B Instruct as the text backbone
> Early fusion (as in Chameleon)

I'm super bullish on Homebrew/Jan and early-fusion audio+text multimodal models!

(P.S. Play with the demo on Hugging Face: jan-hq/Ichigo-llama3.1-s-instruct)
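For anyone new to early fusion: the idea is that quantized audio shares one vocabulary and one token stream with text, so the backbone trains on both without a separate audio encoder head. Here is a toy Python sketch of that mapping; the codebook size, vocab size, and function names are illustrative assumptions, not the actual Ichigo code.

```python
# Hypothetical sketch of early fusion (Chameleon-style): audio is quantized
# into discrete WhisperVQ codes, the codes are mapped onto special tokens
# appended to the Llama vocabulary, and the result is one flat token stream.
# All names and sizes here are illustrative, not the real Ichigo codebase.

TEXT_VOCAB_SIZE = 128_256      # Llama 3.1 text vocab size
NUM_AUDIO_CODES = 512          # assumed WhisperVQ codebook size

def audio_code_to_token_id(code: int) -> int:
    """Map a WhisperVQ codebook index onto the extended vocabulary."""
    return TEXT_VOCAB_SIZE + code

def build_input_ids(text_ids: list[int], audio_codes: list[int]) -> list[int]:
    """Interleave text and audio tokens into one sequence (early fusion)."""
    sound_start = audio_code_to_token_id(NUM_AUDIO_CODES)      # <|sound_start|>
    sound_end = audio_code_to_token_id(NUM_AUDIO_CODES + 1)    # <|sound_end|>
    audio_ids = [audio_code_to_token_id(c) for c in audio_codes]
    # One flat stream: the 8B backbone sees audio tokens exactly like text.
    return text_ids + [sound_start] + audio_ids + [sound_end]

# Example: a short text prompt followed by a quantized speech clip.
example = build_input_ids(text_ids=[128000, 9906], audio_codes=[17, 402, 33])
print(example)
```

The design payoff is that next-token prediction stays unchanged: one embedding table, one loss, and the model can mix modalities anywhere in the sequence.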
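If you want to poke at the checkpoint outside the hosted demo, here is a minimal loading sketch, assuming the model works through the standard transformers causal-LM API (it is Llama-architecture with an extended vocab); the real demo's sound-token prompt format may differ from this text-only smoke test.

```python
# Minimal sketch, assuming the checkpoint loads via the standard
# transformers causal-LM interface. The actual demo prepends WhisperVQ
# sound tokens produced from user audio; this is a text-only smoke test.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jan-hq/Ichigo-llama3.1-s-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer(
    "Summarize what Ichigo is in one sentence.", return_tensors="pt"
).to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```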