Yanis L PRO

Pendrokar

AI & ML interests

STT/STS/TTS: you know, something that is solvable

Recent Activity

updated a dataset 42 minutes ago

Pendrokar/TTS_Arena

New activity about 8 hours ago

Pendrokar/TTS-Spaces-Arena:Empty audio sample (Kokoro)

updated a dataset about 10 hours ago

Pendrokar/TTS_Arena

Pendrokar's activity

replied to hexgrad's post 3 days ago

The original Arena's threshold is 700 votes, but I am sure Kokoro will hold its position. Its voice quality actually sounds close to ElevenLabs.

StyleTTS, however, is usually not very emotional, so it will fail where Edge TTS does: on phrases where the voice has to sound sad or angry. Parler Expresso, for example, was overly jolly.

reacted to hexgrad's post with 🔥 3 days ago

Post

self.brag(): Kokoro finally got 300 votes in Pendrokar/TTS-Spaces-Arena after @Pendrokar was kind enough to add it 3 weeks ago.
Setting aside the small sample size of votes, I think it is safe to say that hexgrad/Kokoro-TTS is currently a top-3 model among the contenders in that Arena. This is notable because:
- At 82M params, Kokoro is one of the smaller models in the Arena
  - MeloTTS has 52M params
  - F5 TTS has 330M params
  - XTTSv2 has 467M params
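
To put "the small sample size" in perspective, here is a minimal sketch (not part of either Arena's code) that computes a Wilson score interval for a model's win rate. The 60% win rate and the vote counts are hypothetical, chosen only to show how the uncertainty shrinks as votes accumulate:

```python
import math


def wilson_interval(wins: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win rate (z=1.96 gives roughly 95% coverage)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = wins / total
    denom = 1 + z**2 / total
    # Centre is pulled slightly toward 0.5 compared with the raw win rate.
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical: the same 60% win rate observed over 300 and over 3000 votes.
for wins, total in [(180, 300), (1800, 3000)]:
    low, high = wilson_interval(wins, total)
    print(f"{total} votes: {low:.3f} to {high:.3f}")
```

With 300 votes the interval spans roughly ±5.5 percentage points around the observed rate; with 3000 votes it narrows to under ±2, which is why rankings separated by a few points at 300 votes are still tentative.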

reacted to hexgrad's post with 👍 6 days ago

Post

@Respair just dropped Tsukasa: frontier TTS in Japanese Respair/Tsukasa_Speech
It's expressive, punches way above its weight class, and supports voice cloning. Go check it out! 🚀

replied to their post 7 days ago

True, a sample from the original dataset would probably be best. My attempt to fetch one from the Emilia dataset was unsuccessful, as the HF dataset viewer only shows the German samples. Emilia's homepage gives an ASMR-y example prompt.

replied to their post 9 days ago

True about the narration-style sample, but that still did not stop XTTS from surpassing F5. Both use the same sample.

posted an update 10 days ago

Post

TTS: Sorry, I just cannot get the hype behind F5 TTS. It has now gathered a thousand votes in the TTS Arena fork and **has remained in the #8 spot** against _mostly_ open TTS adversaries.

The voice sample used is the same as XTTS's. So far F5 has been unstable: unemotional/monotone/depressed, and mispronouncing words (_awestruck_).

If you have suggestions please give feedback in the following thread:
mrfakename/E2-F5-TTS#32

reacted to hexgrad's post with 🔥 17 days ago

Post

Kokoro: a small, fast 80M param TTS model hosted on ZeroGPU at hexgrad/Kokoro-TTS

posted an update 19 days ago

Post

Added @amphion's MaskGCT and @hexgrad's fine-tuned StyleTTS model, named Kokoro, to the forked TTS Arena Space. If the preliminary results hold up, these two may end up in the TOP 5 of all TTS models. 🤞️🍀️

Pendrokar/TTS-Spaces-Arena
Svngoku/maskgct-audio-lab
hexgrad/Kokoro-TTS

I chose @Svngoku's forked HF Space over amphion's because the latter demands an overly long ZeroGPU duration: 300s!

amphion/maskgct

Had to remove @mrfakename's MetaVoice-1B Space from the available models, as that Space has been down for quite some time. 🤕️

mrfakename/MetaVoice-1B-v0.1

I'm close to syncing the code with the original Arena's code structure. After that, I'd like to use ASR to validate the generated samples and create synthetic public datasets from them, and then make the Arena multilingual, which will surely attract quite a crowd!
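
One way such an ASR validation step could work: transcribe each generated sample with any ASR model, then compare the transcript with the prompt text using word error rate (WER) and keep only samples below a threshold. A minimal sketch, assuming the transcript is already available as a string; `word_error_rate`, `is_valid_sample` and the 0.2 threshold are hypothetical choices, not code from the Arena:

```python
import re


def _normalize(text: str) -> list[str]:
    # Lowercase and drop punctuation (apostrophes kept) before splitting into words.
    return re.sub(r"[^\w\s']", " ", text.lower()).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = _normalize(reference)
    hyp = _normalize(hypothesis)
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


def is_valid_sample(prompt: str, transcript: str, max_wer: float = 0.2) -> bool:
    # Hypothetical rule: accept a generated sample if the ASR transcript is close to the prompt.
    return word_error_rate(prompt, transcript) <= max_wer


print(word_error_rate("The cat sat on the mat.", "the cat sat on a mat"))
print(is_valid_sample("The cat sat on the mat.", "a dog ran"))
```

A fixed WER threshold is crude (names and numbers often get transcribed differently), so in practice the cut-off would need tuning per language.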

reacted to mrfakename's post with 👍 about 1 month ago

Post

I just released an unofficial demo for Moonshine ASR!

Moonshine is a fast, efficient, & accurate ASR model released by Useful Sensors. It's designed for on-device inference and licensed under the MIT license!

HF Space (unofficial demo): mrfakename/Moonshine
GitHub repo for Moonshine: https://github.com/usefulsensors/moonshine

replied to their post about 1 month ago

TTS-AGI/TTS-Arena's button for downloading the DB data was available only for a short while. It was probably removed because of the unreviewed, user-submitted entries in the spokentext table. I've cleaned those up:
https://huggingface.co/datasets/Pendrokar/TTS_Arena_DB

replied to their post about 1 month ago

Link to Spaces fork: https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena
Link to Original Arena: https://huggingface.co/spaces/TTS-AGI/TTS-Arena

posted an update about 1 month ago

Post

What the 🗣🏆 leaderboard would look like if the TTS Arena were merged with the 🤗 Spaces fork. These results are somewhat unreliable, as some models in the list have never been matched against each other, and the original TTS Arena used only narration-style sentences.
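
A merged leaderboard can be sketched by pooling the pairwise votes from both Arenas and replaying them through a single rating system. The sketch below assumes a plain sequential Elo update with K = 32 and a base rating of 1000, which may not be what either Arena actually computes; the model names and votes are placeholders, not real Arena data:

```python
from collections import defaultdict


def elo_ratings(votes, k: float = 32.0, base: float = 1000.0) -> dict[str, float]:
    """Replay (winner, loser) votes in order and return the final Elo ratings."""
    ratings: defaultdict[str, float] = defaultdict(lambda: base)
    for winner, loser in votes:
        # Probability the winner was expected to win, given the current ratings.
        expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        delta = k * (1.0 - expected)
        ratings[winner] += delta  # zero-sum: the loser drops by the same amount
        ratings[loser] -= delta
    return dict(ratings)


# Placeholder votes from the two Arenas; real data would come from their databases.
original_votes = [("model-a", "model-b"), ("model-a", "model-c")]
fork_votes = [("model-d", "model-a"), ("model-b", "model-d")]

merged = elo_ratings(original_votes + fork_votes)
for name, rating in sorted(merged.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {rating:.1f}")
```

Note that model-c here only ever faced model-a, so its position relative to model-d rests entirely on indirect evidence, and sequential Elo also depends on the order in which votes are replayed. Both effects are the kind of unreliability mentioned above.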

posted an update about 2 months ago

Post

Made a notable change to the TTS Arena fork. I do not think anyone is interested in which bottom-feeder TTS is better than the one next to it, so one of the top 5 TTS models is now always chosen for each challenge, putting them under more scrutiny. These top 5 are taken from the preliminary results.
Pendrokar/TTS-Spaces-Arena
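
The matchmaking change can be sketched as: draw one model from the current top 5, then draw its opponent from all the other models. This is a minimal illustration, assuming models are identified by name and the top-5 list is supplied from the preliminary results; `pick_matchup` and the random left/right placement are hypothetical, not the fork's actual code:

```python
import random
from typing import Optional, Sequence


def pick_matchup(
    models: Sequence[str],
    top5: Sequence[str],
    rng: Optional[random.Random] = None,
) -> tuple[str, str]:
    """Return two distinct models, at least one of which is in top5."""
    rng = rng if rng is not None else random.Random()
    # Always include one of the current top-5 models so it gets more scrutiny.
    anchor = rng.choice(list(top5))
    # The opponent can be any other model, including another top-5 one.
    opponents = [m for m in models if m != anchor]
    if not opponents:
        raise ValueError("need at least two models")
    opponent = rng.choice(opponents)
    pair = [anchor, opponent]
    rng.shuffle(pair)  # randomize sides so voters cannot tell which is the top-5 pick
    return pair[0], pair[1]


# Placeholder model names; the first five stand in for the preliminary top 5.
models = [f"model-{c}" for c in "abcdefgh"]
print(pick_matchup(models, models[:5], random.Random(42)))
```

Shuffling the pair keeps the vote blind, so the anchoring of top-5 models should not bias which side voters pick.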

reacted to mrfakename's post with 👍 6 months ago

Post

Introducing StyleTTS 2 detector, an audio classification model to detect StyleTTS 2 vs human-generated content!

Dual-licensed under MIT/Apache 2.0.

Model Weights: mrfakename/styletts2-detector
Spaces: mrfakename/styletts2-detector
