Canary or Whisper 1 ?

#1
by awacke1 - opened

Back six months ago whisper was my new favorite for real time speech recognition because:

  1. It could produce a result fast. Whisper small Eng was the fastest and accuracy high tradeoff. This was best until recently (say March 2024) when Canary proved it was faster and more accurate yet still at size of GPU support requiring T4.
  2. In May GPT-4o dropped 5/12 and in the cookbook openai showed to use whisper until the next voice model comes out.
    In May while there is a Mac M1 GPT4 app which has functional voice, etc, it still seems there is parts unreleased.
  3. With the Video code in cookbook OpenAI shows that you get the transcript first using whisper-1 then give transcript to video AI which also gets image slices as png (once every 1-2 seconds or you can specify timespan. It still seems there is a limit on these since I was able to include only up to ten image slices while using azure OpenAI services.
  4. With GPT-4O expense rate at around $10/day it is expensive to refill each day - still looking around at better ways to do multimodal.

Given the latest multimodal is using whisper-1 I am inclined to believe its the best out currently for high fidelity real time speech transcription and integration with other modalities.

Owner

acw

Owner

acw

Owner

Sign up or log in to comment