Model Card for Diva Llama 3

This is an ablation of our Distilled Voice Assistant (DiVA) model, which accepts both speech and text as input. This ablation was trained using only the distillation loss, as described in the ablations here: https://huggingface.co/papers/2410.02678

Weights and Biases Run: https://wandb.ai/i18nlp/DiVA%20Training%20Runs/runs/8i1dd47i?nw=nwuserheld

Citation

This is the distillation-only model from https://huggingface.co/papers/2410.02678. BibTeX:

    @misc{held2024diva,
      author="Held, Will and Zhang, Yanzhe and Ryan, Michael and Shi, Weiyan and Li, Ella and Yang, Diyi",
      title="Distilling an End-to-End Voice Assistant from Speech Recognition Data",
      year="2024",
      publisher="HuggingFace",
    }
    

Training Details

Training Data

This model was trained on the CommonVoice corpus.

Training Procedure

This model was trained for 7k gradient steps with a batch size of 512 recordings and a learning rate that decays linearly from 5e-5 to zero, with a linear warmup of 70 steps.
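The schedule described above can be sketched in a few lines of Python. This is only an illustration, not the Levanter configuration used for the run: the function name `learning_rate` is hypothetical, and the sketch assumes that warmup starts from zero, that decay begins immediately after warmup, and that the rate reaches zero exactly at step 7000.

```python
def learning_rate(step, peak_lr=5e-5, warmup_steps=70, total_steps=7000):
    """Linear warmup to peak_lr, then linear decay to zero.

    Assumptions (not confirmed by the model card): warmup starts at 0,
    decay starts right after warmup, and the rate hits 0 at total_steps.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: ramp linearly from peak_lr down to 0 at total_steps.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Upper bound on recordings processed, assuming every step uses a full batch.
BATCH_SIZE = 512
TOTAL_STEPS = 7000
recordings_seen = BATCH_SIZE * TOTAL_STEPS

if __name__ == "__main__":
    for step in (0, 35, 70, 3500, 7000):
        print(step, learning_rate(step))
    print("recordings processed:", recordings_seen)
```

Under these assumptions the run processes 512 × 7000 = 3,584,000 recordings in total, counting any recording that is seen more than once each time it appears.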

Environmental Impact

  • Hardware Type: V4-32 TPU
  • Hours used: 8
  • Cloud Provider: Google Cloud
  • Compute Region: US Central C

Hardware

This model was trained on a V4 TPU on Google Cloud.

Software

This model was trained with Levanter.

Model Card Authors

Will Held

Model Card Contact

[email protected]