Model Card for DiVA Llama 3
This is an ablation of our Distilled Voice Assistant (DiVA) model, which accepts both speech and text as input. This ablation is trained using only the distillation loss, as described in the ablation study of the paper: https://huggingface.co/papers/2410.02678
Weights & Biases run: https://wandb.ai/i18nlp/DiVA%20Training%20Runs/runs/8i1dd47i?nw=nwuserheld
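The card does not spell out the distillation loss itself. The following is an illustration only, not the paper's exact formulation. It assumes the loss compares next-token distributions position by position. The teacher is a text LLM reading the transcript, and the student is the model reading the audio. The two are compared with KL(teacher || student), a standard distillation objective. The helper names `log_softmax`, `kl_divergence`, and `distillation_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(teacher_logits: Sequence[float], student_logits: Sequence[float]) -> float:
    """KL(teacher || student) for a single next-token position."""
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    # Sum over the vocabulary of p_teacher * (log p_teacher - log p_student).
    return sum(math.exp(t) * (t - s) for t, s in zip(t_logp, s_logp))


def distillation_loss(
    teacher_seq: Sequence[Sequence[float]],
    student_seq: Sequence[Sequence[float]],
) -> float:
    """Mean per-position KL between teacher (transcript input) and student (audio input) logits.

    Hypothetical helper: assumes both sequences are already aligned position by position.
    """
    if len(teacher_seq) != len(student_seq) or not teacher_seq:
        raise ValueError("expected two non-empty, equally long sequences of logits")
    total = sum(kl_divergence(t, s) for t, s in zip(teacher_seq, student_seq))
    return total / len(teacher_seq)


# Toy two-token vocabulary: the teacher is uniform, the student puts 3/4 on the first token.
teacher = [[0.0, 0.0]]
student = [[math.log(3.0), 0.0]]
loss = distillation_loss(teacher, student)  # 0.5 * ln(4/3), about 0.1438
```

The loss is zero only when the student reproduces the teacher's distribution at every position. That property is what lets a distillation-only setup train without any human-written responses.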
Citation
This is the distillation-only model from https://huggingface.co/papers/2410.02678. BibTeX:
@misc{held2024diva,
author="Held, Will and Zhang, Yanzhe and Ryan, Michael and Shi, Weiyan and Li, Ella and Yang, Diyi",
title="Distilling an End-to-End Voice Assistant from Speech Recognition Data",
year="2024",
publisher="HuggingFace",
}
Table of Contents
- Model Card for DiVA Llama 3
- Citation
- Table of Contents
- Training Details
- Environmental Impact
- Technical Specifications
- Model Card Authors
Training Details
Training Data
This model was trained on the Mozilla Common Voice corpus.
Training Procedure
This model was trained for 7,000 gradient steps with a batch size of 512 recordings. The learning rate decays linearly from 5e-5 to zero, with a linear warmup over the first 70 steps.
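The schedule described above can be sketched as a plain function of the step index. This version makes two assumptions not stated in the card. First, warmup starts from zero and reaches the 5e-5 peak at step 70. Second, the linear decay runs from step 70 down to zero at step 7,000. The function name `lr_at_step` is hypothetical. Levanter's own scheduler may handle the boundaries differently.

```python
PEAK_LR = 5e-5
WARMUP_STEPS = 70
TOTAL_STEPS = 7_000
BATCH_SIZE = 512  # recordings per gradient step


def lr_at_step(
    step: int,
    peak_lr: float = PEAK_LR,
    warmup_steps: int = WARMUP_STEPS,
    total_steps: int = TOTAL_STEPS,
) -> float:
    """Learning rate at a 0-indexed optimizer step (assumed warmup-then-linear-decay)."""
    if step < warmup_steps:
        # Linear warmup from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    if step >= total_steps:
        return 0.0
    # Linear decay from peak_lr at the end of warmup to 0 at total_steps.
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)


# Total recordings processed over the run (counting repeats if the data was cycled).
recordings_seen = BATCH_SIZE * TOTAL_STEPS  # 3,584,000
```

With these assumptions the rate peaks at step 70 and reaches half its peak (2.5e-5) at step 3,535, halfway through the decay phase.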
Environmental Impact
- Hardware Type: TPU v4-32
- Hours used: 8
- Cloud Provider: Google Cloud
- Compute Region: US Central C
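The figures listed above support a rough back-of-the-envelope estimate of compute and throughput, combined with the step count and batch size from Training Procedure. The estimate rests on two assumptions. The 8 hours are taken as wall-clock time for the full 7,000-step run, with start-up and evaluation overhead ignored. A v4-32 slice is taken to contain 16 chips, per Google Cloud's TPU v4 naming, where the suffix counts TensorCores and each chip has two. Chip-hours are the unit most carbon-impact calculators ask for.

```python
# Figures taken from this model card.
TOTAL_STEPS = 7_000
BATCH_SIZE = 512
WALL_CLOCK_HOURS = 8

# Assumption: TPU v4-32 = 32 TensorCores = 16 chips (2 TensorCores per v4 chip).
CHIPS_IN_V4_32 = 16

chip_hours = CHIPS_IN_V4_32 * WALL_CLOCK_HOURS  # 128 chip-hours
steps_per_hour = TOTAL_STEPS / WALL_CLOCK_HOURS  # 875 steps/hour on average
# Average training throughput in recordings per second (rough, ignores overhead).
recordings_per_second = TOTAL_STEPS * BATCH_SIZE / (WALL_CLOCK_HOURS * 3600)
```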
Technical Specifications
Hardware
This model was trained on a TPU v4-32 slice on Google Cloud.
Software
This model was trained with Levanter, Stanford CRFM's JAX-based training framework.
Model Card Authors
Will Held