Training data

by BramVanroy - opened 1 day ago

Hello

Thanks for this model - it was clearly needed in the EU!

Do you have an exhaustive description of the data you used, for both pretraining and instruction tuning? Since this is an EU project and the models are Apache-licensed, I'd really love to see the datasets released or transparently described!

Thanks again for the work!

phmartins

UTTER - Unified Transcription and Translation for Extended Reality org 1 day ago

Hi Bram. Thank you!

We are going to release a technical report describing all the data, pre-training and post-training details soon.
We're also planning to release the data once we release the final model, which we're starting to work on now.

phmartins changed discussion status to closed 1 day ago

BramVanroy

1 day ago

That sounds awesome! Transparency is so important and very much appreciated - thanks a lot!
