metadata
language: fr
license: mit
tags:
- bert
- language-model
- flaubert
- french
- flaubert-base
- uncased
- asr
- speech
- oral
- natural language understanding
- NLU
- spoken language understanding
- SLU
- understanding
FlauBERT-Oral models: Using ASR-Generated Text for Spoken Language Modeling
The FlauBERT-Oral models are French BERT models trained on a very large corpus of automatically transcribed speech, drawn from 350,000 hours of diverse French TV shows. They were trained with the FlauBERT software using the same parameters as the flaubert-base-uncased model (12 layers, 12 attention heads, 768-dimensional hidden states, 137M parameters, uncased).
Available FlauBERT-Oral models
- flaubert-oral-asr: trained from scratch on ASR data, keeping the BPE tokenizer and vocabulary of flaubert-base-uncased
- flaubert-oral-asr_nb: trained from scratch on ASR data, with a BPE tokenizer also trained on the same corpus
- flaubert-oral-mixed: trained from scratch on a mixed corpus of ASR and text data, with a BPE tokenizer also trained on the same corpus
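The three variants listed above differ only in training corpus and tokenizer, so choosing one comes down to picking a name. The following is a minimal sketch of how that choice might be wired to the Hugging Face `transformers` library, which provides `FlaubertModel` and `FlaubertTokenizer`. The Hub namespace is not stated in this card, so `HUB_NAMESPACE` is a hypothetical placeholder to replace with the real one; the `Variant` class and the `hub_id` and `load` helpers are likewise illustrative names, not part of any library.

```python
from dataclasses import dataclass

# Assumption: the Hub namespace is not given in this card.
# Replace this hypothetical placeholder with the actual one.
HUB_NAMESPACE = "your-namespace"


@dataclass(frozen=True)
class Variant:
    """Describes one FlauBERT-Oral model as listed in this card."""
    corpus: str
    tokenizer: str


VARIANTS = {
    "flaubert-oral-asr": Variant(
        corpus="ASR data",
        tokenizer="BPE tokenizer and vocabulary of flaubert-base-uncased",
    ),
    "flaubert-oral-asr_nb": Variant(
        corpus="ASR data",
        tokenizer="BPE tokenizer trained on the ASR corpus",
    ),
    "flaubert-oral-mixed": Variant(
        corpus="mixed ASR and text data",
        tokenizer="BPE tokenizer trained on the mixed corpus",
    ),
}


def hub_id(name: str, namespace: str = HUB_NAMESPACE) -> str:
    """Build a 'namespace/model' identifier for a known variant."""
    if name not in VARIANTS:
        raise KeyError(f"unknown FlauBERT-Oral variant: {name!r}")
    return f"{namespace}/{name}"


def load(name: str, namespace: str = HUB_NAMESPACE):
    """Load tokenizer and model for a variant (requires transformers and torch)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import FlaubertModel, FlaubertTokenizer

    repo = hub_id(name, namespace)
    # The models are uncased, so input text is lowercased by the tokenizer.
    tokenizer = FlaubertTokenizer.from_pretrained(repo, do_lowercase=True)
    model = FlaubertModel.from_pretrained(repo)
    return tokenizer, model
```

With the real namespace filled in, `tokenizer, model = load("flaubert-oral-asr")` followed by `model(**tokenizer("bonjour tout le monde", return_tensors="pt"))` should return contextual embeddings for the sentence. Keeping the tokenizer and model tied to the same identifier matters here because the `asr_nb` and `mixed` variants use their own BPE vocabularies.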