---
language: fr
license: mit
tags:
- bert
- language-model
- flaubert
- french
- flaubert-base
- uncased
- asr
- speech
- oral
- natural language understanding
- NLU
- spoken language understanding
- SLU
- understanding
---
# FlauBERT-Oral models: Using ASR-Generated Text for Spoken Language Modeling
**FlauBERT-Oral** are French BERT models trained on a large corpus of automatically transcribed speech, covering 350,000 hours of diverse French TV shows. They were trained with the [**FlauBERT software**](https://github.com/getalp/Flaubert) using the same architecture and hyperparameters as the [flaubert-base-uncased](https://huggingface.co/flaubert/flaubert_base_uncased) model (12 layers, 12 attention heads, hidden size 768, 137M parameters, uncased).
## Available FlauBERT-Oral models
- `flaubert-oral-asr`: trained from scratch on ASR data, reusing the BPE tokenizer and vocabulary of flaubert-base-uncased
- `flaubert-oral-asr_nb`: trained from scratch on ASR data, with a BPE tokenizer also trained on that same ASR corpus
- `flaubert-oral-mixed`: trained from scratch on a mixed corpus of ASR and written text, with a BPE tokenizer also trained on that mixed corpus
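The card does not include loading instructions. As a minimal sketch, assuming the checkpoints are published on the Hugging Face Hub in a format that `transformers` can load with its `FlaubertModel` and `FlaubertTokenizer` classes, a variant could be selected and loaded as below. The Hub namespace `your-namespace` and the helper names `VARIANTS`, `hub_id` and `load` are hypothetical placeholders, not part of this release.

```python
"""Sketch: selecting and loading a FlauBERT-Oral variant (assumes Hub-hosted checkpoints)."""

# Variant names come from the model card; descriptions paraphrase its list.
VARIANTS = {
    "flaubert-oral-asr": "ASR data only; BPE tokenizer and vocabulary reused from flaubert-base-uncased",
    "flaubert-oral-asr_nb": "ASR data only; BPE tokenizer trained on the same ASR corpus",
    "flaubert-oral-mixed": "ASR + written text; BPE tokenizer trained on the same mixed corpus",
}

# HYPOTHETICAL: the card does not name the Hub owner; replace with the real namespace.
HUB_NAMESPACE = "your-namespace"


def hub_id(variant: str, namespace: str = HUB_NAMESPACE) -> str:
    """Build a Hub repository id such as 'your-namespace/flaubert-oral-asr'."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(VARIANTS)}")
    return f"{namespace}/{variant}"


def load(variant: str, namespace: str = HUB_NAMESPACE):
    """Download the tokenizer and encoder for one variant.

    Requires `transformers`, `torch` and `sacremoses` plus network access.
    """
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import FlaubertModel, FlaubertTokenizer

    repo = hub_id(variant, namespace)
    # Each variant must be paired with its own tokenizer: the _nb and mixed
    # models use a BPE vocabulary trained on their own corpus.
    tokenizer = FlaubertTokenizer.from_pretrained(repo)
    model = FlaubertModel.from_pretrained(repo)
    return tokenizer, model


# Example usage (downloads weights, so it is not run here):
#   tokenizer, model = load("flaubert-oral-asr")
#   inputs = tokenizer("bonjour à tous et bienvenue", return_tensors="pt")
#   hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, 768)
```

Keeping the three variant names in one table makes it straightforward to swap models when comparing them on the same downstream task. Since `flaubert-oral-asr_nb` and `flaubert-oral-mixed` each come with their own BPE vocabulary, the tokenizer should always be loaded from the same repository as the model.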